621.3 Electrical engineering, electronics
System-level simulation of Dynamic Random Access Memories (DRAMs) requires highly accurate models because of their complex timing and power behavior. However, conventional cycle-accurate DRAM subsystem models often become a bottleneck for the overall simulation speed. A promising alternative is simulators based on Transaction Level Modeling, which can be fast and accurate at the same time. In this paper we present DRAMSys4.0, which is, to the best of our knowledge, the fastest and most extensive open-source cycle-accurate DRAM simulation framework. DRAMSys4.0 includes a novel software architecture that enables fast adaptation to different hardware controller implementations and new JEDEC standards. In addition, it already supports the latest standards DDR5 and LPDDR5. We explain how to apply optimization techniques for an increased simulation speed while maintaining full temporal accuracy. Furthermore, we demonstrate the simulator’s accuracy and analysis tools with two application examples. Finally, we provide a detailed investigation and comparison of the most prominent cycle-accurate open-source DRAM simulators with regard to their supported features, analysis capabilities and simulation speed.
In recent years, ◂...▸optical character recognition (OCR) systems have been used to digitally preserve historical archives. To transcribe historical archives into a machine-readable form, first, the documents are scanned, then an OCR is applied. In order to digitize documents without the need to remove them from where they are archived, it is valuable to have a portable device that combines scanning and OCR capabilities. Nowadays, there exist many commercial and open-source document digitization techniques, which are optimized for contemporary documents. However, they fail to give sufficient text recognition accuracy for transcribing historical documents due to the severe quality degradation of such documents. On the contrary, the anyOCR system, which is designed to mainly digitize historical documents, provides high accuracy. However, this comes at a cost of high computational complexity resulting in long runtime and high power consumption. To tackle these challenges, we propose a low power energy-efficient accelerator with real-time capabilities called iDocChip, which is a configurable hybrid hardware-software programmable ◂...▸System-on-Chip (SoC) based on anyOCR for digitizing historical documents. In this paper, we focus on one of the most crucial processing steps in the anyOCR system: Text and Image Segmentation, which makes use of a multi-resolution morphology-based algorithm. Moreover, an optimized FPGA-based hybrid architecture of this anyOCR step along with its optimized software implementations are presented. We demonstrate our results on multiple embedded and general-purpose platforms with respect to runtime and power consumption. The resulting hardware accelerator outperforms the existing anyOCR by 6.2×, while achieving 207× higher energy-efficiency and maintaining its high accuracy.
Recurrent Neural Networks, in particular One-dimensional and Multidimensional Long Short-Term Memory (1D-LSTM and MD-LSTM), have achieved state-of-the-art classification accuracy in many applications such as machine translation, image caption generation, handwritten text recognition, medical imaging and many more. However, high classification accuracy comes at the cost of high compute, storage, and memory bandwidth requirements, which make their deployment challenging, especially on energy-constrained platforms such as portable devices. In comparison to CNNs, few investigations exist on efficient hardware implementations for 1D-LSTM, especially under energy constraints, and there is no research publication on a hardware architecture for MD-LSTM. In this article, we present two novel architectures for LSTM inference: a hardware architecture for MD-LSTM, and a DRAM-based Processing-in-Memory (DRAM-PIM) hardware architecture for 1D-LSTM. We present for the first time a hardware architecture for MD-LSTM, and show a trade-off analysis of accuracy and hardware cost for various precisions. We implement the new architecture as an FPGA-based accelerator that outperforms an NVIDIA K80 GPU implementation in terms of runtime by up to 84× and energy efficiency by up to 1238× for a challenging dataset for historical document image binarization from the DIBCO 2017 contest, and for the well-known MNIST dataset for handwritten digit recognition. Our accelerator demonstrates the highest accuracy and comparable throughput in comparison to state-of-the-art FPGA-based implementations of multilayer perceptrons for the MNIST dataset. Furthermore, we present a new DRAM-PIM architecture for 1D-LSTM targeting energy-efficient compute platforms such as portable devices. The DRAM-PIM architecture integrates the computation units in close proximity to the DRAM cells in order to maximize the data parallelism and energy efficiency.
The proposed DRAM-PIM design is 16.19× more energy efficient than the FPGA implementation. The total chip area overhead of this design is 18 % compared to a commodity 8 Gb DRAM chip. Our experiments show that the DRAM-PIM implementation delivers a throughput of 1309.16 GOp/s for an optical character recognition application.
Described below is a modular multilevel converter with a plurality of individual modules, in which a first group of modules is connected in series to form a closed ring and at least two taps are each arranged between two adjacent individual modules of the ring. At each of at least two taps, a second, further group of modules is provided as a phase module that branches off from the ring arrangement and forms a star branch. These latter groups of modules each form terminals or taps at their ends. By means of switching elements, the modules allow the energy storage elements of adjacent individual modules to be interconnected, so that a voltage difference can be provided between two adjacent phase terminals, which a control unit can regulate according to the waveform of a polyphase rotating field. The present invention further relates to a polyphase system and to a method for efficient power exchange between modules.
The development of machine learning algorithms and novel sensing modalities has boosted the exploration of human activity recognition (HAR) in recent years. In this work, we explored field-based sensing solutions and different machine learning models for HAR tasks to address the shortcomings of existing HAR sensing solutions, such as the weak robustness of RF-based solutions and the environment dependency of optical solutions, aiming to supply a competitive alternative sensing approach for HAR tasks.
A field, in physics, describes a region in which each point is affected by a force. Field sensing is potentially a low-cost, low-power, non-intrusive, privacy-respecting HAR solution that is ideal for long-term, wearable activity recording. By directly or indirectly monitoring the field strength or other variables caused by field variation, some unsolved HAR problems could be addressed when other sensing solutions fail. An example is the social distance monitoring problem, where the most widely adopted approach is based on Bluetooth signal strength measurement. However, the signal is so subtle that any object surrounding the signal emitter will cause signal attenuation. To guarantee the accuracy of social distance monitoring, we developed an induced magnetic field-based social distance monitoring system with sub-ten-centimetre accuracy. Moreover, the system is robust and resistant to environmental variations. Like Bluetooth, other RF-wave-based sensing modalities also face the multi-path effect caused by refraction. Thus their signal is unreliable for positioning applications where higher accuracy and robustness are needed. Besides the magnetic field, we also explored a natural static passive electric field, the field between the human body and its surroundings, namely the human body capacitance (HBC). HBC is a physiological parameter describing the charge distribution difference between the body and the surroundings and has seldom been explored before. We developed a few wearable, low-cost, low-power hardware platforms, based either on an oscillating unit or on a sensing front end composed of discrete components followed by a high-resolution analog-to-digital module, to monitor the variation of the parameter with respect to body movement and environmental variations. Compared with inertial sensors, the HBC can perceive full-body movement, meaning that the movement of the legs can be perceived by a wrist-worn HBC sensing unit, which is far beyond the sensing ability of an inertial sensing unit.
To summarize, we introduced two competitive field sensing modalities for HAR tasks: magnetic field sensing for position-related services and passive electric field sensing for full-body action and environmental variation sensing. Both are still in their infancy and have not been fully explored in the community. The advantages of the two field sensing modalities were demonstrated with a series of position-related and motion-related experiments.
Due to the steadily increasing number of decentralized generation units, the upcoming smart meter rollout and the expected electrification of the transport sector (e-mobility), grid planning and grid operation at low-voltage (LV) level are facing major challenges. Therefore, many studies, research and demonstration projects on the above topics have been carried out in recent years, and the results and the methods developed have been published. However, the published methods usually cannot be replicated or validated, since the majority of the examination models or the scenarios used are not transparent to third parties. There is a lack of uniform grid models that map the German LV grids and can be used for comparative investigations, similar to the North American distribution grid models of the IEEE. In contrast to the transmission grid, whose structure is known with high accuracy, suitable grid models for LV grids are difficult to map because of the high number of LV grids and distribution system operators. Furthermore, a detailed description of real LV grids is usually not available in scientific publications for data privacy reasons. For investigations within a research project, the most characteristic synthetic LV grid models have been created, which are based on common settlement structures and usual grid planning principles in Germany. In this work, these LV grid models and their development are explained in detail. For the first time, comprehensible LV grid models for the Central European area are available to the public, which can be used as a benchmark for further scientific research and method development.
This document is an English version of the paper which was originally written in German. In addition, this paper discusses a few more aspects, especially regarding the planning process of distribution grids in Germany.
Control concept for low-voltage grid automation using the merit-order principle
(2022)
Increasing generation capacity at the low-voltage (LV) grid level from photovoltaic systems, together with the electrification of the heating and transport sectors, makes investments in LV grids necessary. A higher degree of digitalization in the LV grid has the potential to identify the necessary investments more precisely and thus, where applicable, to reduce or postpone them. In this context, the market introduction of intelligent metering systems, so-called smart meters, offers a new way to obtain measurements from the LV grid and to optimize the control variables of available actuators on that basis. This raises the question of how measurement data with different measurement cycles can be used in a grid automation system and how the nonlinear integer optimization problem of control-variable optimization can be solved efficiently. This thesis addresses the solution of the optimization problem. To this end, control-variable optimization based on the merit-order principle is applied.
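As a rough illustration of the merit-order idea named in this abstract, the following minimal Python sketch ranks actuators by cost per unit of effect and activates the cheapest ones first, in whole steps, until a required voltage correction is covered. It is a generic greedy sketch and not the thesis's algorithm: the `Actuator` fields, the actuator names, the cost figures and the assumption that each step has a fixed, additive voltage effect are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Actuator:
    name: str               # hypothetical actuator label
    cost_per_step: float    # assumed cost of one discrete step
    effect_per_step: float  # assumed voltage change per step (p.u.)
    max_steps: int          # number of discrete steps available


def merit_order_dispatch(actuators, required_change):
    """Activate the cheapest actuators first until the required change is met."""
    # Merit order: ascending cost per unit of voltage effect.
    ranked = sorted(actuators, key=lambda a: a.cost_per_step / a.effect_per_step)
    remaining = required_change
    plan = []
    for actuator in ranked:
        if remaining <= 1e-12:
            break
        # Integer number of steps; the small offset guards against rounding noise.
        steps_needed = math.ceil(remaining / actuator.effect_per_step - 1e-9)
        steps = min(actuator.max_steps, steps_needed)
        if steps > 0:
            plan.append((actuator.name, steps))
            remaining -= steps * actuator.effect_per_step
    return plan, max(remaining, 0.0)


actuators = [
    Actuator("battery_storage", cost_per_step=3.0, effect_per_step=0.02, max_steps=3),
    Actuator("pv_reactive_power", cost_per_step=1.0, effect_per_step=0.01, max_steps=2),
]
plan, unresolved = merit_order_dispatch(actuators, required_change=0.05)
print(plan, unresolved)
```

A greedy ranking of this kind avoids searching the full integer solution space, at the price of ignoring interactions between actuators that a real grid model would have to capture.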
Beamforming performs spatial filtering to preserve the signal from given directions of interest while suppressing interfering signals and noise arriving from other directions.
For example, a microphone array equipped with a beamforming algorithm can preserve the sound coming from a target speaker and suppress sounds coming from other speakers.
Beamformers have been widely used in many applications such as radar, sonar, communication, and acoustic systems.
A data-independent beamformer is a beamformer whose coefficients do not depend on the sensor signals. It normally requires less computation since the coefficients are computed only once. Moreover, because its coefficients are derived from well-defined statistical models, it produces fewer artifacts. The major drawback of this beamforming class is its limited interference suppression.
On the other hand, an adaptive beamformer is a beamformer whose coefficients depend on or adapt to the sensor signals. It can suppress interference better than a data-independent beamformer, but it suffers from either too much distortion of the signal of interest or reduced noise reduction when the update rate of the coefficients is not synchronized with the rate at which the noise model changes. In addition, it is computationally intensive since the coefficients need to be updated frequently.
In acoustic applications, the bandwidth of signals of interest extends over several octaves, but we always expect that the characteristic of the beamformer is invariant with regard to the bandwidth of interest. This can be achieved by the so-called broadband beamforming.
Since the beam pattern of conventional beamformers depends on the frequency of the signal, it is common to use a dense and uniform array for broadband beamforming to guarantee several essential properties together, such as frequency independence, low sensitivity to white noise, a high directivity factor, or a high front-to-back ratio. In this dissertation, we mainly focus on sparse arrays, whose aim is to use fewer sensors in the array while still ensuring several important performance properties of the beamformer.
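The frequency dependence mentioned above can be made concrete with a textbook delay-and-sum beamformer, which is a simple data-independent design. The sketch below is only an illustration under assumed values (eight microphones, 4 cm spacing, far-field plane waves, a speed of sound of 343 m/s); none of these parameters come from the dissertation.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def delay_and_sum_response(freq_hz, angle_deg, num_mics=8, spacing_m=0.04, look_deg=0.0):
    """Magnitude response of a uniform linear array steered towards look_deg."""
    total = 0j
    for n in range(num_mics):
        # Residual delay at microphone n after steering towards the look direction.
        tau = n * spacing_m * (
            math.sin(math.radians(angle_deg)) - math.sin(math.radians(look_deg))
        ) / SPEED_OF_SOUND
        total += cmath.exp(-2j * math.pi * freq_hz * tau)
    return abs(total) / num_mics


for freq in (500.0, 1000.0, 2000.0, 4000.0):
    on_axis = delay_and_sum_response(freq, angle_deg=0.0)
    off_axis = delay_and_sum_response(freq, angle_deg=30.0)
    print(f"{freq:6.0f} Hz  look direction: {on_axis:.3f}  30 degrees off: {off_axis:.3f}")
```

The response in the look direction stays at one for every frequency, while the attenuation of an off-axis source changes with frequency, which is the behavior that broadband designs try to flatten.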
In the past few decades, many design methodologies for sparse arrays have been proposed and applied in a variety of practical applications.
Although good results were presented, some restrictions remain: the number of sensors is large, the designed beam pattern must be fixed, the steering ability is limited, and the computational complexity is high.
In this work, two novel approaches to sparse array design that take a hypothesized uniform array as a basis are proposed: one for data-independent beamformers and the other for adaptive beamformers.
As an underlying component of the proposed methods, the dissertation introduces some new insights into the uniform array with broadband beamforming. In this context, a function formulating the relations between the sensor coefficients and its beam pattern over frequency is proposed. The function mainly contains the coordinate transform and inverse Fourier transform.
Furthermore, from the bijection of the function and broadband beamforming perspective, we propose the lower and upper bounds for the inter-distance of sensors. Within these bounds, the function is a bijective function that can be utilized to design the uniform array with broadband beamforming.
For data-independent beamforming, many studies have focused on optimization procedures to seek the sparse array deployment. This dissertation presents an alternative approach to determine the location of sensors.
Starting with a weight spectrum of a virtual dense and uniform array, some techniques are used, such as analyzing a weight spectrum to determine the critical sensors, applying the clustering technique to group the sensors into different groups and selecting representative sensors for each group.
After the sparse array deployment is specified, an optimization technique is applied to find the beamformer coefficients. The proposed method saves computation time in the design phase, and the resulting beamformer outperforms other state-of-the-art methods in several aspects, such as higher white noise gain, higher directivity factor, or greater frequency independence.
For adaptive beamforming, the dissertation attempts to design a versatile sparse microphone array that can be used for different beam patterns.
Furthermore, we aim to reduce the number of microphones in the sparse array while ensuring that its performance can continue to compete with a highly dense and uniform array in terms of broadband beamforming.
An irregular microphone array in a planar surface with the maximum number of distinct distances between the microphones is proposed.
It is demonstrated that the irregular microphone array is well suited to sparse recovery algorithms, which are used to solve underdetermined systems subject to sparse solutions. Here, the sparse solution is the spatial spectrum of the sound sources, which needs to be reconstructed from the microphone signals.
From the reconstructed sound sources, a method for array interpolation is presented to obtain an interpolated dense and uniform microphone array that performs well with broadband beamforming.
In addition, two alternative approaches for generalized sidelobe canceler (GSC) beamformer are proposed. One is the data-independent beamforming variant, the other is the adaptive beamforming variant. The GSC decomposes beamforming into two paths: The upper path is to preserve the desired signal, the lower path is to suppress the desired signal. From a beam pattern viewpoint, we propose an improvement for GSC, that is, instead of using the blocking matrix in the lower path to suppress the desired signal, we design a beamformer that contains the nulls at the look direction and at some other directions. Both approaches are simple beamforming design methods and they can be applied to either sparse array or uniform array.
Lastly, a new technique for direction-of-arrival (DOA) estimation based on the annihilating filter is also presented in this dissertation.
It is based on the idea of finite rate of innovation for reconstructing a stream of Diracs: an annihilating filter (locator filter) is identified from a few uniform samples, and the positions of the Diracs are then related to the roots of the filter. Here, an annihilating filter is a filter that suppresses the signal, since its coefficient vector is always orthogonal to every frame of the signal.
In the DOA context, we regard an active source as a Dirac associated with the arrival direction, so the directions of active sources can be derived from the roots of the annihilating filter. However, the DOA obtained by this method is sensitive to noise and the number of DOAs is limited.
To address these issues, the dissertation proposes a robust method to design the annihilating filter and to increase the degree-of-freedom of the measurement system (more active sources can be detected) via observing multiple data frames.
Furthermore, we also analyze the performance of DOA estimation with diffuse noise and propose an extended multiple signal classification algorithm that takes diffuse noise into account. The simulations show that, in the case of diffuse noise, only the extended multiple signal classification algorithm can estimate the DOAs properly.
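The basic annihilating-filter idea described in this abstract can be sketched for the simplest interesting case: two sources, one noise-free snapshot, and a uniform linear array with half-wavelength spacing. This is the plain textbook construction and not the robust multi-frame method the dissertation proposes; the function names, the array geometry and the source parameters are assumptions chosen for illustration.

```python
import cmath
import math


def array_snapshot(angles_deg, amplitudes, num_sensors, spacing_wl=0.5):
    """Noise-free snapshot of a uniform linear array (spacing in wavelengths)."""
    snapshot = []
    for n in range(num_sensors):
        sample = 0j
        for angle, amplitude in zip(angles_deg, amplitudes):
            phase = 2 * math.pi * spacing_wl * n * math.sin(math.radians(angle))
            sample += amplitude * cmath.exp(1j * phase)
        snapshot.append(sample)
    return snapshot


def estimate_two_doas(x, spacing_wl=0.5):
    """Estimate two arrival angles from four uniform samples x[0..3]."""
    # Annihilation equations: x[n] + h1*x[n-1] + h2*x[n-2] = 0 for n = 2, 3,
    # solved for (h1, h2) with Cramer's rule.
    det = x[1] * x[1] - x[0] * x[2]
    h1 = (x[0] * x[3] - x[1] * x[2]) / det
    h2 = (x[2] * x[2] - x[1] * x[3]) / det
    # The roots of z^2 + h1*z + h2 are the per-sensor phase factors of the sources.
    disc = cmath.sqrt(h1 * h1 - 4 * h2)
    roots = [(-h1 + disc) / 2, (-h1 - disc) / 2]
    angles = []
    for root in roots:
        sin_theta = cmath.phase(root) / (2 * math.pi * spacing_wl)
        angles.append(math.degrees(math.asin(sin_theta)))
    return sorted(angles)


snapshot = array_snapshot([-20.0, 35.0], [1.0, 0.7], num_sensors=4)
estimates = estimate_two_doas(snapshot)
print(estimates)
```

With four sensors only two sources can be resolved this way, and any noise perturbs the roots directly, which matches the two limitations the abstract names.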
In this thesis, the software development principles of Model-Driven Architecture have been adopted for developing a generation flow for properties. The approach taken for property generation introduces three models, namely the Model-of-Things, the Model-of-Property, and the Model-of-View. Each model belongs to a distinct model layer in the generation flow and each model layer addresses a specific concern of the property generation. The separation of concerns through model layers ensures modular flow development, and enables uncomplicated enhancements and feature extensions. The properties are generated through a series of model-to-model transformations between these model layers. Python is used as the domain-specific language for describing the intermediate transformations. A metamodel-based automation framework is utilized to generate an infrastructure that facilitates the description of transformations. The APIs that form the central part of the infrastructure are generated from the metamodel definitions of the models mentioned before. The generated APIs are further extended with domain-specific APIs to significantly reduce the effort required for developing the transformations. The property generation solution developed in this thesis is termed “MetaProp”.
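To give a feel for a layered generation flow of this kind, the following Python sketch passes one timed state transition through a specification layer, a property layer and a textual view. It is purely illustrative and does not reproduce MetaProp or its generated APIs: the class names, fields and the output syntax are invented for this example, and the view layer is collapsed into a plain string.

```python
from dataclasses import dataclass, field


@dataclass
class Transition:
    """Specification-layer element: a timed transition between abstract states."""
    source: str
    target: str
    trigger: str
    delay: int  # cycles between leaving source and reaching target


@dataclass
class PropertyModel:
    """Property-layer element: timed assumptions and commitments."""
    name: str
    assumptions: list = field(default_factory=list)
    commitments: list = field(default_factory=list)


def things_to_property(transition):
    """Model-to-model transformation from the specification to the property layer."""
    prop = PropertyModel(name=f"{transition.source}_to_{transition.target}")
    prop.assumptions.append((0, f"state == {transition.source}"))
    prop.assumptions.append((0, transition.trigger))
    prop.commitments.append((transition.delay, f"state == {transition.target}"))
    return prop


def property_to_view(prop):
    """Render the property in a generic, made-up textual syntax."""
    lines = [f"property {prop.name}:"]
    lines += [f"  assume at t+{t}: {expr}" for t, expr in prop.assumptions]
    lines += [f"  prove  at t+{t}: {expr}" for t, expr in prop.commitments]
    return "\n".join(lines)


spec = Transition(source="IDLE", target="BUSY", trigger="start == 1", delay=2)
text = property_to_view(things_to_property(spec))
print(text)
```

Keeping each step a small function over plain data objects mirrors the separation of concerns described above: the rendering can change without touching the transformation, and vice versa.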
A key aspect of the property generation flow is the translation of informal specifications to formal specification models. Due to the diverse nature of hardware designs, the methodology includes different modeling paradigms to formalize the specifications. The metamodel MetaExpression provides features to describe the behavior of combinational designs in the form of expression trees and dataflow expressions. The MetaExpression metamodel is modular in nature and can be integrated into other metamodel definitions that capture the specification-level configurations of the design. For modeling the behavior of sequential designs, a formalism using finite state machine-like notations for traces is introduced. The metamodel MetaSTS defines this formalism. The MetaSTS metamodel makes it possible to define the behavior of sequential designs with annotated timing information for transitions between important states. Annotation is also used to map abstract states in the Model-of-Things to the Model-of-Property and, finally, to the design implementation. Such an annotation or binding mechanism enables Model-of-Properties to be applicable to a variety of design implementations.
Another important contribution of this thesis is a complete processor verification methodology, which is based on the aforementioned generation approach. The introduced methods for specification modeling are employed to formalize the ISA and the behavior of instructions within the processor pipelines. However, describing the transformations that define the Model-of-Properties requires substantial manual effort and in-depth knowledge of the microarchitectural details of the processor implementation. The prime reason for this requirement is the overlapped execution of instructions within the pipelined architectures of processors and the numerous internal and external pipeline stall scenarios. For a complete processor verification, a set of generated properties must consider all combinations of instruction overlapping coupled with all scenarios of pipeline stalls. Consequently, the Model-of-Properties, from which the properties are generated, are required to consider all combinations of the aforementioned scenarios. To address these aspects, the C-S²QED method, an extension of the S²QED method, has been developed to completely verify a processor. The C-S²QED method is also applicable to exceptions within the processor pipelines and superscalar pipeline architectures. The C-S²QED method detects all functional bugs in a processor implementation and requires significantly less manual effort compared to state-of-the-art processor verification methods. The completeness hypothesis of the C-S²QED method based on the completeness criterion defined by C-IPC and a completeness proof are also part of this thesis. The property generation flow has been leveraged to generate a set of C-S²QED properties to further enhance the effectiveness of the methodology.
The applicability and effectiveness of the introduced modeling paradigms and developed methods have been demonstrated with the formal verification of several industry-strength designs. Numerous logic bugs, including bugs that are typically regarded as difficult to find, have been detected during the formal verification with generated properties. Most IPs of an SoC called “RiVal”, including the RISC-V core and excluding the legacy IPs, have been formally verified only with the methods proposed in this thesis. The RiVal SoC is used in powertrain and safety automotive applications. The manufactured chip works “first time right” and no logic bug has been detected during the post-manufacturing tests. Various architectural alternatives of the RISC-V based processor designs are verified with the generated C-S²QED properties. The property generation is built in a configurable manner such that any changes in the microarchitecture of the processor that may be caused by changes in specifications are implicitly covered by the generation flow. Thus, additional manual effort is not required and the functional flaws due to changes in specifications are neutralized. Furthermore, the proposed methods have also been applied to communication protocol IPs, bus bridges, interrupt controllers and safety-relevant designs.
Small embedded devices are highly specialized platforms that integrate several peripherals alongside the CPU core. Embedded devices extensively rely on Firmware (FW) to control and access the peripherals as well as other important functionality. Customizing embedded computing platforms to specific application domains often necessitates optimizing the firmware and/or the HW/SW interface under tight resource constraints. Such optimizations frequently alter the communication between the firmware and the peripheral devices, possibly compromising functional correctness of the input/output behavior of the embedded system. This poses challenges to the development and verification of such systems. The system must be adapted to and verified for each specific device configuration.
This thesis presents a formal approach to formulate these verification tasks at several levels of abstraction, along with corresponding HW/SW co-equivalence checking techniques for verifying correct I/O behavior of peripherals under a modified firmware. The feasibility of the approach is shown on several case studies, including industrial driver software as well as open-source peripherals. In addition, a subtle bug in one of the peripherals and several undocumented preconditions for correct device behavior were detected by the verification method.