Search

DRAMSys4.0: An Open-Source Simulation Framework for In-depth DRAM Analyses (2022)

Steiner, Lukas ; Jung, Matthias ; Prado, Felipe S. ; Bykov, Kirill ; Wehn, Norbert

The simulation of Dynamic Random Access Memories (DRAMs) on system level requires highly accurate models due to their complex timing and power behavior. However, conventional cycle-accurate DRAM subsystem models often become a bottleneck for the overall simulation speed. A promising alternative are simulators based on Transaction Level Modeling, which can be fast and accurate at the same time. In this paper we present DRAMSys4.0, which is, to the best of our knowledge, the fastest and most extensive open-source cycle-accurate DRAM simulation framework. DRAMSys4.0 includes a novel software architecture that enables a fast adaption to different hardware controller implementations and new JEDEC standards. In addition, it already supports the latest standards DDR5 and LPDDR5. We explain how to apply optimization techniques for an increased simulation speed while maintaining full temporal accuracy. Furthermore, we demonstrate the simulator’s accuracy and analysis tools with two application examples. Finally, we provide a detailed investigation and comparison of the most prominent cycle-accurate open-source DRAM simulators with regard to their supported features, analysis capabilities and simulation speed.

iDocChip: A Configurable Hardware Architecture for Historical Document Image Processing (2021)

Kina Tekleyohannes, Menbere ; Rybalkin, Vladimir ; Ghaffar, Muhammed Mohsin ; Varela, Javier Alejandro ; Wehn, Norbert ; Dengel, Andreas

In recent years, ◂...▸optical character recognition (OCR) systems have been used to digitally preserve historical archives. To transcribe historical archives into a machine-readable form, first, the documents are scanned, then an OCR is applied. In order to digitize documents without the need to remove them from where they are archived, it is valuable to have a portable device that combines scanning and OCR capabilities. Nowadays, there exist many commercial and open-source document digitization techniques, which are optimized for contemporary documents. However, they fail to give sufficient text recognition accuracy for transcribing historical documents due to the severe quality degradation of such documents. On the contrary, the anyOCR system, which is designed to mainly digitize historical documents, provides high accuracy. However, this comes at a cost of high computational complexity resulting in long runtime and high power consumption. To tackle these challenges, we propose a low power energy-efficient accelerator with real-time capabilities called iDocChip, which is a configurable hybrid hardware-software programmable ◂...▸System-on-Chip (SoC) based on anyOCR for digitizing historical documents. In this paper, we focus on one of the most crucial processing steps in the anyOCR system: Text and Image Segmentation, which makes use of a multi-resolution morphology-based algorithm. Moreover, an optimized FPGA-based hybrid architecture of this anyOCR step along with its optimized software implementations are presented. We demonstrate our results on multiple embedded and general-purpose platforms with respect to runtime and power consumption. The resulting hardware accelerator outperforms the existing anyOCR by 6.2×, while achieving 207× higher energy-efficiency and maintaining its high accuracy.

Efficient Hardware Architectures for 1D- and MD-LSTM Networks (2020)

Rybalkin, Vladimir ; Sudarshan, Chirag ; Weis, Christian ; Lappas, Jan ; Wehn, Norbert ; Cheng, Li

Recurrent Neural Networks, in particular One-dimensional and Multidimensional Long Short-Term Memory (1D-LSTM and MD-LSTM) have achieved state-of-the-art classification accuracy in many applications such as machine translation, image caption generation, handwritten text recognition, medical imaging and many more. However, high classification accuracy comes at high compute, storage, and memory bandwidth requirements, which make their deployment challenging, especially for energy-constrained platforms such as portable devices. In comparison to CNNs, not so many investigations exist on efficient hardware implementations for 1D-LSTM especially under energy constraints, and there is no research publication on hardware architecture for MD-LSTM. In this article, we present two novel architectures for LSTM inference: a hardware architecture for MD-LSTM, and a DRAM-based Processing-in-Memory (DRAM-PIM) hardware architecture for 1D-LSTM. We present for the first time a hardware architecture for MD-LSTM, and show a trade-off analysis for accuracy and hardware cost for various precisions. We implement the new architecture as an FPGA-based accelerator that outperforms NVIDIA K80 GPU implementation in terms of runtime by up to 84× and energy efficiency by up to 1238× for a challenging dataset for historical document image binarization from DIBCO 2017 contest, and a well known MNIST dataset for handwritten digits recognition. Our accelerator demonstrates highest accuracy and comparable throughput in comparison to state-of-the-art FPGA-based implementations of multilayer perceptron for MNIST dataset. Furthermore, we present a new DRAM-PIM architecture for 1D-LSTM targeting energy efficient compute platforms such as portable devices. The DRAM-PIM architecture integrates the computation units in a close proximity to the DRAM cells in order to maximize the data parallelism and energy efficiency. The proposed DRAM-PIM design is 16.19 × more energy efficient as compared to FPGA implementation. The total chip area overhead of this design is 18 % compared to a commodity 8 Gb DRAM chip. Our experiments show that the DRAM-PIM implementation delivers a throughput of 1309.16 GOp/s for an optical character recognition application.

Driving Against the Memory Wall: The Role of Memory for Autonomous Driving (2018)

Jung, Matthias ; Wehn, Norbert

Autonomous driving is disrupting the conventional automotive development. In fact, autonomous driving kicks off the consolidation of control units, i.e. the transition from distributed Electronic Control Units (ECUs) to centralized domain controllers. Platforms like Audi’s zFAS demonstrate this very clearly, where GPUs, Custom SoCs, Microcontrollers, and FPGAs are integrated on a single domain controller in order to perform sensor fusion, processing and decision making on a single Printed Circuit Board (PCB). The communication between these heterogeneous components and the algorithms for Advanced Driving Assistant Systems (ADAS) itself requires a huge amount of memory bandwidth, which will bring the Memory Wall from High Performance Computing (HPC) and data-centers directly in our cars. In this paper we highlight the roles and issues of Dynamic Random Access Memories (DRAMs) for future autonomous driving architectures.

A Platform for Analyzing DDR3 and DDR4 DRAMs (2018)

Jung, Matthias ; Mathew, Deepak M. ; M.Eng., Carl C. Rheinländer ; Weis, Christian ; Wehn, Norbert

The authors explore the intrinsic trade-off in a DRAM between the power consumption (due to refresh) and the reliability. Their unique measurement platform allows tailoring to the design constraints depending on whether power consumption, performance or reliability has the highest design priority. Furthermore, the authors show how this measurement platform can be used for reverse engineering the internal structure of DRAMs and how this knowledge can be used to improve DRAM’s reliability.

Investigate the hardware description language Chisel - A case study implementing the Heston model (2013)

Stumm, Christopher ; Brugger, Christian ; Wehn, Norbert

This paper presents a case study comparing the hardware description language „Constructing Hardware in a Scala Embedded Language“(Chisel) to VHDL. For a thorough comparison the Heston Model was implemented, a stochastic model used in financial mathematics to calculate option prices. Metrics like hardware utilization and maximum clock rate were extracted from both resulting designs and compared to each other. The results showed a 30% reduction in code size compared to VHDL, while the resulting circuits had about the same hardware utilization. Using Chisel however proofed to be difficult because of a few features that were not available for this case study.

Investigate the high-level HDL Chisel (2013)

Heilmann, Florian ; Brugger, Christian ; Wehn, Norbert

Chisel (Constructing Hardware in a Scala embedded language) is a new programming language, which embedded in Scala, used for hardware synthesis. It aims to increase productivity when creating hardware by enabling designers to use features present in higher level programming languages to build complex hardware blocks. In this paper, the most advertised features of Chisel are investigated and compared to their VHDL counterparts, if present. Afterwards, the authors’ opinion if a switch to Chisel is worth considering is presented. Additionally, results from a related case study on Chisel are briefly summarized. The author concludes that, while Chisel has promising features, it is not yet ready for use in the industry.

AmICA - Design and implementation of a flexible, compact, and low-power node platform (2010)

Wille, Sebastian ; Wehn, Norbert ; Martinovic, Ivan ; Kunz, Simon ; Göhner, Peter

Wireless sensor networks are the driving force behind many popular and interdisciplinary research areas, such as environmental monitoring, building automation, healthcare and assisted living applications. Requirements like compactness, high integration of sensors, flexibility, and power efficiency are often very different and cannot be fulfilled by state-of-the-art node platforms at once. In this paper, we present and analyze AmICA: a flexible, compact, easy-to-program, and low-power node platform. Developed from scratch and including a node, a basic communication protocol, and a debugging toolkit, it assists in an user-friendly rapid application development. The general purpose nature of AmICA was evaluated in two practical applications with diametric requirements. Our analysis shows that AmICA nodes are 67% smaller than BTnodes, have five times more sensors than Mica2Dot and consume 72% less energy than the state-of-the-art TelosB mote in sleep mode.

Refine

Author

Year of publication

Document Type

Language

Has Fulltext

Keywords

Faculty / Organisational entity

8 search hits