3D integration of solid-state memories and logic, as demonstrated by the Hybrid Memory Cube (HMC), offers major opportunities for revisiting near-memory computation and gives new hope of mitigating the power and performance losses caused by the “memory wall”. In this paper we present the first exploration steps towards the design of the Smart Memory Cube (SMC), a new Processor-in-Memory (PIM) architecture that enhances the capabilities of the logic-base (LoB) in the HMC. An accurate simulation environment has been developed, along with a full-featured software stack. All offloading and dynamic overheads caused by the operating system, cache coherence, and memory management are considered as well. Benchmarking results demonstrate up to 2X performance improvement in comparison with the host SoC, and around 1.5X against a similar host-side accelerator. Moreover, by scaling down the voltage and frequency of the PIM’s processor, it is possible to reduce energy by around 70% and 55% in comparison with the host and the accelerator, respectively.
The capacity of embedded memory on LSIs keeps increasing, so reducing the leakage power of embedded memory is important for low-power LSIs. In fact, the ITRS predicts that the leakage power in embedded memory will account for 40% of all power consumption by 2024. A spin transfer torque magneto-resistance random access memory (STT-MRAM) is a promising non-volatile memory for reducing the leakage power. It is useful because it can function at low voltages and has a lifetime of over \(10^{16}\) write cycles. In addition, the STT-MRAM technology has a smaller bit cell than an SRAM, making the STT-MRAM suitable for use in high-density products [3–7]. The STT-MRAM uses a magnetic tunnel junction (MTJ). The MTJ has two states, a parallel state and an anti-parallel state, which indicate whether the magnetization directions of the MTJ’s layers are the same or different. This pair of directions determines the MTJ’s magneto-resistance value: the resistance is low in the parallel state and high in the anti-parallel state. The state of the MTJ can be changed by the current flowing through it. The MTJ can potentially operate at less than 0.4 V. On the other hand, it is difficult to design peripheral circuitry for an STT-MRAM array at such a low voltage. In this paper, we propose a counter-based read circuit that functions at 0.4 V and is tolerant of process variation and temperature fluctuation.
This study presents an energy-efficient ultra-low-voltage standard-cell-based memory in 28nm FD-SOI. The storage element (standard-cell latch) is replaced with a full-custom latch with 50% less area. Error-free operation is demonstrated down to 450 mV at 9 MHz. With body bias (BB) applied at VDD = 0.5 V, performance spans from 20 MHz at BB = 0 V to 110 MHz at BB = 1 V.
The energy efficiency of today’s microcontrollers is supported by the extensive use of low-power mechanisms. In many cases, a full power-down requires a complex and possibly error-prone administration scheme, because data from the volatile memory have to be stored in a flash-based back-up memory. New types of non-volatile memory, e.g. in RRAM technology, are faster and consume a fraction of the energy of flash technology. This paper evaluates power gating for WSN with RRAM as back-up memory.
Three-dimensional (3D) integration using through-silicon vias (TSVs) has been used for memory designs. Content addressable memory (CAM) is an important component in digital systems. In this paper, we propose an evaluation tool for 3D CAMs that helps the designer explore the delay and power of various partitioning strategies. Delay, power, and energy models of 3D CAMs for different architectures are built as well.
Long talks:
T. Schorr, A. Dittrich, W. Sauer-Greff, R. Urbansky (Lehrstuhl für Nachrichtentechnik, TU Kaiserslautern): Iterative Equalization in Fibre Optical Systems Using High-Rate RCPR, BCH and LDPC Codes
A. Doenmez, T. Hehn, J. B. Huber (Lehrstuhl für Informationsübertragung, Universität Erlangen-Nürnberg): Analytical Calculation of Thresholds for LDPC Codes Transmitted over Binary Erasure Channels
S. Deng, T. Weber (Institut für Nachrichtentechnik und Informationselektronik, Universität Rostock), M. Meurer (Lehrstuhl für hochfrequente Signalübertragung und -verarbeitung, TU Kaiserslautern): Dynamic Resource Allocation in Future OFDM Based Mobile Radio Systems
J. Hahn, M. Meurer, T. Weber (Lehrstuhl für hochfrequente Signalübertragung und -verarbeitung, TU Kaiserslautern): Receiver Oriented FEC Coding (RFC) for Selective Channels
C. Stierstorfer, R. Fischer (Lehrstuhl für Informationsübertragung, Universität Erlangen-Nürnberg): Comparison of Code Design Requirements for Single- and Multicarrier Transmission over Frequency-Selective MIMO Channels
A. Scherb (Arbeitsbereich Nachrichtentechnik, Universität Bremen): Unbiased Semiblind Channel Estimation for Coded Systems
T.-J. Liang, W. Rave, G. Fettweis (Vodafone Stiftungslehrstuhl Mobile Nachrichtensysteme, Technische Universität Dresden): Iterative Joint Channel Estimation and Decoding Using Superimposed Pilots in OFDM-WLAN
A. Dittrich, T. Schorr, W. Sauer-Greff, R. Urbansky (Lehrstuhl für Nachrichtentechnik, TU Kaiserslautern): DIORAMA - An Iterative Decoding Real-Time MATLAB Receiver for the Multicarrier-Based Digital Radio DRM
Short talks:
S. Plass, A. Dammann (German Aerospace Center (DLR)): Radio Resource Management for MC-CDMA over Correlated Rayleigh Fading Channels
S. Heilmann, M. Meurer, S. Abdellaoui, T. Weber (Lehrstuhl für hochfrequente Signalübertragung und -verarbeitung, TU Kaiserslautern): Concepts for Accurate Low-Cost Signature Based Localisation of Mobile Terminals
M. Siegrist, A. Dittrich, W. Sauer-Greff, R. Urbansky (Lehrstuhl für Nachrichtentechnik, TU Kaiserslautern): SIMO and MIMO Concepts for Fibre Optical Communications
C. Bockelmann (Arbeitsbereich Nachrichtentechnik, Universität Bremen): Transmitter and Receiver Structures for Coded MIMO Transmission
To continue reducing voltage in scaled technologies, both circuit-level and architecture-level resiliency techniques are needed to tolerate process-induced defects, variation, and aging in SRAM cells. Many different resiliency schemes have been proposed and evaluated, but most prior results focus on voltage reduction instead of energy reduction. At the circuit level, device cell architectures and assist techniques have been shown to lower Vmin for SRAM, while at the architecture level, redundancy and cache disable techniques have been used to improve resiliency at low voltages. This paper presents a unified study of error tolerance for both circuit and architecture techniques and estimates their area and energy overheads. Optimal techniques are selected by evaluating both the error-correcting abilities at low supplies and the overheads of each technique in a 28nm process. The results can be applied to many of the emerging memory technologies.
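As a rough illustration of how error-correcting ability can be weighed against overhead, the sketch below computes the probability that a memory word is unusable when each bit cell fails independently with a given probability at some supply voltage. The independence assumption, the function name `word_failure_prob`, and the example numbers (a 64-bit word, a 72-bit SECDED-protected word, a bit failure probability of 1e-4) are illustrative assumptions and are not taken from the study.

```python
from math import comb


def word_failure_prob(n_bits: int, p_bit: float, t_correctable: int) -> float:
    """Probability that a word of n_bits has more failing bits than can be corrected.

    Assumes independent bit failures with probability p_bit (a simplification).
    """
    # Probability that at most t_correctable bits fail (binomial CDF).
    p_ok = sum(
        comb(n_bits, i) * p_bit**i * (1.0 - p_bit) ** (n_bits - i)
        for i in range(t_correctable + 1)
    )
    return 1.0 - p_ok


if __name__ == "__main__":
    p = 1e-4  # hypothetical bit-cell failure probability at a reduced supply
    unprotected = word_failure_prob(64, p, 0)  # no correction
    secded = word_failure_prob(72, p, 1)       # 64 data + 8 check bits, 1 error corrected
    overhead = (72 - 64) / 64                  # storage overhead of the check bits
    print(f"unprotected word failure: {unprotected:.3e}")
    print(f"SECDED word failure:      {secded:.3e}")
    print(f"storage overhead:         {overhead:.1%}")
```

Under these assumptions, single-error correction lowers the word failure probability by orders of magnitude at the cost of extra storage; a real evaluation would also account for the energy and latency of the encoder and decoder.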
In most safety analyses, the influence of security problems is omitted or even forgotten. Because more and more systems are accessible from outside via maintenance interfaces, this missing security analysis is becoming a problem. We therefore propose an approach for extending the safety analysis with security aspects. Such a more comprehensive analysis should lead to systems that react in less catastrophic ways to attacks.
Multiple-channel die-stacked DRAMs have been used to maximize the performance and minimize the power of memory access in 2.5D/3D system chips. Stacked DRAM dies can be used as a cache for the processor die in such chips. Typically, modern processor systems-on-chip (SoCs) have three levels of cache: L1, L2, and L3. Which of these cache levels could the DRAM cache replace? In this paper, we derive an inequality that helps the designer check whether a given DRAM cache design can provide better performance than the L3 cache. Design considerations for DRAM caches to meet the inequality are also discussed. We find that there is a dilemma between the access time and the associativity of the DRAM cache when it has to outperform the L3 cache. Organizing multiple channels into a DRAM cache is proposed to cope with this dilemma.
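The abstract does not state the inequality itself. As a minimal sketch of the kind of check it describes, the code below uses the textbook average memory access time (AMAT) model, in which a cache level costs its hit time plus its miss rate times the main-memory penalty. The function names and all numeric values are hypothetical, and the paper's actual inequality may include further terms.

```python
def amat(hit_time_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    """Average memory access time seen below the L2 cache for one cache level."""
    return hit_time_ns + miss_rate * miss_penalty_ns


def dram_cache_beats_l3(t_dc: float, m_dc: float,
                        t_l3: float, m_l3: float,
                        t_mem: float) -> bool:
    """True if the DRAM cache gives a lower AMAT than the L3 cache.

    Equivalent to the inequality  t_dc - t_l3 < (m_l3 - m_dc) * t_mem.
    """
    return amat(t_dc, m_dc, t_mem) < amat(t_l3, m_l3, t_mem)


if __name__ == "__main__":
    # Hypothetical numbers: a slower but much larger DRAM cache with few misses.
    print(dram_cache_beats_l3(t_dc=30, m_dc=0.05, t_l3=10, m_l3=0.40, t_mem=100))
    # Same DRAM cache with a higher miss rate (e.g. lower associativity).
    print(dram_cache_beats_l3(t_dc=30, m_dc=0.30, t_l3=10, m_l3=0.40, t_mem=100))
```

In this simplified model the dilemma is visible directly: raising associativity tends to lower the miss rate `m_dc` but raises the access time `t_dc`, and both appear in the same inequality.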
Emerging Memories (EMs) could benefit from Error Correcting Codes (ECCs) able to correct a few errors in a few nanoseconds. The low latency is necessary to meet the DRAM-like and/or eXecute-in-Place requirements of Storage Class Memory devices. The error correction capability would help manufacturers to cope with unknown failure mechanisms and to fulfill the market demand for a rapid increase in density. This paper shows the design of an ECC decoder for a shortened BCH code with a 256-data-bit page, able to correct three errors in less than 3 ns. The tight latency constraint is met by pre-computing the coefficients of carefully chosen Error Locator Polynomials, by optimizing the operations in the Galois Fields, and by resorting to a fully parallel combinational implementation of the decoder. The latency and the area occupancy are first estimated from the number of elementary gates to traverse and from the total number of elementary gates of the decoder, respectively. Finally, the implementation of the solution by Synopsys topographical synthesis methodology in a 54nm logic gate length CMOS technology gives a latency lower than 3 ns and a total area less than \(250 \cdot 10^3 \mu m^2\).
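The abstract does not give the code parameters. As a hedged sketch, the code below estimates them for a shortened binary BCH code from the standard bound that a primitive code of length \(2^m - 1\) correcting \(t\) errors needs at most \(m \cdot t\) parity bits. The function name `shortened_bch_params` is hypothetical, and the parity count is an upper bound rather than the paper's stated value.

```python
def shortened_bch_params(data_bits: int, t: int) -> tuple[int, int, int]:
    """Return (m, parity_bits_upper_bound, shortened_length) for a binary BCH code.

    Picks the smallest field GF(2^m) whose primitive BCH code of length 2^m - 1
    has room for data_bits after reserving at most m * t parity bits.
    """
    m = 2
    while True:
        n_full = 2**m - 1        # length of the unshortened primitive code
        parity = m * t           # upper bound on the number of parity bits
        if n_full - parity >= data_bits:
            # Shortening drops unused data positions; parity bits are kept.
            return m, parity, data_bits + parity
        m += 1


if __name__ == "__main__":
    m, parity, n = shortened_bch_params(data_bits=256, t=3)
    print(f"GF(2^{m}), at most {parity} parity bits, codeword length {n}")
```

For a 256-bit page and three correctable errors, this bound points to arithmetic in GF(2^9) with at most 27 parity bits, since GF(2^8) leaves room for only 231 data bits.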