3D integration of solid-state memories and logic, as demonstrated by the Hybrid Memory Cube (HMC), offers major opportunities for revisiting near-memory computation and gives new hope of mitigating the power and performance losses caused by the “memory wall”. In this paper we present the first exploration steps towards the design of the Smart Memory Cube (SMC), a new Processor-in-Memory (PIM) architecture that enhances the capabilities of the logic-base (LoB) in HMC. An accurate simulation environment has been developed, along with a full-featured software stack. All offloading and dynamic overheads caused by the operating system, cache coherence, and memory management are considered as well. Benchmarking results demonstrate up to 2X performance improvement in comparison with the host SoC, and around 1.5X against a similar host-side accelerator. Moreover, by scaling down the voltage and frequency of PIM’s processor it is possible to reduce energy by around 70% and 55% in comparison with the host and the accelerator, respectively.
The capacity of embedded memory on LSIs has kept increasing, so reducing the leakage power of embedded memory is important for low-power LSIs. In fact, the ITRS predicts that the leakage power in embedded memory will account for 40% of all power consumption by 2024. Spin transfer torque magneto-resistance random access memory (STT-MRAM) is a promising non-volatile memory for reducing leakage power. It is useful because it can function at low voltages and has a lifetime of over \(10^{16}\) write cycles. In addition, the STT-MRAM has a smaller bit cell than an SRAM, making it suitable for use in high-density products [3–7]. The STT-MRAM uses a magnetic tunnel junction (MTJ). The MTJ has two states: a parallel state and an anti-parallel state, in which the magnetization directions of the MTJ’s layers are the same or different, respectively. The pair of directions determines the MTJ’s magneto-resistance value: the resistance is low in the parallel state and high in the anti-parallel state. The state of the MTJ can be changed by the current flowing through it. The MTJ can potentially operate at less than 0.4 V. On the other hand, it is difficult to design peripheral circuitry for an STT-MRAM array at such a low voltage. In this paper, we propose a counter-based read circuit that functions at 0.4 V and is tolerant of process variation and temperature fluctuation.
Magnetic spin-based memory technologies are a promising solution to overcome the upcoming limits of microelectronics. Nevertheless, the long write latency and high write energy of these memory technologies compared to SRAM make them difficult to use for fast microprocessor memories, such as L1 caches. However, the recent advent of Spin Orbit Torque (SOT) technology has changed the picture: it potentially offers a writing speed comparable to SRAM with a much better density than SRAM and infinite endurance, paving the way to a new paradigm in processor architectures, with the introduction of non-volatility at all levels of the memory hierarchy, towards fully normally-off and instant-on processors. This paper presents a full design flow, from device to system, for evaluating the potential of SOT for microprocessor cache memories, together with very encouraging simulation results obtained using this framework.
Emerging Memories (EMs) could benefit from Error Correcting Codes (ECCs) able to correct a few errors in a few nanoseconds. The low latency is necessary to meet the DRAM-like and/or eXecuted-in-Place requirements of Storage Class Memory devices. The error correction capability would help manufacturers to cope with unknown failure mechanisms and to fulfill the market demand for a rapid increase in density. This paper shows the design of an ECC decoder for a shortened BCH code with a 256-data-bit page, able to correct three errors in less than 3 ns. The tight latency constraint is met by pre-computing the coefficients of carefully chosen Error Locator Polynomials, by optimizing the operations in the Galois Fields, and by resorting to a fully parallel combinatorial implementation of the decoder. The latency and the area occupancy are first estimated from the number of elementary gates to traverse and from the total number of elementary gates of the decoder. Finally, the implementation of the solution by Synopsys topographical synthesis methodology in 54nm logic gate length CMOS technology gives a latency lower than 3 ns and a total area less than \(250 \cdot 10^3 \mu m^2\).
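As a rough sizing check for such a code, the sketch below finds the smallest Galois field GF(2^m) that can host a shortened binary BCH code with 256 data bits and three-error correction. It relies on the standard bound that a binary BCH code over GF(2^m) needs at most m·t parity bits to correct t errors. The function name is hypothetical, and the parity count is an upper bound rather than the figure used by the authors.

```python
def bch_field_size(data_bits: int, t: int) -> tuple[int, int]:
    """Return (m, parity_upper_bound) for a shortened binary BCH code.

    A primitive binary BCH code over GF(2^m) has length 2^m - 1 and needs
    at most m * t parity bits to correct t errors. Shortening only removes
    unused data positions, so we need data_bits + m * t <= 2^m - 1.
    """
    m = 2
    while data_bits + m * t > (1 << m) - 1:
        m += 1
    return m, m * t


m, parity = bch_field_size(256, 3)
print(f"GF(2^{m}), at most {parity} parity bits")
```

Under these assumptions the 256-bit page fits a code over GF(2^9) with at most 27 parity bits, which indicates the width of the field arithmetic a decoder of this kind has to handle.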
To continue reducing voltage in scaled technologies, both circuit and architecture-level resiliency techniques are needed to tolerate process-induced defects, variation, and aging in SRAM cells. Many different resiliency schemes have been proposed and evaluated, but most prior results focus on voltage reduction instead of energy reduction. At the circuit level, device cell architectures and assist techniques have been shown to lower Vmin for SRAM, while at the architecture level, redundancy and cache disable techniques have been used to improve resiliency at low voltages. This paper presents a unified study of error tolerance for both circuit and architecture techniques and estimates their area and energy overheads. Optimal techniques are selected by evaluating both the error-correcting abilities at low supplies and the overheads of each technique in a 28nm process. The results can be applied to many of the emerging memory technologies.
In DS-CDMA, spreading sequences are allocated to users to separate different links, namely the base station to user in the downlink or the user to base station in the uplink. These sequences are designed for optimum periodic correlation properties. Sequences with good periodic auto-correlation properties help in frame synchronisation at the receiver, while sequences with good periodic cross-correlation properties reduce cross-talk among users and hence reduce the interference among them. In addition, they are designed to have reduced implementation complexity so that they are easy to generate. In current systems, spreading sequences are allocated to users irrespective of their channel condition. In this thesis, the method of allocating spreading sequences based on users’ channel condition is investigated in order to improve the performance of the downlink. Different methods of dynamically allocating the sequences are investigated, including optimum allocation through a simulation model, fast sub-optimum allocation through a mathematical model, and a proof-of-concept model using real-world channel measurements. Each model is evaluated to validate improvements in the gain achieved per link, the computational complexity of the allocation scheme, and its impact on the capacity of the network.
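The periodic correlation properties mentioned above can be illustrated with a minimal sketch. The length-7 m-sequence used here is an assumed textbook example, not one of the sequences studied in the thesis, and the function name is hypothetical.

```python
def periodic_correlation(a, b):
    """Periodic (cyclic) correlation of two equal-length +/-1 sequences.

    Entry `shift` is sum_i a[i] * b[(i + shift) mod n].
    """
    n = len(a)
    return [sum(a[i] * b[(i + shift) % n] for i in range(n))
            for shift in range(n)]


# Length-7 m-sequence (bits 1110100 mapped to +1 / -1); illustrative only.
m_seq = [1, 1, 1, -1, 1, -1, -1]

auto = periodic_correlation(m_seq, m_seq)
print(auto)  # -> [7, -1, -1, -1, -1, -1, -1]
```

The single peak at zero shift is what makes such a sequence useful for frame synchronisation. Passing two different sequences to the same function gives the periodic cross-correlation that governs inter-user interference.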
In cryptography, secret keys are used to ensure confidentiality of communication between the legitimate nodes of a network. In a wireless ad-hoc network, the broadcast nature of the channel necessitates robust key management systems for secure functioning of the network. Physical layer security is a novel method of profitably utilising the random and reciprocal variations of the wireless channel to extract a secret key. By measuring the characteristics of the wireless channel within its coherence time, reciprocal variations of the channel can be observed between a pair of nodes. Using these reciprocal characteristics of the channel, a common shared secret key is extracted between a pair of nodes. The process of key extraction consists of four steps, namely channel measurement, quantisation, information reconciliation, and privacy amplification. The reciprocal channel variations are measured and quantised to obtain a preliminary key in the form of a bit vector (0, 1). Due to errors in measurement, quantisation, and additive Gaussian noise, the preliminary keys of the two nodes disagree in some bits. These errors are corrected by using error detection and correction methods to obtain a synchronised key at both nodes. Further, by the method of secure hashing, the entropy of the key is enhanced in the privacy amplification stage. The efficiency of the key generation process depends on the method of channel measurement and quantisation. If, instead of quantising the channel measurements directly, their reciprocity is first enhanced and they are then quantised appropriately, the key generation process can be made efficient and fast. In this thesis, four methods of enhancing reciprocity are presented, namely l1-norm minimisation, hierarchical clustering, Kalman filtering, and polynomial regression. Their outputs are quantised by binary and adaptive quantisation. Then, the entire process of key generation, from measuring the channel profile to obtaining a secure key, is validated by using real-world channel measurements. The methods are compared in terms of bit disagreement rate, key generation rate, test of randomness, robustness test, and eavesdropper test. An architecture, KeyBunch, for effectively deploying physical layer security in mobile and vehicular ad-hoc networks is also proposed. Finally, as a use case, KeyBunch is deployed in a secure vehicular communication architecture to highlight the advantages offered by physical layer security.
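The four-step key extraction process can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the channel is a synthetic Gaussian sequence, quantisation is a simple median threshold, information reconciliation is not modelled, and SHA-256 stands in for the secure hashing step. All function names are hypothetical.

```python
import hashlib
import random
import statistics


def quantise(samples):
    """Binary quantisation: 1 if a sample lies above the median, else 0."""
    threshold = statistics.median(samples)
    return [1 if s > threshold else 0 for s in samples]


def bit_disagreement_rate(key_a, key_b):
    """Fraction of positions in which two preliminary keys differ."""
    return sum(a != b for a, b in zip(key_a, key_b)) / len(key_a)


def privacy_amplify(bits):
    """Hash the reconciled bits (SHA-256 is an assumed choice of hash)."""
    return hashlib.sha256(bytes(bits)).hexdigest()


# Step 1: channel measurement (synthetic). Both nodes observe the same
# reciprocal channel, each with independent measurement noise.
rng = random.Random(0)
channel = [rng.gauss(0.0, 1.0) for _ in range(128)]
alice = [h + rng.gauss(0.0, 0.1) for h in channel]
bob = [h + rng.gauss(0.0, 0.1) for h in channel]

# Step 2: quantisation into preliminary keys.
key_a, key_b = quantise(alice), quantise(bob)
bdr = bit_disagreement_rate(key_a, key_b)

# Step 3: information reconciliation would correct the disagreeing bits
# here; it is omitted, so only Alice's bits are used below.
# Step 4: privacy amplification.
final_key = privacy_amplify(key_a)

print(f"bit disagreement rate: {bdr:.3f}")
print(f"final key (hex prefix): {final_key[:16]}")
```

Reducing the measurement noise in this toy model lowers the bit disagreement rate, which corresponds to the effect the reciprocity-enhancement methods aim for on real measurements.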
In this paper, we show the feasibility of low supply voltage for SRAM (Static Random Access Memory) by adding error correction coding (ECC). In SRAM, the memory matrix needs to be powered for data-retentive standby operation, resulting in standby leakage current. Particularly for low duty-cycle systems, the energy consumed due to standby leakage current can become significant. Lowering the supply voltage (VDD) during standby mode to below the specified data retention voltage (DRV) helps decrease the leakage current. At these VDD levels errors start to appear, which we can remedy by adding ECC. We show in this paper that the addition of a simple single error correcting (SEC) ECC enables us to decrease the leakage current by 45% and leakage power by 72%. We verify this on a large set of commercially available standard 40nm SRAMs.
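To illustrate what a single error correcting code does, the sketch below implements the textbook Hamming(7,4) code. It is an assumed example only: the abstract does not state which SEC code or word width is used, and the function names are hypothetical.

```python
def hamming74_encode(data):
    """Encode 4 data bits into a 7-bit codeword.

    Codeword layout (1-indexed positions): p1 p2 d1 p3 d2 d3 d4.
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct at most one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-indexed position of the error
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


word = [1, 0, 1, 1]
stored = hamming74_encode(word)
stored[4] ^= 1  # emulate one retention error in a cell
print(hamming74_decode(stored) == word)  # -> True
```

A single retention error per codeword is repaired on read, which is the mechanism that allows the standby voltage to drop below the data retention voltage as long as errors remain sparse.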
This tutorial describes how to accurately measure signal power using the FFT. The different effects that introduce errors during FFT processing are described and it is explained how they can be avoided or compensated.
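A minimal sketch of such a measurement is given below, assuming a single real tone, a periodic Hann window, and a naive DFT in place of a real FFT routine so that the example needs only the standard library. The function names and the band half-width are assumptions and are not taken from the tutorial. The estimate sums the power of the bins around the spectral peak and divides by N times the sum of the squared window samples, which compensates for the power removed by the window.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT; a real application would call an FFT routine."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
            for m in range(n)]


def tone_power(x, half_width=3):
    """Estimate the mean-square power of the dominant tone in a real signal."""
    n = len(x)
    # Periodic Hann window to limit spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]
    spectrum = dft([xi * wi for xi, wi in zip(x, window)])
    half = spectrum[: n // 2]  # positive-frequency bins of a real signal
    peak = max(range(1, len(half)), key=lambda m: abs(half[m]))
    lo = max(peak - half_width, 1)
    hi = min(peak + half_width, len(half) - 1)
    band_energy = sum(abs(half[m]) ** 2 for m in range(lo, hi + 1))
    # Factor 2 accounts for the mirrored negative-frequency bins;
    # n * sum(w^2) is the Parseval scaling corrected for the window.
    return 2.0 * band_energy / (n * sum(w * w for w in window))


n = 64
amplitude = 2.0
# 10.5 cycles per record: deliberately not centred on a bin.
signal = [amplitude * math.cos(2 * math.pi * 10.5 * k / n + 0.3)
          for k in range(n)]
print(f"estimated power: {tone_power(signal):.4f}")
print(f"expected power:  {amplitude ** 2 / 2:.4f}")
```

Reading only the peak bin would underestimate the power of an off-bin tone like this one, because the window spreads its energy over several neighbouring bins.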
The recently established technologies in the areas of distributed measurement and intelligent information processing systems, e.g., Cyber Physical Systems (CPS), Ambient Intelligence/Ambient Assisted Living systems (AmI/AAL), the Internet of Things (IoT), and Industry 4.0, have increased the demand for the development of intelligent integrated multi-sensory systems to serve rapidly growing markets [1, 2]. This increases the significance of complex measurement systems that incorporate numerous advanced methodological implementations, including electronic circuits, signal processing, and multi-sensory information fusion. In particular, in multi-sensory cognition applications, the design of such systems involves tasks requiring skill, e.g., method selection, parameterization, model analysis, and processing chain construction, which demand immense effort and are conventionally carried out manually by the expert designer. Moreover, strong technological competition imposes even more complicated design problems with multiple constraints, e.g., cost, speed, power consumption, flexibility, and reliability.
Thus, the conventional human-expert-based design approach may not be able to cope with the increasing demand in numbers, complexity, and diversity. To alleviate this issue, the design automation approach has been the topic of numerous research works [3-14] and has been commercialized in several products [15-18]. Additionally, the dynamic adaptation of intelligent multi-sensor systems is a potential solution for developing dependable and robust systems. The intrinsic evolution approach and self-x properties, which include self-monitoring, -calibrating/trimming, and -healing/repairing, are among the best candidates for this purpose. Motivated by the ongoing research trends and based on the background of our research work [12, 13], which is among the pioneering work on this topic, the research work of the thesis contributes to the design automation of intelligent integrated multi-sensor systems.
In this research work, the Design Automation for Intelligent COgnitive system with self-X properties (DAICOX) architecture is presented with the aim of reducing the design effort and providing high-quality, robust solutions for multi-sensor intelligent systems. The DAICOX architecture is therefore conceived with the following goals:
- Perform front to back complete processing chain design with automated method selection and parameterization,
- Provide a rich choice of pattern recognition methods to the design method pool,
- Associate design information via interactive user interface and visualization along with intuitive visual programming,
- Deliver high quality solutions outperforming conventional approaches by using
- Gain the adaptability, reliability and robustness of designed solutions with self-x properties.
Derived from these goals, several scientific methodological developments and implementations, particularly in the areas of pattern recognition and computational intelligence, will be pursued as part of the DAICOX architecture in the research work of this thesis. The method pool is intended to contain a rich choice of methods and algorithms covering data acquisition and sensor configuration, signal processing and feature computation, dimensionality reduction, and classification. These methods will be selected and parameterized automatically by the DAICOX design optimization to construct a multi-sensory cognition processing chain. A collection of non-parametric feature quality assessment functions for the Dimensionality Reduction (DR) process will be presented. In addition to standard DR methods, variations of the feature selection method, in particular feature weighting, will be proposed. Three different classification categories shall be incorporated in the method pool. A hierarchical classification approach will be proposed and developed to serve as a multi-sensor fusion architecture at the decision level. Besides multi-class classification, one-class classification methods, e.g., One-Class SVM and NOVCLASS, will be presented to extend the functionality of the solutions, in particular to anomaly and novelty detection. DAICOX is conceived to effectively handle the problem of method selection and parameter setting for a particular application, yielding high-performance solutions. The processing chain construction tasks will be carried out by meta-heuristic optimization methods, e.g., Genetic Algorithms (GA) and Particle Swarm Optimization (PSO), with a multi-objective optimization approach and model analysis for robust solutions. In addition to the automated system design mechanisms, DAICOX will facilitate the design tasks with intuitive visual programming and various visualization options. The design database concept of DAICOX is intended to allow the reusability and extensibility of designed solutions gained from previous knowledge. Thus, the cooperative design of machine and knowledge from the design expert can also be utilized for obtaining fully enhanced solutions. In particular, the integration of self-x properties as well as intrinsic optimization into the system is proposed to gain enduring reliability and robustness. Hence, DAICOX will allow the inclusion of dynamically reconfigurable hardware instances in the designed solutions in order to realize intrinsic optimization and self-x properties.
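The idea of meta-heuristic feature selection can be sketched with a very small genetic algorithm. This is an illustrative toy under stated assumptions, not the DAICOX implementation: the per-feature quality scores, the linear cost per selected feature, and all function names are hypothetical, and a single scalar objective replaces the multi-objective assessment described above.

```python
import random


def fitness(mask, feature_scores, penalty=0.05):
    """Toy objective: summed feature quality minus a cost per selected feature."""
    quality = sum(s for s, bit in zip(feature_scores, mask) if bit)
    return quality - penalty * sum(mask)


def ga_feature_selection(feature_scores, pop_size=20, generations=40, seed=0):
    """Evolve a binary feature mask with elitism, crossover and mutation."""
    rng = random.Random(seed)
    n = len(feature_scores)  # assumes at least two features
    population = [[rng.randint(0, 1) for _ in range(n)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda m: fitness(m, feature_scores), reverse=True)
        elite = population[: pop_size // 2]  # keep the better half unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            parent_a, parent_b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)            # one-point crossover
            child = parent_a[:cut] + parent_b[cut:]
            child[rng.randrange(n)] ^= 1         # single-bit mutation
            children.append(child)
        population = elite + children
    return max(population, key=lambda m: fitness(m, feature_scores))


# Hypothetical per-feature quality scores (higher means more discriminative).
scores = [0.9, 0.01, 0.6, 0.02, 0.03, 0.7]
best_mask = ga_feature_selection(scores)
print(best_mask, round(fitness(best_mask, scores), 3))
```

In a full processing chain the mask would be scored by a feature quality assessment function or a classifier on real data, and the same encoding could be extended with genes for method choice and parameters.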
As a result of the research work in this thesis, a comprehensive intelligent multi-sensor system design architecture with automated method selection, parameterization, and model analysis has been developed in compliance with open-source multi-platform software. It is integrated with an intuitive design environment, which includes a visual programming concept and design information visualizations. Thus, the design effort is minimized, as investigated in three case studies with different application backgrounds, namely food analysis (LoX), driving assistance (DeCaDrive), and magnetic localization. Moreover, DAICOX achieved better solution quality than the manual approach in all cases: the classification rate was increased by 5.4%, 0.06%, and 11.4% in the LoX, DeCaDrive, and magnetic localization cases, respectively. In the LoX case study, using DAICOX reduced the design time by 81.87% compared to the conventional approach. At the current state of development, a number of novel contributions of the thesis are outlined below.
- Automated processing chain construction and parameterization for the design of signal processing and feature computation.
- Novel dimensionality reduction methods, e.g., GA and PSO based feature selection and feature weighting with multi-objective feature quality assessment.
- A modification of non-parametric compactness measure for feature space quality
- Decision level sensor fusion architecture based on proposed hierarchical classification approach, i.e., H-SVM.
- A collection of one-class classification methods and a novel variation, i.e.,
- Automated design toolboxes supporting front to back design with automated model selection and information visualization.
In this research work, due to the complexity of the task, not all of the identified goals have been comprehensively reached yet, nor has the complete architecture definition been fully implemented. Based on the currently implemented tools and frameworks, the ongoing development of DAICOX is working towards the complete architecture. The potential future improvements are the extension of the method pool with a richer choice of methods and algorithms, processing chain breeding via a graph-based evolution approach, the incorporation of intrinsic optimization, and the integration of self-x properties. With these features, DAICOX will become better suited to designing advanced systems to serve the increasingly growing technologies of distributed intelligent measurement systems, in particular CPS and Industrie 4.0.
The current procedures for achieving industrial process surveillance, waste reduction, and prognosis of critical process states are still insufficient in some parts of the manufacturing industry. Increasing competitive pressure, falling margins, increasing cost, just-in-time production, environmental protection requirements, and guidelines concerning energy savings pose new challenges to manufacturing companies, from the semiconductor to the pharmaceutical industry.
New, more intelligent technologies adapted to the current technical standards provide companies with improved options to tackle these situations. Here, knowledge-based approaches open up pathways that have not yet been exploited to their full extent. The Knowledge-Discovery-Process for knowledge generation describes such a concept. Based on an understanding of the problems arising during production, it derives conclusions from real data, processes these data, transfers them into evaluated models and, by this open-loop approach, reiteratively reflects the results in order to resolve the production problems. Here, the generation of data through control units, their transfer via field bus for storage in database systems, their formatting, and the immediate querying of these data, their analysis and their subsequent presentation with its ensuing benefits play a decisive role.
The aims of this work result from the lack of systematic approaches to the above-mentioned issues, such as process visualization, the generation of recommendations, the prediction of unknown sensor and production states, and statements on energy cost.
Both science and commerce offer mature statistical tools for data preprocessing, analysis and modeling, and for the final reporting step. Since their creation, the insurance business, the world of banking, market analysis, and marketing have been the application fields of these software types; they are now expanding to the production environment.
Appropriate modeling can be achieved via specific machine learning procedures, which have been established in various industrial areas, e.g., in process surveillance by optical control systems. Here, state-of-the-art classification methods are used, with multiple applications comprising sensor technology, process areas, and production site data. Manufacturing companies now intend to establish a more holistic surveillance of process data, covering, e.g., sensor failures or process deviations, to identify dependencies. The causes of quality problems must be recognized and selected in real time from about 500 attributes of a highly complex production machine. Based on these identified causes, recommendations for improvement must then be generated for the operator at the machine, in order to enable timely measures to avoid these quality deviations.
Unfortunately, the ability to meet the required increases in efficiency – with simultaneous consumption and waste minimization – still depends on data that are, for the most part, not available. There is an overrepresentation of positive examples whereas the number of definite negative examples is too low.
The acquired information can be influenced by sensor drift effects and the occurrence of quality degradation may not be adequately recognized. Sensorless diagnostic procedures with dual use of actuators can be of help here.
Moreover, in the course of a process, critical states with sometimes unexplained behavior can occur. Also in these cases, deviations could be reduced by early countermeasures.
The generation of data models using appropriate statistical methods is of advantage here.
Conventional classification methods sometimes reach their limits. Supervised learning methods are mostly used in areas of high information density with sufficient data available for the classes under examination. However, there is a growing trend (e.g., spam filtering) to apply supervised learning methods to underrepresented classes, the datasets of which are, at best, outliers or non-existent.
The application field of One-Class Classification (OCC) deals with this issue. Standard classification procedures (e.g., k-nearest-neighbor classifier, support vector machines) can be adapted to such problems. A control system is thereby able to classify changing process states or sensor deviations. The above-described knowledge discovery process was applied in a case study from the polymer film industry at Mondi Gronau GmbH, comprising a real-data survey at the production site and subsequent data preprocessing, modeling, evaluation, and deployment as a system for the generation of recommendations. To this end, questions regarding the following topics had to be clarified: data sources, datasets and their formatting, transfer pathways, storage media, query sequences, the employed methods of classification, their adjustment to the problems at hand, evaluation of the results, construction of a dynamic cycle, and the final implementation in the production process, along with its added value for the company.
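How a standard classifier can be adapted to one-class problems is illustrated by the following sketch of a k-nearest-neighbor novelty detector trained on normal data only. It is an assumed minimal variant, not the method used in the case study: the class name, the distance threshold rule, and the toy data are all hypothetical.

```python
import math


class OneClassKNN:
    """Flags a point as abnormal if its k-th nearest normal neighbour is far away."""

    def __init__(self, k=1):
        self.k = k
        self.points = []
        self.threshold = 0.0

    def _score(self, x, exclude=None):
        """Distance from x to its k-th nearest training point."""
        dists = sorted(math.dist(x, p)
                       for i, p in enumerate(self.points) if i != exclude)
        return dists[self.k - 1]

    def fit(self, normal_points):
        """Learn the threshold from normal samples only (leave-one-out)."""
        self.points = [tuple(p) for p in normal_points]
        self.threshold = max(self._score(p, exclude=i)
                             for i, p in enumerate(self.points))
        return self

    def is_normal(self, x):
        return self._score(x) <= self.threshold


# Toy "normal" process states on a small grid; real data would be sensor vectors.
normal = [(i * 0.1, j * 0.1) for i in range(5) for j in range(5)]
detector = OneClassKNN(k=1).fit(normal)

print(detector.is_normal((0.2, 0.2)))  # -> True
print(detector.is_normal((5.0, 5.0)))  # -> False
```

Because the threshold is derived from normal samples alone, no examples of the critical class are needed for training, which matches the situation of an underrepresented class.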
Pivotal options for optimization with respect to ecological and economical aspects can be found here. Capacity for improvement is given in the reduction of energy consumption, CO\(_2\) emissions, and waste at all machines. At this one site, savings of several million euros per month can be achieved.
One major difficulty so far has been that process data were hardly accessible: distributed across various unconnected data sources, they led in some areas to an increased analysis effort and a lack of holistic real-time quality surveillance. Monitoring of specifications and the thus obtained support for the operator at the installation resulted in a clear disadvantage with regard to cost minimization.
The data of the case study, captured according to their purposes and in coordination with process experts, amounted to 21,900 process datasets from cast film extrusion collected over two years, including sensor data from dosing facilities, and 300 site-specific energy datasets from the years 2002–2014.
In the following, the investigation sequence is displayed:
1. In the first step, industrial approaches according to Industrie 4.0 and related to Big Data were investigated. The applied statistical software suites and their functions were compared with a focus on real-time data acquisition from database systems, different data formats, their sensor locations at the machines, and the data processing part. The linkage of datasets from various data sources for, e.g., labeling and downstream exploration according to the knowledge discovery process is of high importance for polymer manufacturing applications.
2. In the second step, the aims were defined according to the industrial requirements, i.e. the critical production problem called “cut-off” as the main selection, and with regard to their investigation with machine learning methods. Therefore, a system architecture corresponding to the polymer industry was developed, containing the following processing steps: data acquisition, monitoring & recommendation, and self-configuration.
3. The novel sensor datasets, with 160–2,500 real and synthetic attributes, were acquired within 1-min intervals via PLC and field bus from an Oracle database. The 160 features were reduced to 6 dimensions with feature reduction methods. Due to underrepresentation of the critical class, the learning approaches had to be modified and optimized for one-class classification, which achieved 99% accuracy after training, testing and evaluation with real datasets.
4. In the next step, the 6-dimensional dataset was scaled into lower 1-, 2-, or 3-dimensional space with classical and non-classical mapping approaches for downstream visualization. The mapped view was separated into zones of normal and abnormal process conditions by threshold setting.
5. Afterwards, the boundary zone was investigated and an approach for trajectory extraction consisting of condition points in sequence was developed, to optimize the prediction behavior of the model. The extracted trajectories were trained, tested and evaluated by state-of-the-art classification methods, achieving a 99% recognition ratio.
6. In the last step, the best methods and processing parts were converted into a specifically developed domain-specific graphical user interface for real-time visualization of process condition changes. The requirements of such an interface, with regard to intuitive handling, interactive visualization and recommendations (e.g., messaging and traffic lights), were discussed with the operators and implemented.
The software prototype was tested at a laboratory machine. Correct recognition of abnormal process problems was achieved at a 90% ratio. The software was afterwards transferred to a group of on-line production machines.
As demonstrated, the monthly amount of waste arising at machine M150 was decreased from 20.96% to 12.44% during the application time. The frequency of occurrence of the specific problem was reduced by 30%, corresponding to monthly savings of 50,000 EUR.
In the approach pertaining to the energy prognosis of load profiles, monthly energy data from 2002 to 2014 (about 36 trajectories with three to eight real parameters each) were used as the basis, analyzed and modeled systematically. The prognosis quality increased with approaching target date. Thereby, the site-specific load profile for 2014 could be predicted with an accuracy of 99%.
Sustained cost reductions of several hundred thousand euros, combined with additional savings of EUR 2.8 million, could be demonstrated.
The process improvements achieved while pursuing scientific targets could be successfully and permanently integrated at the case study plant. The increase in methodical and experimental knowledge was reflected by first economic results and could be verified numerically. The expectations of the company were more than fulfilled and further developments based on the new findings were initiated. Among these are the transfer of the scientific findings to more machines and the initiation of further studies expanding into the diagnostics area.
Considering the size of the enterprise, future enhanced success should also be possible for other locations. In the course of the grid charge exemption according to EEG, the energy savings at further German locations can amount to 4–11% on a monetary basis and at least 5% based on energy. Up to 10% of materials and cost can be saved with regard to waste reduction related to specific problems. According to projections, material savings of 5–10 t per month and time savings of up to 50 person-hours are achievable. Important synergy effects can be created by the knowledge transfer.