### Refine

#### Year of publication

#### Document Type

- Doctoral Thesis (47)
- Conference Proceeding (16)
- Article (9)
- Preprint (6)
- Report (3)
- Other (2)
- Course Material (1)

#### Language

- English (84) (remove)

#### Keywords

- Mobilfunk (5)
- Cache (3)
- DRAM (3)
- MIMO (3)
- Model checking (3)
- OFDM (3)
- SRAM (3)
- System-on-Chip (3)
- Chisel (2)
- FPGA (2)

#### Faculty / Organisational entity

- Fachbereich Elektrotechnik und Informationstechnik (84) (remove)

Hardware Contention-Aware Real-Time Scheduling on Multi-Core Platforms in Safety-Critical Systems
(2019)

While the computing industry has shifted from single-core to multi-core processors for performance gain, safety-critical systems (SCSs) still require solutions that enable their transition while guaranteeing safety, requiring no source-code modifications and substantially reducing re-development and re-certification costs, especially for legacy applications that are typically substantial. This dissertation considers the problem of worst-case execution time (WCET) analysis under contentions when deadline-constrained tasks in independent partitioned task set execute on a homogeneous multi-core processor with dynamic time-triggered shared memory bandwidth partitioning in SCSs.
Memory bandwidth in multi-core processors is shared across cores and is a significant cause of performance bottleneck and temporal variability of multiple-orders in task’s execution times due to contentions in memory sub-system. Further, the circular dependency is not only between WCET and CPU scheduling of others cores, but also between WCET and memory bandwidth assignments over time to cores. Thus, there is need of solutions that allow tailoring memory bandwidth assignments to workloads over time and computing safe WCET. It is pragmatically infeasible to obtain WCET estimates from static WCET analysis tools for multi-core processors due to the sheer computational complexity involved.
We use synchronized periodic memory servers on all cores that regulate each core’s maximum memory bandwidth based on allocated bandwidth over time. First, we present a workload schedulability test for known even-memory-bandwidth-assignment-to-active-cores over time, where the number of active cores represents the cores with non-zero memory bandwidth assignment. Its computational complexity is similar to merge-sort. Second, we demonstrate using a real avionics certified safety-critical application how our method’s use can preserve an existing application’s single-core CPU schedule under contentions on a multi-core processor. It enables incremental certification using composability and requires no-source code modification.
Next, we provide a general framework to perform WCET analysis under dynamic memory bandwidth partitioning when changes in memory bandwidth to cores assignment are time-triggered and known. It provides a stall maximization algorithm that has a complexity similar to a concave optimization problem and efficiently implements the WCET analysis. Last, we demonstrate dynamic memory assignments and WCET analysis using our method significantly improves schedulability compared to the stateof-the-art using an Integrated Modular Avionics scenario.

In DS-CDMA, spreading sequences are allocated to users to separate different
links namely, the base-station to user in the downlink or the user to base station in the uplink. These sequences are designed for optimum periodic correlation properties. Sequences with good periodic auto-correlation properties help in frame synchronisation at the receiver while sequences with good periodic cross-
correlation property reduce cross-talk among users and hence reduce the interference among them. In addition, they are designed to have reduced implementation complexity so that they are easy to generate. In current systems, spreading sequences are allocated to users irrespective of their channel condition. In this thesis,
the method of allocating spreading sequences based on users’ channel condition
is investigated in order to improve the performance of the downlink. Different
methods of dynamically allocating the sequences are investigated including; optimum allocation through a simulation model, fast sub-optimum allocation through
a mathematical model, and a proof-of-concept model using real-world channel
measurements. Each model is evaluated to validate, improvements in the gain
achieved per link, computational complexity of the allocation scheme, and its impact on the capacity of the network.
In cryptography, secret keys are used to ensure confidentiality of communication between the legitimate nodes of a network. In a wireless ad-hoc network, the
broadcast nature of the channel necessitates robust key management systems for
secure functioning of the network. Physical layer security is a novel method of
profitably utilising the random and reciprocal variations of the wireless channel to
extract secret key. By measuring the characteristics of the wireless channel within
its coherence time, reciprocal variations of the channel can be observed between
a pair of nodes. Using these reciprocal characteristics of
common shared secret key is extracted between a pair of the nodes. The process
of key extraction consists of four steps namely; channel measurement, quantisation, information reconciliation, and privacy amplification. The reciprocal channel
variations are measured and quantised to obtain a preliminary key of vector bits (0; 1). Due to errors in measurement, quantisation, and additive Gaussian noise,
disagreement in the bits of preliminary keys exists. These errors are corrected
by using, error detection and correction methods to obtain a synchronised key at
both the nodes. Further, by the method of secure hashing, the entropy of the key
is enhanced in the privacy amplification stage. The efficiency of the key generation process depends on the method of channel measurement and quantisation.
Instead of quantising the channel measurements directly, if their reciprocity is enhanced and then quantised appropriately, the key generation process can be made efficient and fast. In this thesis, four methods of enhancing reciprocity are presented namely; l1-norm minimisation, Hierarchical clustering, Kalman filtering,
and Polynomial regression. They are appropriately quantised by binary and adaptive quantisation. Then, the entire process of key generation, from measuring the channel profile to obtaining a secure key is validated by using real-world channel measurements. The performance evaluation is done by comparing their performance in terms of bit disagreement rate, key generation rate, test of randomness,
robustness test, and eavesdropper test. An architecture, KeyBunch, for effectively
deploying the physical layer security in mobile and vehicular ad-hoc networks is
also proposed. Finally, as an use-case, KeyBunch is deployed in a secure vehicular communication architecture, to highlight the advantages offered by physical layer security.

This study presents an energy-efficient ultra-low voltage standard-cell based memory in 28nm FD-SOI. The storage element (standard-cell latch) is replaced with a full- custom designed latch with 50 % less area. Error-free operation is demonstrated down to 450mV @ 9MHz. By utilizing body bias (BB) @ VDD = 0.5 V performance spans from 20 MHz @ BB=0V to 110MHz @ BB=1V.

3D integration of solid-state memories and logic, as demonstrated by the Hybrid Memory Cube (HMC), offers major opportunities for revisiting near-memory computation and gives new hope to mitigate the power and performance losses caused by the “memory wall”. In this paper we present the first exploration steps towards design of the Smart Memory Cube (SMC), a new Processor-in-Memory (PIM) architecture that enhances the capabilities of the logic-base (LoB) in HMC. An accurate simulation environment has been developed, along with a full featured software stack. All offloading and dynamic overheads caused by the operating system, cache coherence, and memory management are considered, as well. Benchmarking results demonstrate up to 2X performance improvement in comparison with the host SoC, and around 1.5X against a similar host-side accelerator. Moreover, by scaling down the voltage and frequency of PIM’s processor it is possible to reduce energy by around 70% and 55% in comparison with the host and the accelerator, respectively.

Divide-and-Conquer is a common strategy to manage the complexity of system design and verification. In the context of System-on-Chip (SoC) design verification, an SoC system is decomposed into several modules and every module is separately verified. Usually an SoC module is reactive: it interacts with its environmental modules. This interaction is normally modeled by environment constraints, which are applied to verify the SoC module. Environment constraints are assumed to be always true when verifying the individual modules of a system. Therefore the correctness of environment constraints is very important for module verification.
Environment constraints are also very important for coverage analysis. Coverage analysis in formal verification measures whether or not the property set fully describes the functional behavior of the design under verification (DuV). if a set of properties describes every functional behavior of a DuV, the set of properties is called complete. To verify the correctness of environment constraints, Assume-Guarantee Reasoning rules can be employed.
However, the state of the art assume-guarantee reasoning rules cannot be applied to the environment constraints specified by using an industrial standard property language such as SystemVerilog Assertions (SVA).
This thesis proposes a new assume-guarantee reasoning rule that can be applied to environment constraints specified by using a property language such as SVA. In addition, this thesis proposes two efficient plausibility checks for constraints that can be conducted without a concrete implementation of the considered environment.
Furthermore, this thesis provides a compositional reasoning framework determining that a system is completely verified if all modules are verified with Complete Interval Property Checking (C-IPC) under environment constraints.
At present, there is a trend that more of the functionality in SoCs is shifted from the hardware to the hardware-dependent software (HWDS), which is a crucial component in an SoC, since other software layers, such as the operating systems are built on it. Therefore there is an increasing need to apply formal verification to HWDS, especially for safety-critical systems.
The interactions between HW and HWDS are often reactive, and happen in a temporal order. This requires new property languages to specify the reactive behavior at the HW and SW interfaces.
This thesis introduces a new property language, called Reactive Software Property Language (RSPL), to specify the reactive interactions between the HW and the HWDS.
Furthermore, a method for checking the completeness of software properties, which are specified by using RSPL, is presented in this thesis. This method is motivated by the approach of checking the completeness of hardware properties.

This paper presents a completely systematic design procedure for asynchronous controllers.The initial step is the construction of a signal transition graph (STG, an interpreted Petri net) ofthe dialog between data path and controller: a formal representation without reference to timeor internal states. To implement concurrently operating control structures, and also to reducedesign effort and circuit cost, this STG can be decomposed into overlapping subnets. A univer-sal initial solution is then obtained by algorithmically constructing a primitive flow table fromeach component net. This step links the procedure to classical asynchronous design, in particu-lar to its proven optimization methods, without restricting the set of solutions. In contrast toother approaches, there is no need to extend the original STG intuitively.

Memory accesses are the bottleneck of modern computer systems both in terms of performance and energy. This barrier, known as "the Memory Wall", can be break by utilizing memristors. Memristors are novel passive electrical components with varying resistance based on the charge passing through the device [1]. In this abstract, the term "memristor" covers also an extension of the definition, memristive devices, which vary their resistance depending on a state variable [2]. While memristors are naturally used as memory cells, they can also be used for other applications, such as logic circuits [3].
We present a novel architecture that redefines the relationship between the memory and the processor by enabling data processing within the memory itself. Our architecture is based on a memristive memory array, in which we perform two basic logic operations: Imply (material implication) [4] and False.

Specification of asynchronous circuit behaviour becomes more complex as the
complexity of today’s System-On-a-Chip (SOC) design increases. This also causes
the Signal Transition Graphs (STGs) – interpreted Petri nets for the specification
of asynchronous circuit behaviour – to become bigger and more complex, which
makes it more difficult, sometimes even impossible, to synthesize an asynchronous
circuit from an STG with a tool like petrify [CKK+96] or CASCADE [BEW00].
It has, therefore, been suggested to decompose the STG as a first step; this
leads to a modular implementation [KWVB03] [KVWB05], which can reduce syn-
thesis effort by possibly avoiding state explosion or by allowing the use of library
elements. A decomposition approach for STGs was presented in [VW02] [KKT93]
[Chu87a]. The decomposition algorithm by Vogler and Wollowski [VW02] is based
on that of Chu [Chu87a] but is much more generally applicable than the one in
[KKT93] [Chu87a], and its correctness has been proved formally in [VW02].
This dissertation begins with Petri net background described in chapter 2.
It starts with a class of Petri nets called a place/transition (P/T) nets. Then
STGs, the subclass of P/T nets, is viewed. Background in net decomposition
is presented in chapter 3. It begins with the structural decomposition of P/T
nets for analysis purposes – liveness and boundedness of the net. Then STG
decomposition for synthesis from [VW02] is described.
The decomposition method from [VW02] still could be improved to deal with
STGs from real applications and to give better decomposition results. Some
improvements for [VW02] to improve decomposition result and increase algorithm
efficiency are discussed in chapter 4. These improvement ideas are suggested in
[KVWB04] and some of them are have been proved formally in [VK04].
The decomposition method from [VW02] is based on net reduction to find
an output block component. A large amount of work has to be done to reduce
an initial specification until the final component is found. This reduction is not
always possible, which causes input initially classified as irrelevant to become
relevant input for the component. But under certain conditions (e.g. if structural
auto-conflicts turn out to be non-dynamic) some of them could be reclassified as
irrelevant. If this is not done, the specifications become unnecessarily large, which
intern leads to unnecessarily large implemented circuits. Instead of reduction, a
new approach, presented in chapter 5, decomposes the original net into structural
components first. An initial output block component is found by composing the
structural components. Then, a final output block component is obtained by net
reduction.
As we cope with the structure of a net most of the time, it would be useful
to have a structural abstraction of the net. A structural abstraction algorithm
[Kan03] is presented in chapter 6. It can improve the performance in finding an
output block component in most of the cases [War05] [Taw04]. Also, the structure
net is in most cases smaller than the net itself. This increases the efficiency of the
decomposition algorithm because it allows the transitions contained in a node of
the structure graph to be contracted at the same time if the structure graph is
used as internal representation of the net.
Chapter 7 discusses the application of STG decomposition in asynchronous
circuit design. Application to speed independent circuits is discussed first. Af-
ter that 3D circuits synthesized from extended burst mode (XBM) specifications
are discussed. An algorithm for translating STG specifications to XBM specifi-
cations was first suggested by [BEW99]. This algorithm first derives the state
machine from the STG specification, then translates the state machine to XBM
specification. An XBM specification, though it is a state machine, allows some
concurrency. These concurrencies can be translated directly, without deriving
all of the possible states. An algorithm which directly translates STG to XBM
specifications, is presented in chapter 7.3.1. Finally DESI, a tool to decompose
STGs and its decomposition results are presented.

The dissertation describes a practically proven, particularly efficient approach for the verification of digital circuit designs. The approach outperforms simulation based verification wrt. final circuit quality as well as wrt. required verification effort. In the dissertation, the paradigm of transaction based verification is ported from simulation to formal verification. One consequence is a particular format of formal properties, called operation properties. Circuit descriptions are verified by proof of operation properties with Interval Property Checking (IPC), a particularly strong SAT based formal verification algorithm. Furtheron, a completeness checker is presented that identifies all verification gaps in sets of operation properties. This completeness checker can handle the large operation properties that arise, if this approach is applied to realistic circuits. The methodology of operation properties, Interval Property Checking, and the completeness checker form a symbiosis that is of particular benefit to the verification of digital circuit designs. On top of this symbiosis an approach to completely verify the interaction of completely verified modules has been developed by adaptation of the modelling theories of digital systems. The approach presented in the dissertation has proven in multiple commercial application projects that it indeed completely verifies modules. After reaching a termination criterion that is well defined by completeness checking, no further bugs were found in the verified modules. The approach is marketed by OneSpin Solutions GmbH, Munich, under the names "Operation Based Verification" and "Gap Free Verification".

As the sustained trend towards integrating more and more functionality into systems on a chip can be observed in all fields, their economic realization is a challenge for the chip making industry. This is, however, barely possible today, as the ability to design and verify such complex systems could not keep up with the rapid technological development. Owing to this productivity gap, a design methodology, mainly using pre designed and pre verifying blocks, is mandatory. The availability of such blocks, meeting the highest possible quality standards, is decisive for its success. Cost-effective, this can only be achieved by formal verification on the block-level, namely by checking properties, ranging over finite intervals of time. As this verification approach is based on constructing and solving Boolean equivalence problems, it allows for using backtrack search procedures, such as SAT. Recent improvements of the latter are responsible for its high capacity. Still, the verification of some classes of hardware designs, enjoying regular substructures or complex arithmetic data paths, is difficult and often intractable. For regular designs, this is mainly due to individual treatment of symmetrical parts of the search space by backtrack search procedures used. One approach to tackle these deficiencies, is to exploit the regular structure for problem reduction on the register transfer level (RTL). This work describes a new approach for property checking on the RTL, preserving the problem inherent structure for subsequent reduction. The reduction is based on eliminating symmetrical parts from bitvector functions, and hence, from the search space. Several approaches for symmetry reduction in search problems, based on invariance of a function under permutation of variables, have been previously proposed. Unfortunately, our investigations did not reveal this kind of symmetry in relevant cases. Instead, we propose a reduction based on symmetrical values, as we encounter them much more frequently in our industrial examples. Let \(f\) be a Boolean function. The values \(0\) and \(1\) are symmetrical values for a variable \(x\) in \(f\) iff there is a variable permutation \(\pi\) of the variables of \(f\), fixing \(x\), such that \(f|_{x=0} = \pi(f|_{x=1})\). Then the question whether \(f=1\) holds is independent from this variable, and it can be removed. By iterative application of this approach to all variables of \(f\), they are either all removed, leaving \(f=1\) or \(f=0\) trivially, or there is a variable \(x'\) with no such \(\pi\). The latter leads to the conclusion that \(f=1\) does not hold, as we found a counter-example either with \(x'=0\), or \(x'=1\). Extending this basic idea to vectors of variables, allows to elevate it to the RTL. There, self similarities in the function representation, resulting from the regular structure preserved, can be exploited, and as a consequence, symmetrical bitvector values can be found syntactically. In particular, bitvector term-rewriting techniques, isomorphism procedures for specially manipulated term graphs, and combinations thereof, are proposed. This approach dramatically reduces the computational effort needed for functional verification on the block-level and, in particular, for the important problem class of regular designs. It allows the verification of industrial designs previously intractable. The main contributions of this work are in providing a framework for dealing with bitvector functions algebraically, a concise description of bounded model checking on the register transfer level, as well as new reduction techniques and new approaches for finding and exploiting symmetrical values in bitvector functions.