Doctoral Thesis
Scaling up conventional processor architectures cannot translate the ever-increasing number of transistors into comparable application performance. Although the trend is to shift from single-core to multi-core architectures, utilizing these multiple cores is not a trivial task for many applications due to thread synchronization and weak memory consistency issues. This is especially true for applications in real-time embedded systems, since timing analysis becomes more complicated due to contention on shared resources. One inherent reason for the limited use of instruction-level parallelism (ILP) by conventional processors is the use of registers. Therefore, some recent processors bypass register usage by directly communicating values from producer processing units to consumer processing units. In widely used superscalar processors, this direct instruction communication is organized by hardware at runtime, which adversely affects its scalability. Exposed datapath architectures provide a scalable alternative by allowing compilers to move values directly from the output ports to the input ports of processing units. Though exposed datapath architectures have already been studied in great detail, they still use registers for executing programs, thus limiting the amount of ILP they can exploit. This limitation stems from a drawback in their execution paradigm, their code generator, or both.
This thesis considers a novel exposed datapath architecture named Synchronous Control Asynchronous Dataflow (SCAD) that follows a hybrid control-flow dataflow execution paradigm. The SCAD architecture employs first-in-first-out (FIFO) buffers at the output and input ports of processing units. It is programmed by move instructions that transport values from the head of output buffers to the tail of input buffers. Thus, direct instruction communication is facilitated by the architecture. The processing unit triggers the execution of an operation when operand values are available at the heads of its input buffers. We propose a code generation technique for SCAD processors inspired by classical queue machines that completely eliminates the use of registers. On this basis, we first generate optimal code by using satisfiability (SAT) solvers after establishing that optimal code generation is hard. Heuristics based on a novel buffer interference analysis are then developed to compile larger programs. The experimental results demonstrate the efficacy of the execution paradigm of SCAD using our queue-oriented code generation technique.
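The execution model described above can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the SCAD instruction set or the thesis's code generator: the `Unit` class, the `run` function, the three-field move format, and the use of integer immediates as move sources are all hypothetical simplifications, and moves are executed strictly one after another rather than asynchronously.

```python
from collections import deque
import operator


class Unit:
    """Hypothetical processing unit: two input FIFOs and one output FIFO."""

    def __init__(self, op):
        self.op = op
        self.inputs = (deque(), deque())
        self.output = deque()

    def fire(self):
        # Execute as long as operands are available at the heads
        # of both input buffers; append each result to the output tail.
        while self.inputs[0] and self.inputs[1]:
            a = self.inputs[0].popleft()
            b = self.inputs[1].popleft()
            self.output.append(self.op(a, b))


def run(units, program):
    """Execute move instructions of the assumed form (source, target, port)."""
    for source, target, port in program:
        if isinstance(source, str):
            # Take the value at the head of the source unit's output buffer.
            value = units[source].output.popleft()
        else:
            # Assumption: an integer source is an immediate constant.
            value = source
        # Place it at the tail of the target unit's input buffer.
        units[target].inputs[port].append(value)
        units[target].fire()


units = {"add": Unit(operator.add), "mul": Unit(operator.mul)}

# (2 + 3) * 4, expressed purely as moves; no register is named anywhere.
program = [
    (2, "add", 0),
    (3, "add", 1),
    ("add", "mul", 0),  # the sum travels directly from adder to multiplier
    (4, "mul", 1),
]
run(units, program)
result = units["mul"].output[0]
print(result)  # 20
```

The third move is the point of the sketch: the intermediate sum is never stored in a register but passes from the producer's output buffer to the consumer's input buffer. Because the buffers are FIFOs, the order in which values are produced must match the order in which they are consumed, which is the constraint a queue-oriented code generator has to satisfy.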
As the complexity of embedded systems continuously rises, their development becomes more and more challenging. One technique to cope with this complexity is the employment of virtual prototypes. Virtual prototypes are intended to represent the embedded system’s properties at different levels of detail, such as register transfer level or transaction level. They can be used for different tasks throughout the development process: they can act as executable specifications, can be used for architecture exploration, can ease system integration, and allow for pre- and post-silicon software development and verification. The optimization objectives for virtual prototypes and their creation process are manifold. Finding an appropriate trade-off between simulation accuracy, simulation performance, and implementation effort is a major challenge, as these requirements are contradictory.
In this work, two new and complementary techniques for the efficient creation of accurate and high-performance SystemC based virtual prototypes are proposed: Advanced Temporal Decoupling (ATD) and Transparent Transaction Level Modeling (TTLM). The suitability for industrial environments is assured by the employment of common standards like SystemC TLM-2.0 and IP-XACT.
Advanced Temporal Decoupling enhances the simulation accuracy while retaining high simulation performance by allowing for cycle-accurate simulation in the context of SystemC TLM-2.0 temporal decoupling. This is achieved by exploiting the local time warp arising in SystemC TLM-2.0 temporally decoupled models to support the computation of resource contention effects. In ATD, accesses to shared resources are managed by Temporal Decoupled Semaphores (TDSems), which are integrated into the modeled shared resources. The set of TDSems assures the correct execution order of shared resource accesses and incorporates timing effects resulting from shared resource access execution and resource conflicts. This is done by dynamically varying the data granularity of resource accesses based on information gathered from the local time warp. ATD facilitates modeling of a wide range of resource and resource access properties, such as preemptable and non-preemptable accesses, synchronous and asynchronous accesses, multiport resources, dynamic access priorities, interacting and cascaded resources, and user-specified schedulers prioritizing simultaneous resource accesses.
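The contention effect that such a mechanism has to account for can be shown with a toy model. The sketch below is not the ATD algorithm and does not use SystemC: it assumes a single-port, non-preemptable resource served in timestamp order, and it does not model the dynamic data granularity described above. The function name `arbitrate` and the tuple layout are hypothetical.

```python
def arbitrate(accesses):
    """Order accesses to one shared resource by their timestamps.

    Each access is (initiator, start, duration), where start is assumed
    to be the global simulation time plus the initiator's local time
    offset. Returns a list of (initiator, granted_start, finish).
    """
    busy_until = 0
    schedule = []
    # Temporally decoupled initiators may issue accesses out of time
    # order, so sort by timestamp before serving them.
    for name, start, duration in sorted(accesses, key=lambda a: a[1]):
        # An access that arrives while the resource is busy is delayed.
        granted = max(start, busy_until)
        busy_until = granted + duration
        schedule.append((name, granted, busy_until))
    return schedule


# cpu1 was simulated first, but its access lies later in simulated time.
issued = [("cpu1", 10, 4), ("cpu0", 8, 5)]
schedule = arbitrate(issued)
print(schedule)  # [('cpu0', 8, 13), ('cpu1', 13, 17)]
```

In this example the access of `cpu1` is granted at time 13 instead of 10, because the resource is occupied by `cpu0` until then. The three time units of delay are the kind of resource conflict timing that a purely decoupled simulation would miss.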
Transparent Transaction Level Modeling focuses on the efficient creation of virtual prototypes by reducing the implementation effort and consists of a library and a code generator. The TTLM library adds a layer of convenience functions to ATD, comprising various application programming interfaces for inter-module communication, virtual prototype configuration, and run-time information extraction. The TTLM generator is used to automatically generate the structural code of the virtual prototype from the formal hardware specification language IP-XACT.
The applicability and benefits of the presented techniques are demonstrated using an image-processing-centric automotive application. Compared to an existing cycle-accurate SystemC model, the implementation effort can be reduced by approximately 50% using TTLM. Applying ATD, the simulation performance can be increased by a factor of up to five while retaining cycle accuracy.