Kaiserslautern - Fachbereich Informatik
Refine
Year of publication
- 2019 (24) (remove)
Document Type
- Doctoral Thesis (19)
- Article (5)
Has Fulltext
- yes (24)
Keywords
- Scientific Visualization (2)
- Topology (2)
- Uncertainty Visualization (2)
- Application Framework (1)
- Distributed system (1)
- Ensemble Visualization (1)
- Flow Visualization (1)
- Framework (1)
- HPC (1)
- Heterogeneous (1)
Faculty / Organisational entity
The usage of sensors in modern technical systems and consumer products is in a rapid increase. This advancement can be characterized by two major factors, namely, the mass introduction of consumer oriented sensing devices to the market and the sheer amount of sensor data being generated. These characteristics raise subsequent challenges regarding both the consumer sensing devices' reliability and the management and utilization of the generated sensor data. This thesis addresses these challenges through two main contributions. It presents a novel framework that leverages sentiment analysis techniques in order to assess the quality of consumer sensing devices. It also couples semantic technologies with big data technologies to present a new optimized approach for realization and management of semantic sensor data, hence providing a robust means of integration, analysis, and reuse of the generated data. The thesis also presents several applications that show the potential of the contributions in real-life scenarios.
Due to the broad range, growing feature set and fast release pace of new sensor-based products, evaluating these products is very challenging as standard product testing is not practical. As an alternative, an end-to-end aspect-based sentiment summarizer pipeline for evaluation of consumer sensing devices is presented. The pipeline uses product reviews to extract the sentiment at the aspect level and includes several components namely, product name extractor, aspects extractor and a lexicon-based sentiment extractor which handles multiple sentiment analysis challenges such as sentiment shifters, negations, and comparative sentences among others. The proposed summarizer's components generally outperform the state-of-the-art approaches. As a use case, features of the market leading fitness trackers are evaluated and a dynamic visual summarizer is presented to display the evaluation results and to provide personalized product recommendations for potential customers.
The increased usage of sensing devices in the consumer market is accompanied with increased deployment of sensors in various other fields such as industry, agriculture, and energy production systems. This necessitates using efficient and scalable methods for storing and processing of sensor data. Coupling big data technologies with semantic techniques not only helps to achieve the desired storage and processing goals, but also facilitates data integration, data analysis, and the utilization of data in unforeseen future applications through preserving the data generation context. This thesis proposes an efficient and scalable solution for semantification, storage and processing of raw sensor data through ontological modelling of sensor data and a novel encoding scheme that harnesses the split between the statements of the conceptual model of an ontology (TBox) and the individual facts (ABox) along with in-memory processing capabilities of modern big data systems. A sample use case is further introduced where a smartphone is deployed in a transportation bus to collect various sensor data which is then utilized in detecting street anomalies.
In addition to the aforementioned contributions, and to highlight the potential use cases of sensor data publicly available, a recommender system is developed using running route data, used for proximity-based retrieval, to provide personalized suggestions for new routes considering the runner's performance, visual and nature of route preferences.
This thesis aims at enhancing the integration of sensing devices in daily life applications through facilitating the public acquisition of consumer sensing devices. It also aims at achieving better integration and processing of sensor data in order to enable new potential usage scenarios of the raw generated data.
Large-scale distributed systems consist of a number of components, take a number of parameter values as input, and behave differently based on a number of non-deterministic events. All these features—components, parameter values, and events—interact in complicated ways, and unanticipated interactions may lead to bugs. Empirically, many bugs in these systems are caused by interactions of only a small number of features. In certain cases, it may be possible to test all interactions of \(k\) features for a small constant \(k\) by executing a family of tests that is exponentially or even doubly-exponentially smaller than the family of all tests. Thus, in such cases we can effectively uncover all bugs that require up to \(k\)-wise interactions of features.
In this thesis we study two occurrences of this phenomenon. First, many bugs in distributed systems are caused by network partition faults. In most cases these bugs occur due to two or three key nodes, such as leaders or replicas, not being able to communicate, or because the leading node finds itself in a block of the partition without quorum. Second, bugs may occur due to unexpected schedules (interleavings) of concurrent events—concurrent exchange of messages and concurrent access to shared resources. Again, many bugs depend only on the relative ordering of a small number of events. We call the smallest number of events whose ordering causes a bug the depth of the bug. We show that in both testing scenarios we can effectively uncover bugs involving small number of nodes or bugs of small depth by executing small families of tests.
We phrase both testing scenarios in terms of an abstract framework of tests, testing goals, and goal coverage. Sets of tests that cover all testing goals are called covering families. We give a general construction that shows that whenever a random test covers a fixed goal with sufficiently high probability, a small randomly chosen set of tests is a covering family with high probability. We then introduce concrete coverage notions relating to network partition faults and bugs of small depth. In case of network partition faults, we show that for the introduced coverage notions we can find a lower bound on the probability that a random test covers a given goal. Our general construction then yields a randomized testing procedure that achieves full coverage—and hence, find bugs—quickly.
In case of coverage notions related to bugs of small depth, if the events in the program form a non-trivial partial order, our general construction may give a suboptimal bound. Thus, we study other ways of constructing covering families. We show that if the events in a concurrent program are partially ordered as a tree, we can explicitly construct a covering family of small size: for balanced trees, our construction is polylogarithmic in the number of events. For the case when the partial order of events does not have a "nice" structure, and the events and their relation to previous events are revealed while the program is running, we give an online construction of covering families. Based on the construction, we develop a randomized scheduler called PCTCP that uniformly samples schedules from a covering family and has a rigorous guarantee of finding bugs of small depth. We experiment with an implementation of PCTCP on two real-world distributed systems—Zookeeper and Cassandra—and show that it can effectively find bugs.
In order to discuss the kinds of reasoning a visualization supports and the conclusions that can be drawn within the analysiscontext, a theoretical framework is needed that enables a formal treatment of the reasoning process. Such a model needs toencompass three stages of the visualization pipeline: encoding, decoding and interpretation. The encoding details how dataare transformed into a visualization and what can be seen in the visualization. The decoding explains how humans constructgraphical contexts inside the depicted visualization and how they interpret them assigning meaning to displayed structuresaccording to a formal reasoning strategy. In the presented model, we adapt and combine theories for the different steps intoa unified formal framework such that the analysis process is modelled as an assignment of meaning to displayed structuresaccording to a formal reasoning strategy. Additionally, we propose the ConceptGraph, a combined graph-based representationof the finite-state transducers resulting from the three stages, that can be used to formalize and understand the reasoning process.We apply the new model to several visualization types and investigate reasoning strategies for various tasks.
Under the notion of Cyber-Physical Systems an increasingly important research area has
evolved with the aim of improving the connectivity and interoperability of previously
separate system functions. Today, the advanced networking and processing capabilities
of embedded systems make it possible to establish strongly distributed, heterogeneous
systems of systems. In such configurations, the system boundary does not necessarily
end with the hardware, but can also take into account the wider context such as people
and environmental factors. In addition to being open and adaptive to other networked
systems at integration time, such systems need to be able to adapt themselves in accordance
with dynamic changes in their application environments. Considering that many
of the potential application domains are inherently safety-critical, it has to be ensured
that the necessary modifications in the individual system behavior are safe. However,
currently available state-of-the-practice and state-of-the-art approaches for safety assurance
and certification are not applicable to this context.
To provide a feasible solution approach, this thesis introduces a framework that allows
“just-in-time” safety certification for the dynamic adaptation behavior of networked
systems. Dynamic safety contracts (DSCs) are presented as the core solution concept
for monitoring and synthesis of decentralized safety knowledge. Ultimately, this opens
up a path towards standardized service provision concepts as a set of safety-related runtime
evidences. DSCs enable the modular specification of relevant safety features in
networked applications as a series of formalized demand-guarantee dependencies. The
specified safety features can be hierarchically integrated and linked to an interpretation
level for accessing the scope of possible safe behavioral adaptations. In this way, the networked
adaptation behavior can be conditionally certified with respect to the fulfilled
DSC safety features during operation. As long as the continuous evaluation process
provides safe adaptation behavior for a networked application context, safety can be
guaranteed for a networked system mode at runtime. Significant safety-related changes
in the application context, however, can lead to situations in which no safe adaptation
behavior is available for the current system state. In such cases, the remaining DSC
guarantees can be utilized to determine optimal degradation concepts for the dynamic
applications.
For the operationalization of the DSCs approach, suitable specification elements and
mechanisms have been defined. Based on a dedicated GUI-engineering framework it is
shown how DSCs can be systematically developed and transformed into appropriate runtime
representations. Furthermore, a safety engineering backbone is outlined to support
the DSC modeling process in concrete application scenarios. The conducted validation
activities show the feasibility and adequacy of the proposed DSCs approach. In parallel,
limitations and areas of future improvement are pointed out.
Shared memory concurrency is the pervasive programming model for multicore architectures
such as x86, Power, and ARM. Depending on the memory organization, each architecture follows
a somewhat different shared memory model. All these models, however, have one common
feature: they allow certain outcomes for concurrent programs that cannot be explained
by interleaving execution. In addition to the complexity due to architectures, compilers like
GCC and LLVM perform various program transformations, which also affect the outcomes of
concurrent programs.
To be able to program these systems correctly and effectively, it is important to define a
formal language-level concurrency model. For efficiency, it is important that the model is
weak enough to allow various compiler optimizations on shared memory accesses as well
as efficient mappings to the architectures. For programmability, the model should be strong
enough to disallow bogus “out-of-thin-air” executions and provide strong guarantees for well-synchronized
programs. Because of these conflicting requirements, defining such a formal
model is very difficult. This is why, despite years of research, major programming languages
such as C/C++ and Java do not yet have completely adequate formal models defining their
concurrency semantics.
In this thesis, we address this challenge and develop a formal concurrency model that is very
good both in terms of compilation efficiency and of programmability. Unlike most previous
approaches, which were defined either operationally or axiomatically on single executions,
our formal model is based on event structures, which represents multiple program executions,
and thus gives us more structure to define the semantics of concurrency.
In more detail, our formalization has two variants: the weaker version, WEAKEST, and the
stronger version, WEAKESTMO. The WEAKEST model simulates the promising semantics proposed
by Kang et al., while WEAKESTMO is incomparable to the promising semantics. Moreover,
WEAKESTMO discards certain questionable behaviors allowed by the promising semantics.
We show that the proposed WEAKESTMO model resolve out-of-thin-air problem, provide
standard data-race-freedom (DRF) guarantees, allow the desirable optimizations, and can be
mapped to the architectures like x86, PowerPC, and ARMv7. Additionally, our models are
flexible enough to leverage existing results from the literature to establish data-race-freedom
(DRF) guarantees and correctness of compilation.
In addition, in order to ensure the correctness of compilation by a major compiler, we developed
a translation validator targeting LLVM’s “opt” transformations of concurrent C/C++
programs. Using the validator, we identified a few subtle compilation bugs, which were reported
and were fixed. Additionally, we observe that LLVM concurrency semantics differs
from that of C11; there are transformations which are justified in C11 but not in LLVM and
vice versa. Considering the subtle aspects of LLVM concurrency, we formalized a fragment
of LLVM’s concurrency semantics and integrated it into our WEAKESTMO model.
Ranking lists are an essential methodology to succinctly summarize outstanding items, computed over database tables or crowdsourced in dedicated websites. In this thesis, we propose the usage of automatically generated, entity-centric rankings to discover insights in data. We present PALEO, a framework for data exploration through reverse engineering top-k database queries, that is, given a database and a sample top-k input list, our approach, aims at determining an SQL query that returns results similar to the provided input when executed over the database. The core problem consist of finding selection predicates that return the given items, determining the correct ranking criteria, and evaluating the most promising candidate queries first. PALEO operates on subset of the base data, uses data samples, histograms, descriptive statistics, and further proposes models that assess the suitability of candidate queries which facilitate limitation of false positives. Furthermore, this thesis presents COMPETE, a novel approach that models and computes dominance over user-provided input entities, given a database of top-k rankings. The resulting entities are found superior or inferior with tunable degree of dominance over the input set---a very intuitive, yet insightful way to explore pros and cons of entities of interest. Several notions of dominance are defined which differ in computational complexity and strictness of the dominance concept---yet, interdependent through containment relations. COMPETE is able to pick the most promising approach to satisfy a user request at minimal runtime latency, using a probabilistic model that is estimating the result sizes. The individual flavors of dominance are cast into a stack of algorithms over inverted indices and auxiliary structures, enabling pruning techniques to avoid significant data access over large datasets of rankings.
While the design step should be free from computational related constraints and operations due to its artistic aspect, the modeling phase has to prepare the model for the later stages of the pipeline.
This dissertation is concerned with the design and implementation of a framework for local remeshing and optimization. Based on the experience gathered, a full study about mesh quality criteria is also part of this work.
The contributions can be highlighted as: (1) a local meshing technique based on a completely novel approach constrained to the preservation of the mesh of non interesting areas. With this concept, designers can work on the design details of specific regions of the model without introducing more polygons elsewhere; (2) a tool capable of recovering the shape of a refined area to its decimated version, enabling details on optimized meshes of detailed models; (3) the integration of novel techniques into a single framework for meshing and smoothing which is constrained to surface structure; (4) the development of a mesh quality criteria priority structure, being able to classify and prioritize according to the application of the mesh.
Although efficient meshing techniques have been proposed along the years, most of them lack the possibility to mesh smaller regions of the base mesh, preserving the mesh quality and density of outer areas.
Considering this limitation, this dissertation seeks answers to the following research questions:
1. Given that mesh quality is relative to the application it is intended for, is it possible to design a general mesh evaluation plan?
2. How to prioritize specific mesh criteria over others?
3. Given an optimized mesh and its original design, how to improve the representation of single regions of the first, without degrading the mesh quality elsewhere?
Four main achievements came from the respective answers:
1. The Application Driven Mesh Quality Criteria Structure: Due to high variation in mesh standards because of various computer aided operations performed for different applications, e.g. animation or stress simulation, a structure for better visualization of mesh quality criteria is proposed. The criteria can be used to guide the mesh optimization, making the task consistent and reliable. This dissertation also proposes a methodology to optimize the criteria values, which is adaptable to the needs of a specific application.
2. Curvature Driven Meshing Algorithm: A novel approach, a local meshing technique, which works on a desired area of the mesh while preserving its boundaries as well as the rest of the topology. It causes a slow growth in the overall amount of polygons by making only small regions denser. The method can also be used to recover the details of a reference mesh to its decimated version while refining it. Moreover, it employs a geometric fast and easy to implement approach representing surface features as simple circles, being used to guide the meshing. It also generates quad-dominant meshes, with triangle count directly dependent on the size of the boundary.
3. Curvature-based Method for Anisotropic Mesh Smoothing: A geometric-based method is extended to 3D space to be able to produce anisotropic elements where needed. It is made possible by mapping the original space to another which embeds the surface curvature. This methodology is used to enhance the smoothing algorithm by making the nearly regularized elements follow the surface features, preserving the original design. The mesh optimization method also preserves mesh topology, while resizing elements according to the local mesh resolution, effectively enhancing the design aspects intended.
4. Framework for Local Restructure of Meshed Surfaces: The combination of both methods creates a complete tool for recovering surface details through mesh refinement and curvature aware mesh smoothing.
Novel image processing techniques have been in development for decades, but most
of these techniques are barely used in real world applications. This results in a gap
between image processing research and real-world applications; this thesis aims to
close this gap. In an initial study, the quantification, propagation, and communication
of uncertainty were determined to be key features in gaining acceptance for
new image processing techniques in applications.
This thesis presents a holistic approach based on a novel image processing pipeline,
capable of quantifying, propagating, and communicating image uncertainty. This
work provides an improved image data transformation paradigm, extending image
data using a flexible, high-dimensional uncertainty model. Based on this, a completely
redesigned image processing pipeline is presented. In this pipeline, each
step respects and preserves the underlying image uncertainty, allowing image uncertainty
quantification, image pre-processing, image segmentation, and geometry
extraction. This is communicated by utilizing meaningful visualization methodologies
throughout each computational step.
The presented methods are examined qualitatively by comparing to the Stateof-
the-Art, in addition to user evaluation in different domains. To show the applicability
of the presented approach to real world scenarios, this thesis demonstrates
domain-specific problems and the successful implementation of the presented techniques
in these domains.
The systems in industrial automation management (IAM) are information systems. The management parts of such systems are software components that support the manufacturing processes. The operational parts control highly plug-compatible devices, such as controllers, sensors and motors. Process variability and topology variability are the two main characteristics of software families in this domain. Furthermore, three roles of stakeholders -- requirement engineers, hardware-oriented engineers, and software developers -- participate in different derivation stages and have different variability concerns. In current practice, the development and reuse of such systems is costly and time-consuming, due to the complexity of topology and process variability. To overcome these challenges, the goal of this thesis is to develop an approach to improve the software product derivation process for systems in industrial automation management, where different variability types are concerned in different derivation stages. Current state-of-the-art approaches commonly use general-purpose variability modeling languages to represent variability, which is not sufficient for IAM systems. The process and topology variability requires more user-centered modeling and representation. The insufficiency of variability modeling leads to low efficiency during the staged derivation process involving different stakeholders. Up to now, product line approaches for systematic variability modeling and realization have not been well established for such complex domains. The model-based derivation approach presented in this thesis integrates feature modeling with domain-specific models for expressing processes and topology. The multi-variability modeling framework includes the meta-models of the three variability types and their associations. The realization and implementation of the multi-variability involves the mapping and the tracing of variants to their corresponding software product line assets. Based on the foundation of multi-variability modeling and realization, a derivation infrastructure is developed, which enables a semi-automated software derivation approach. It supports the configuration of different variability types to be integrated into the staged derivation process of the involved stakeholders. The derivation approach is evaluated in an industry-grade case study of a complex software system. The feasibility is demonstrated by applying the approach in the case study. By using the approach, both the size of the reusable core assets and the automation level of derivation are significantly improved. Furthermore, semi-structured interviews with engineers in practice have evaluated the usefulness and ease-of-use of the proposed approach. The results show a positive attitude towards applying the approach in practice, and high potential to generalize it to other related domains.
Most modern multiprocessors offer weak memory behavior to improve their performance in terms of throughput. They allow the order of memory operations to be observed differently by each processor. This is opposite to the concept of sequential consistency (SC) which enforces a unique sequential view on all operations for all processors. Because most software has been and still is developed with SC in mind, we face a gap between the expected behavior and the actual behavior on modern architectures. The issues described only affect multithreaded software and therefore most programmers might never face them. However, multi-threaded bare metal software like operating systems, embedded software, and real-time software have to consider memory consistency and ensure that the order of memory operations does not yield unexpected results. This software is more critical as general consumer software in terms of consequences, and therefore new methods are needed to ensure their correct behavior.
In general, a memory system is considered weak if it allows behavior that is not possible in a sequential system. For example, in the SPARC processor with total store ordering (TSO) consistency, all writes might be delayed by store buffers before they eventually are processed by the main memory. This allows the issuing process to work with its own written values before other processes observed them (i.e., reading its own value before it leaves the store buffer). Because this behavior is not possible with sequential consistency, TSO is considered to be weaker than SC. Programming in the context of weak memory architectures requires a proper comprehension of how the model deviates from expected sequential behavior. For verification of these programs formal representations are required that cover the weak behavior in order to utilize formal verification tools.
This thesis explores different verification approaches and respectively fitting representations of a multitude of memory models. In a joint effort, we started with the concept of testing memory operation traces in regard of their consistency with different memory consistency models. A memory operation trace is directly derived from a program trace and consists of a sequence of read and write operations for each process. Analyzing the testing problem, we are able to prove that the problem is NP-complete for most memory models. In that process, a satisfiability (SAT) encoding for given problem instances was developed, that can be used in reachability and robustness analysis.
In order to cover all program executions instead of just a single program trace, additional representations are introduced and explored throughout this thesis. One of the representations introduced is a novel approach to specify a weak memory system using temporal logics. A set of linear temporal logic (LTL) formulas is developed that describes all properties required to restrict possible traces to those consistent to the given memory model. The resulting LTL specifications can directly be used in model checking, e.g., to check safety conditions. Unfortunately, the derived LTL specifications suffer from the state explosion problem: Even small examples, like the Peterson mutual exclusion algorithm, tend to generate huge formulas and require vast amounts of memory for verification. For this reason, it is concluded that using the proposed verification approach these specifications are not well suited for verification of real world software. Nonetheless, they provide comprehensive and formally correct descriptions that might be used elsewhere, e.g., programming or teaching.
Another approach to represent these models are operational semantics. In this thesis, operational semantics of weak memory models are provided in the form of reference machines that are both correct and complete regarding the memory model specification. Operational semantics allow to simulate systems with weak memory models step by step. This provides an elegant way to study the effects that lead to weak consistent behavior, while still providing a basis for formal verification. The operational models are then incorporated in verification tools for multithreaded software. These state space exploration tools proved suitable for verification of multithreaded software in a weak consistent memory environment. However, because not only the memory system but also the processor are expressed as operational semantics, some verification approach will not be feasible due to the large size of the state space.
Finally, to tackle the beforementioned issue, a state transition system for parallel programs is proposed. The transition system is defined by a set of structural operational semantics (SOS) rules and a suitable memory structure that can cover multiple memory models. This allows to influence the state space by use of smart representations and approximation approaches in future work.