Kaiserslautern - Fachbereich Informatik
The usage of sensors in modern technical systems and consumer products is increasing rapidly. This advancement can be characterized by two major factors, namely the mass introduction of consumer-oriented sensing devices to the market and the sheer amount of sensor data being generated. These characteristics raise subsequent challenges regarding both the consumer sensing devices' reliability and the management and utilization of the generated sensor data. This thesis addresses these challenges through two main contributions. It presents a novel framework that leverages sentiment analysis techniques in order to assess the quality of consumer sensing devices. It also couples semantic technologies with big data technologies to present a new optimized approach for the realization and management of semantic sensor data, hence providing a robust means of integration, analysis, and reuse of the generated data. The thesis also presents several applications that show the potential of the contributions in real-life scenarios.
Due to the broad range, growing feature set and fast release pace of new sensor-based products, evaluating these products is very challenging as standard product testing is not practical. As an alternative, an end-to-end aspect-based sentiment summarizer pipeline for the evaluation of consumer sensing devices is presented. The pipeline uses product reviews to extract the sentiment at the aspect level and includes several components, namely a product name extractor, an aspect extractor and a lexicon-based sentiment extractor that handles multiple sentiment analysis challenges such as sentiment shifters, negations, and comparative sentences, among others. The proposed summarizer's components generally outperform the state-of-the-art approaches. As a use case, features of the market-leading fitness trackers are evaluated and a dynamic visual summarizer is presented to display the evaluation results and to provide personalized product recommendations for potential customers.
The increased usage of sensing devices in the consumer market is accompanied by increased deployment of sensors in various other fields such as industry, agriculture, and energy production systems. This necessitates using efficient and scalable methods for storing and processing sensor data. Coupling big data technologies with semantic techniques not only helps to achieve the desired storage and processing goals, but also facilitates data integration, data analysis, and the utilization of data in unforeseen future applications through preserving the data generation context. This thesis proposes an efficient and scalable solution for the semantification, storage and processing of raw sensor data through ontological modelling of sensor data and a novel encoding scheme that harnesses the split between the statements of the conceptual model of an ontology (TBox) and the individual facts (ABox), along with the in-memory processing capabilities of modern big data systems. A sample use case is further introduced where a smartphone is deployed in a transportation bus to collect various sensor data which is then utilized in detecting street anomalies.
In addition to the aforementioned contributions, and to highlight the potential use cases of publicly available sensor data, a recommender system is developed using running route data and proximity-based retrieval to provide personalized suggestions for new routes, taking into account the runner's performance as well as preferences regarding the visual character and nature of a route.
This thesis aims at enhancing the integration of sensing devices in daily life applications through facilitating the public acquisition of consumer sensing devices. It also aims at achieving better integration and processing of sensor data in order to enable new potential usage scenarios of the raw generated data.
Large-scale distributed systems consist of a number of components, take a number of parameter values as input, and behave differently based on a number of non-deterministic events. All these features—components, parameter values, and events—interact in complicated ways, and unanticipated interactions may lead to bugs. Empirically, many bugs in these systems are caused by interactions of only a small number of features. In certain cases, it may be possible to test all interactions of \(k\) features for a small constant \(k\) by executing a family of tests that is exponentially or even doubly-exponentially smaller than the family of all tests. Thus, in such cases we can effectively uncover all bugs that require up to \(k\)-wise interactions of features.
In this thesis we study two occurrences of this phenomenon. First, many bugs in distributed systems are caused by network partition faults. In most cases these bugs occur due to two or three key nodes, such as leaders or replicas, not being able to communicate, or because the leading node finds itself in a block of the partition without quorum. Second, bugs may occur due to unexpected schedules (interleavings) of concurrent events—concurrent exchange of messages and concurrent access to shared resources. Again, many bugs depend only on the relative ordering of a small number of events. We call the smallest number of events whose ordering causes a bug the depth of the bug. We show that in both testing scenarios we can effectively uncover bugs involving a small number of nodes or bugs of small depth by executing small families of tests.
We phrase both testing scenarios in terms of an abstract framework of tests, testing goals, and goal coverage. Sets of tests that cover all testing goals are called covering families. We give a general construction that shows that whenever a random test covers a fixed goal with sufficiently high probability, a small randomly chosen set of tests is a covering family with high probability. We then introduce concrete coverage notions relating to network partition faults and bugs of small depth. In the case of network partition faults, we show that for the introduced coverage notions we can find a lower bound on the probability that a random test covers a given goal. Our general construction then yields a randomized testing procedure that achieves full coverage, and hence finds bugs, quickly.
In case of coverage notions related to bugs of small depth, if the events in the program form a non-trivial partial order, our general construction may give a suboptimal bound. Thus, we study other ways of constructing covering families. We show that if the events in a concurrent program are partially ordered as a tree, we can explicitly construct a covering family of small size: for balanced trees, our construction is polylogarithmic in the number of events. For the case when the partial order of events does not have a "nice" structure, and the events and their relation to previous events are revealed while the program is running, we give an online construction of covering families. Based on the construction, we develop a randomized scheduler called PCTCP that uniformly samples schedules from a covering family and has a rigorous guarantee of finding bugs of small depth. We experiment with an implementation of PCTCP on two real-world distributed systems—Zookeeper and Cassandra—and show that it can effectively find bugs.
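The general construction sketched above is essentially a union-bound argument: if every random test covers any fixed goal with probability at least p, then about ln(#goals/δ)/p independent random tests cover all goals with probability at least 1 − δ. The following Python sketch illustrates this for a toy network-partition scenario; the node count, the random 2-partition test model, and the coverage notion (a pair of key nodes being separated) are illustrative assumptions, not the coverage notions defined in the thesis.

```python
import math
import random

def covering_family_size(num_goals: int, p_min: float, delta: float = 0.01) -> int:
    """Union bound: n random tests miss a fixed goal with probability (1 - p_min)**n,
    so n >= ln(num_goals / delta) / p_min leaves some goal uncovered with
    probability at most delta."""
    return math.ceil(math.log(num_goals / delta) / p_min)

def random_covering_family(goals, sample_test, covers):
    """Draw random tests until every goal is covered (illustrative; the size bound
    above predicts how many draws are usually enough)."""
    family, uncovered = [], set(goals)
    while uncovered:
        test = sample_test()
        hit = {g for g in uncovered if covers(test, g)}
        if hit:
            family.append(test)
            uncovered -= hit
    return family

# Toy instance: goals are ordered pairs of "key nodes" that a partition should split;
# a random test is a random 2-partition of the nodes.
nodes = list(range(8))
goals = [(a, b) for a in nodes for b in nodes if a != b]
sample_test = lambda: {n: random.randint(0, 1) for n in nodes}
covers = lambda test, goal: test[goal[0]] != test[goal[1]]   # the pair ends up separated

print(covering_family_size(len(goals), p_min=0.5))           # predicted family size
print(len(random_covering_family(goals, sample_test, covers)))
```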
In order to discuss the kinds of reasoning a visualization supports and the conclusions that can be drawn within the analysis context, a theoretical framework is needed that enables a formal treatment of the reasoning process. Such a model needs to encompass three stages of the visualization pipeline: encoding, decoding and interpretation. The encoding details how data are transformed into a visualization and what can be seen in the visualization. The decoding explains how humans construct graphical contexts inside the depicted visualization and how they interpret them, assigning meaning to displayed structures according to a formal reasoning strategy. In the presented model, we adapt and combine theories for the different steps into a unified formal framework such that the analysis process is modelled as an assignment of meaning to displayed structures according to a formal reasoning strategy. Additionally, we propose the ConceptGraph, a combined graph-based representation of the finite-state transducers resulting from the three stages, which can be used to formalize and understand the reasoning process. We apply the new model to several visualization types and investigate reasoning strategies for various tasks.
Under the notion of Cyber-Physical Systems an increasingly important research area has
evolved with the aim of improving the connectivity and interoperability of previously
separate system functions. Today, the advanced networking and processing capabilities
of embedded systems make it possible to establish strongly distributed, heterogeneous
systems of systems. In such configurations, the system boundary does not necessarily
end with the hardware, but can also take into account the wider context such as people
and environmental factors. In addition to being open and adaptive to other networked
systems at integration time, such systems need to be able to adapt themselves in accordance
with dynamic changes in their application environments. Considering that many
of the potential application domains are inherently safety-critical, it has to be ensured
that the necessary modifications in the individual system behavior are safe. However,
currently available state-of-the-practice and state-of-the-art approaches for safety assurance
and certification are not applicable to this context.
To provide a feasible solution approach, this thesis introduces a framework that allows
“just-in-time” safety certification for the dynamic adaptation behavior of networked
systems. Dynamic safety contracts (DSCs) are presented as the core solution concept
for monitoring and synthesis of decentralized safety knowledge. Ultimately, this opens
up a path towards standardized service provision concepts as a set of safety-related runtime
evidences. DSCs enable the modular specification of relevant safety features in
networked applications as a series of formalized demand-guarantee dependencies. The
specified safety features can be hierarchically integrated and linked to an interpretation
level for accessing the scope of possible safe behavioral adaptations. In this way, the networked
adaptation behavior can be conditionally certified with respect to the fulfilled
DSC safety features during operation. As long as the continuous evaluation process
provides safe adaptation behavior for a networked application context, safety can be
guaranteed for a networked system mode at runtime. Significant safety-related changes
in the application context, however, can lead to situations in which no safe adaptation
behavior is available for the current system state. In such cases, the remaining DSC
guarantees can be utilized to determine optimal degradation concepts for the dynamic
applications.
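As a purely illustrative sketch (the class names, evidence items, and thresholds below are invented and not taken from the thesis), the demand-guarantee idea behind DSCs can be pictured as contracts whose guarantees become available only while their demands hold on the current runtime evidence, with the guarantees of one contract feeding the demands of another:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

Evidence = Dict[str, float]          # runtime measurements / evidence items

@dataclass
class DynamicSafetyContract:
    """Illustrative demand-guarantee dependency: the guarantee label is granted
    only while all demands hold on the current evidence."""
    guarantee: str
    demands: List[Callable[[Evidence, Set[str]], bool]] = field(default_factory=list)

    def fulfilled(self, evidence: Evidence, granted: Set[str]) -> bool:
        return all(demand(evidence, granted) for demand in self.demands)

def evaluate(contracts: List[DynamicSafetyContract], evidence: Evidence) -> Set[str]:
    """Fixed-point evaluation: guarantees of one contract may serve as demands of
    another (hierarchical integration)."""
    granted: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for c in contracts:
            if c.guarantee not in granted and c.fulfilled(evidence, granted):
                granted.add(c.guarantee)
                changed = True
    return granted

# Toy example (all names and thresholds are made up for illustration):
contracts = [
    DynamicSafetyContract("distance_sensing_ok",
                          [lambda e, g: e.get("lidar_confidence", 0) > 0.9]),
    DynamicSafetyContract("platooning_mode_safe",
                          [lambda e, g: "distance_sensing_ok" in g,
                           lambda e, g: e.get("v2v_latency_ms", 1e9) < 50]),
]
print(evaluate(contracts, {"lidar_confidence": 0.95, "v2v_latency_ms": 20}))
# Both guarantees are granted; a drop in lidar_confidence would degrade to set().
```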
For the operationalization of the DSCs approach, suitable specification elements and
mechanisms have been defined. Based on a dedicated GUI-engineering framework it is
shown how DSCs can be systematically developed and transformed into appropriate runtime
representations. Furthermore, a safety engineering backbone is outlined to support
the DSC modeling process in concrete application scenarios. The conducted validation
activities show the feasibility and adequacy of the proposed DSCs approach. In parallel,
limitations and areas of future improvement are pointed out.
Shared memory concurrency is the pervasive programming model for multicore architectures
such as x86, Power, and ARM. Depending on the memory organization, each architecture follows
a somewhat different shared memory model. All these models, however, have one common
feature: they allow certain outcomes for concurrent programs that cannot be explained
by interleaving execution. In addition to the complexity due to architectures, compilers like
GCC and LLVM perform various program transformations, which also affect the outcomes of
concurrent programs.
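The standard store-buffering (SB) litmus test makes this point concrete: enumerating all interleavings shows that the outcome r1 = r2 = 0 is impossible under sequential consistency, yet x86-TSO, Power, and ARM all allow it. The following Python sketch is a generic illustration of this well-known example, not an artifact of the thesis:

```python
from itertools import permutations  # not strictly needed; interleavings below suffice

# Store-buffering (SB) litmus test:
#   Thread 1: x := 1; r1 := y        Thread 2: y := 1; r2 := x
thread1 = [("write", "x", 1), ("read", "y", "r1")]
thread2 = [("write", "y", 1), ("read", "x", "r2")]

def interleavings(a, b):
    """All merges of the two programs that keep each thread's own order."""
    if not a: yield list(b); return
    if not b: yield list(a); return
    for rest in interleavings(a[1:], b): yield [a[0]] + rest
    for rest in interleavings(a, b[1:]): yield [b[0]] + rest

outcomes = set()
for schedule in interleavings(thread1, thread2):
    mem, regs = {"x": 0, "y": 0}, {}
    for op, loc, arg in schedule:
        if op == "write": mem[loc] = arg
        else: regs[arg] = mem[loc]
    outcomes.add((regs["r1"], regs["r2"]))

print(sorted(outcomes))   # (0, 0) is absent under interleaving (SC) semantics,
                          # yet weak architectures permit r1 = r2 = 0.
```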
To be able to program these systems correctly and effectively, it is important to define a
formal language-level concurrency model. For efficiency, it is important that the model is
weak enough to allow various compiler optimizations on shared memory accesses as well
as efficient mappings to the architectures. For programmability, the model should be strong
enough to disallow bogus “out-of-thin-air” executions and provide strong guarantees for well-synchronized
programs. Because of these conflicting requirements, defining such a formal
model is very difficult. This is why, despite years of research, major programming languages
such as C/C++ and Java do not yet have completely adequate formal models defining their
concurrency semantics.
In this thesis, we address this challenge and develop a formal concurrency model that is very
good both in terms of compilation efficiency and of programmability. Unlike most previous
approaches, which were defined either operationally or axiomatically on single executions,
our formal model is based on event structures, which represent multiple program executions
and thus give us more structure to define the semantics of concurrency.
In more detail, our formalization has two variants: the weaker version, WEAKEST, and the
stronger version, WEAKESTMO. The WEAKEST model simulates the promising semantics proposed
by Kang et al., while WEAKESTMO is incomparable to the promising semantics. Moreover,
WEAKESTMO discards certain questionable behaviors allowed by the promising semantics.
We show that the proposed WEAKESTMO model resolves the out-of-thin-air problem, provides
standard data-race-freedom (DRF) guarantees, allows the desirable optimizations, and can be
mapped to architectures like x86, PowerPC, and ARMv7. Additionally, our models are
flexible enough to leverage existing results from the literature to establish data-race-freedom
(DRF) guarantees and correctness of compilation.
In addition, in order to ensure the correctness of compilation by a major compiler, we developed
a translation validator targeting LLVM’s “opt” transformations of concurrent C/C++
programs. Using the validator, we identified a few subtle compilation bugs, which were reported
and fixed. Additionally, we observe that LLVM's concurrency semantics differs
from that of C11; there are transformations which are justified in C11 but not in LLVM and
vice versa. Considering the subtle aspects of LLVM concurrency, we formalized a fragment
of LLVM’s concurrency semantics and integrated it into our WEAKESTMO model.
Ranking lists are an essential methodology to succinctly summarize outstanding items, computed over database tables or crowdsourced in dedicated websites. In this thesis, we propose the usage of automatically generated, entity-centric rankings to discover insights in data. We present PALEO, a framework for data exploration through reverse engineering top-k database queries, that is, given a database and a sample top-k input list, our approach aims at determining an SQL query that returns results similar to the provided input when executed over the database. The core problem consists of finding selection predicates that return the given items, determining the correct ranking criteria, and evaluating the most promising candidate queries first. PALEO operates on a subset of the base data, uses data samples, histograms, and descriptive statistics, and further proposes models that assess the suitability of candidate queries, which helps limit false positives. Furthermore, this thesis presents COMPETE, a novel approach that models and computes dominance over user-provided input entities, given a database of top-k rankings. The resulting entities are found to be superior or inferior with a tunable degree of dominance over the input set---a very intuitive, yet insightful way to explore the pros and cons of entities of interest. Several notions of dominance are defined which differ in computational complexity and strictness of the dominance concept---yet are interdependent through containment relations. COMPETE is able to pick the most promising approach to satisfy a user request at minimal runtime latency, using a probabilistic model that estimates the result sizes. The individual flavors of dominance are cast into a stack of algorithms over inverted indices and auxiliary structures, enabling pruning techniques to avoid significant data access over large datasets of rankings.
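As a rough illustration of the reverse-engineering idea behind PALEO (the table, columns, and entities below are invented; the actual system additionally relies on samples, histograms, and statistical models to rank candidate queries), one can enumerate candidate selection predicates and ranking attributes and keep those that reproduce the observed top-k list:

```python
# Illustrative sketch only; not PALEO's actual algorithm or data.
rows = [
    {"name": "A", "points": 31.2, "rebounds": 8.1,  "team": "east"},
    {"name": "B", "points": 29.8, "rebounds": 11.4, "team": "west"},
    {"name": "C", "points": 27.5, "rebounds": 6.0,  "team": "east"},
    {"name": "D", "points": 25.1, "rebounds": 12.2, "team": "west"},
]
observed_top2 = ["B", "D"]                     # the sample top-k input list

def candidate_queries(rows, numeric_cols, categorical_cols):
    """Enumerate (predicate, ranking attribute) pairs as candidate top-k queries."""
    predicates = [("<all>", lambda r: True)]
    for col in categorical_cols:
        for val in {r[col] for r in rows}:
            predicates.append((f"{col} = {val!r}", lambda r, c=col, v=val: r[c] == v))
    for pred_name, pred in predicates:
        for rank_col in numeric_cols:
            yield pred_name, rank_col, pred

def matches(rows, pred, rank_col, target, k):
    top = sorted((r for r in rows if pred(r)), key=lambda r: r[rank_col], reverse=True)[:k]
    return [r["name"] for r in top] == target

for pred_name, rank_col, pred in candidate_queries(rows, ["points", "rebounds"], ["team"]):
    if matches(rows, pred, rank_col, observed_top2, k=2):
        print(f"SELECT ... WHERE {pred_name} ORDER BY {rank_col} DESC LIMIT 2")
```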
While the design step should be free from computation-related constraints and operations due to its artistic aspect, the modeling phase has to prepare the model for the later stages of the pipeline.
This dissertation is concerned with the design and implementation of a framework for local remeshing and optimization. Based on the experience gathered, a full study about mesh quality criteria is also part of this work.
The contributions can be highlighted as: (1) a local meshing technique based on a completely novel approach, constrained to preserving the mesh in areas that are not of interest. With this concept, designers can work on the design details of specific regions of the model without introducing more polygons elsewhere; (2) a tool capable of recovering the shape of a refined area in its decimated version, enabling details on optimized meshes of detailed models; (3) the integration of novel techniques into a single framework for meshing and smoothing which is constrained to the surface structure; (4) the development of a mesh quality criteria priority structure, able to classify and prioritize criteria according to the application of the mesh.
Although efficient meshing techniques have been proposed over the years, most of them lack the ability to remesh smaller regions of the base mesh while preserving the mesh quality and density of the outer areas.
Considering this limitation, this dissertation seeks answers to the following research questions:
1. Given that mesh quality is relative to the application it is intended for, is it possible to design a general mesh evaluation plan?
2. How to prioritize specific mesh criteria over others?
3. Given an optimized mesh and its original design, how to improve the representation of single regions of the former, without degrading the mesh quality elsewhere?
Four main achievements came from the respective answers:
1. The Application Driven Mesh Quality Criteria Structure: Because mesh standards vary widely with the various computer-aided operations performed for different applications, e.g. animation or stress simulation, a structure for better visualization of mesh quality criteria is proposed. The criteria can be used to guide the mesh optimization, making the task consistent and reliable. This dissertation also proposes a methodology to optimize the criteria values, which is adaptable to the needs of a specific application.
2. Curvature Driven Meshing Algorithm: A novel local meshing technique that works on a desired area of the mesh while preserving its boundaries as well as the rest of the topology. It causes only a slow growth in the overall number of polygons by making only small regions denser. The method can also be used to recover the details of a reference mesh in its decimated version while refining it. Moreover, it employs a fast and easy-to-implement geometric approach that represents surface features as simple circles, which are used to guide the meshing. It also generates quad-dominant meshes, with a triangle count directly dependent on the size of the boundary.
3. Curvature-based Method for Anisotropic Mesh Smoothing: A geometric-based method is extended to 3D space to be able to produce anisotropic elements where needed. It is made possible by mapping the original space to another which embeds the surface curvature. This methodology is used to enhance the smoothing algorithm by making the nearly regularized elements follow the surface features, preserving the original design. The mesh optimization method also preserves mesh topology, while resizing elements according to the local mesh resolution, effectively enhancing the design aspects intended.
4. Framework for Local Restructure of Meshed Surfaces: The combination of both methods creates a complete tool for recovering surface details through mesh refinement and curvature aware mesh smoothing.
Novel image processing techniques have been in development for decades, but most
of these techniques are barely used in real world applications. This results in a gap
between image processing research and real-world applications; this thesis aims to
close this gap. In an initial study, the quantification, propagation, and communication
of uncertainty were determined to be key features in gaining acceptance for
new image processing techniques in applications.
This thesis presents a holistic approach based on a novel image processing pipeline,
capable of quantifying, propagating, and communicating image uncertainty. This
work provides an improved image data transformation paradigm, extending image
data using a flexible, high-dimensional uncertainty model. Based on this, a completely
redesigned image processing pipeline is presented. In this pipeline, each
step respects and preserves the underlying image uncertainty, allowing image uncertainty
quantification, image pre-processing, image segmentation, and geometry
extraction. This is communicated by utilizing meaningful visualization methodologies
throughout each computational step.
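A minimal example of what uncertainty preservation in one pipeline step can look like, assuming independent per-pixel Gaussian noise and a linear filter (the thesis' uncertainty model is more general and high-dimensional): for y = Σ w_i x_i the output variance is Var(y) = Σ w_i² Var(x_i), so the variance image can be filtered with the squared kernel weights.

```python
import numpy as np
from scipy.ndimage import convolve

# Minimal illustration of uncertainty-preserving filtering (assumes independent,
# per-pixel Gaussian noise; placeholder data, not the thesis' pipeline).
image    = np.random.rand(64, 64)            # intensity estimate per pixel
variance = np.full_like(image, 0.01)         # per-pixel uncertainty (variance)

kernel = np.ones((3, 3)) / 9.0               # simple box blur as a linear filter

smoothed_image    = convolve(image, kernel, mode="nearest")
# For a linear filter y = sum_i w_i x_i with independent inputs:
#   Var(y) = sum_i w_i**2 * Var(x_i)
smoothed_variance = convolve(variance, kernel ** 2, mode="nearest")

print(smoothed_image.mean(), smoothed_variance.mean())  # variance drops by ~1/9
```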
The presented methods are examined qualitatively by comparison with the state of the art,
in addition to user evaluation in different domains. To show the applicability
of the presented approach to real world scenarios, this thesis demonstrates
domain-specific problems and the successful implementation of the presented techniques
in these domains.
The systems in industrial automation management (IAM) are information systems. The management parts of such systems are software components that support the manufacturing processes. The operational parts control highly plug-compatible devices, such as controllers, sensors and motors. Process variability and topology variability are the two main characteristics of software families in this domain. Furthermore, three roles of stakeholders -- requirement engineers, hardware-oriented engineers, and software developers -- participate in different derivation stages and have different variability concerns. In current practice, the development and reuse of such systems is costly and time-consuming, due to the complexity of topology and process variability. To overcome these challenges, the goal of this thesis is to develop an approach to improve the software product derivation process for systems in industrial automation management, where different variability types are concerned in different derivation stages. Current state-of-the-art approaches commonly use general-purpose variability modeling languages to represent variability, which is not sufficient for IAM systems. The process and topology variability requires more user-centered modeling and representation. The insufficiency of variability modeling leads to low efficiency during the staged derivation process involving different stakeholders. Up to now, product line approaches for systematic variability modeling and realization have not been well established for such complex domains. The model-based derivation approach presented in this thesis integrates feature modeling with domain-specific models for expressing processes and topology. The multi-variability modeling framework includes the meta-models of the three variability types and their associations. The realization and implementation of the multi-variability involves the mapping and the tracing of variants to their corresponding software product line assets. Based on the foundation of multi-variability modeling and realization, a derivation infrastructure is developed, which enables a semi-automated software derivation approach. It supports the configuration of different variability types to be integrated into the staged derivation process of the involved stakeholders. The derivation approach is evaluated in an industry-grade case study of a complex software system. The feasibility is demonstrated by applying the approach in the case study. By using the approach, both the size of the reusable core assets and the automation level of derivation are significantly improved. Furthermore, semi-structured interviews with engineers in practice have evaluated the usefulness and ease-of-use of the proposed approach. The results show a positive attitude towards applying the approach in practice, and high potential to generalize it to other related domains.
Most modern multiprocessors offer weak memory behavior to improve their performance in terms of throughput. They allow the order of memory operations to be observed differently by each processor. This is in contrast to the concept of sequential consistency (SC), which enforces a unique sequential view of all operations for all processors. Because most software has been and still is developed with SC in mind, we face a gap between the expected behavior and the actual behavior on modern architectures. The issues described only affect multithreaded software, and therefore most programmers might never face them. However, multithreaded bare-metal software like operating systems, embedded software, and real-time software has to consider memory consistency and ensure that the order of memory operations does not yield unexpected results. This software is more critical than general consumer software in terms of consequences, and therefore new methods are needed to ensure its correct behavior.
In general, a memory system is considered weak if it allows behavior that is not possible in a sequential system. For example, in the SPARC processor with total store ordering (TSO) consistency, all writes might be delayed by store buffers before they are eventually processed by the main memory. This allows the issuing process to work with its own written values before other processes observe them (i.e., reading its own value before it leaves the store buffer). Because this behavior is not possible with sequential consistency, TSO is considered to be weaker than SC. Programming in the context of weak memory architectures requires a proper comprehension of how the model deviates from the expected sequential behavior. For the verification of these programs, formal representations are required that cover the weak behavior in order to utilize formal verification tools.
This thesis explores different verification approaches and correspondingly fitting representations of a multitude of memory models. In a joint effort, we started with the concept of testing memory operation traces with regard to their consistency with different memory consistency models. A memory operation trace is directly derived from a program trace and consists of a sequence of read and write operations for each process. Analyzing the testing problem, we are able to prove that the problem is NP-complete for most memory models. In that process, a satisfiability (SAT) encoding for given problem instances was developed, which can be used in reachability and robustness analysis.
In order to cover all program executions instead of just a single program trace, additional representations are introduced and explored throughout this thesis. One of the representations introduced is a novel approach to specifying a weak memory system using temporal logics. A set of linear temporal logic (LTL) formulas is developed that describes all properties required to restrict possible traces to those consistent with the given memory model. The resulting LTL specifications can directly be used in model checking, e.g., to check safety conditions. Unfortunately, the derived LTL specifications suffer from the state explosion problem: even small examples, like the Peterson mutual exclusion algorithm, tend to generate huge formulas and require vast amounts of memory for verification. For this reason, it is concluded that, with the proposed verification approach, these specifications are not well suited for the verification of real-world software. Nonetheless, they provide comprehensive and formally correct descriptions that might be used elsewhere, e.g., in programming or teaching.
Another approach to representing these models is operational semantics. In this thesis, operational semantics of weak memory models are provided in the form of reference machines that are both correct and complete with regard to the memory model specification. Operational semantics make it possible to simulate systems with weak memory models step by step. This provides an elegant way to study the effects that lead to weakly consistent behavior, while still providing a basis for formal verification. The operational models are then incorporated into verification tools for multithreaded software. These state space exploration tools proved suitable for the verification of multithreaded software in a weakly consistent memory environment. However, because not only the memory system but also the processor is expressed as operational semantics, some verification approaches will not be feasible due to the large size of the state space.
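A minimal sketch of what such an operational reference machine can look like for TSO is given below (illustrative only; the thesis' reference machines are correct and complete with respect to the full model specifications): each thread writes into its own FIFO store buffer, reads consult the own buffer first, and a separate flush step commits buffered writes to memory. Stepping it on the store-buffering litmus test reproduces the weak outcome r1 = r2 = 0 that is impossible under SC.

```python
from collections import deque

class TSOMachine:
    """Minimal operational sketch of TSO: per-thread FIFO store buffers, reads that
    snoop the own buffer first, and a nondeterministic flush step to main memory."""
    def __init__(self, locations):
        self.mem = {loc: 0 for loc in locations}
        self.buf = {}                                   # thread id -> deque of (loc, val)

    def write(self, tid, loc, val):
        self.buf.setdefault(tid, deque()).append((loc, val))

    def read(self, tid, loc):
        for l, v in reversed(self.buf.get(tid, ())):    # read own buffered value first
            if l == loc:
                return v
        return self.mem[loc]

    def flush_one(self, tid):
        loc, val = self.buf[tid].popleft()
        self.mem[loc] = val

# Store-buffering litmus test: both writes stay buffered until after both reads.
m = TSOMachine(["x", "y"])
m.write(1, "x", 1)
m.write(2, "y", 1)
r1 = m.read(1, "y")      # still sees y = 0
r2 = m.read(2, "x")      # still sees x = 0
m.flush_one(1); m.flush_one(2)
print(r1, r2)            # 0 0 -- allowed by TSO, impossible under SC
```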
Finally, to tackle the aforementioned issue, a state transition system for parallel programs is proposed. The transition system is defined by a set of structural operational semantics (SOS) rules and a suitable memory structure that can cover multiple memory models. This makes it possible to influence the state space by means of smart representations and approximation approaches in future work.
In computer graphics, realistic rendering of virtual scenes is a computationally complex problem. State-of-the-art rendering technology must become more scalable to
meet the performance requirements for demanding real-time applications.
This dissertation is concerned with core algorithms for rendering, focusing on the
ray tracing method in particular, to support and saturate recent massively parallel computer systems, i.e., to distribute the complex computations very efficiently
among a large number of processing elements. More specifically, the three targeted
main contributions are:
1. Collaboration framework for large-scale distributed memory computers
The purpose of the collaboration framework is to enable scalable rendering
in real-time on a distributed memory computer. As an infrastructure layer it
manages the explicit communication within a network of distributed memory
nodes transparently for the rendering application. The research is focused on
designing a communication protocol resilient against delays and negligible in
overhead, relying exclusively on one-sided and asynchronous data transfers.
The hypothesis is that a loosely coupled system like this is able to scale linearly
with the number of nodes, which is tested by directly measuring all possible
communication-induced delays as well as the overall rendering throughput.
2. Ray tracing algorithms designed for vector processing
Vector processors are to be efficiently utilized for improved ray tracing performance. This requires the basic, scalar traversal algorithm to be reformulated
in order to expose a high degree of fine-grained data parallelism. Two approaches are investigated: traversing multiple rays simultaneously, and performing
multiple traversal steps at once. Efficiently establishing coherence in a group
of rays as well as avoiding sorting of the nodes in a multi-traversal step are the
defining research goals.
3. Multi-threaded schedule and memory management for the ray tracing acceleration structure
Construction times of high-quality acceleration structures are to be reduced by
improvements to multi-threaded scalability and utilization of vector processors. Research is directed at eliminating the following scalability bottlenecks:
dynamic memory growth caused by the primitive splits required for high-quality
structures, and top-level hierarchy construction where simple task parallelism
is not readily available. Additional research addresses how to expose
scatter/gather-free data-parallelism for efficient vector processing.
Together, these contributions form a scalable, high-performance basis for real-time,
ray tracing-based rendering, and a prototype path tracing application implemented
on top of this basis serves as a demonstration.
The key insight driving this dissertation is that the computational power necessary
for realistic light transport for real-time rendering applications demands massively
parallel computers, which in turn require highly scalable algorithms. Therefore this
dissertation provides important research along the path towards virtual reality.
In this thesis, we consider the problem of processing similarity queries over a dataset of top-k rankings and class-constrained objects. Top-k rankings are the most natural and widely used technique to compress a large amount of information into a concise form. Spearman's Footrule distance is used to compute the similarity between rankings, considering how well rankings agree on the positions (ranks) of ranked items. This setup allows the application of metric distance-based pruning strategies and, alternatively, enables the use of traditional inverted indices for retrieving rankings that overlap in items. Although both techniques can be applied individually, we hypothesize that blending the two leads to better performance. First, we formulate theoretical bounds over the rankings, based on Spearman's Footrule distance, which are essential for adapting existing, inverted index based techniques to the setting of top-k rankings. Further, we propose a hybrid indexing strategy, designed for efficiently processing similarity range queries, which incorporates inverted indices and metric space indices, such as M- or BK-trees, resulting in a structure that resembles both indexing methods, with tunable emphasis on one or the other. Moreover, optimizations to the inverted index component are presented, for early termination and minimizing bookkeeping. As vast amounts of data are being generated on a daily basis, we further present a distributed, highly tunable approach, implemented in Apache Spark, for efficiently processing similarity join queries over top-k rankings. To combine distance-based filtering with inverted indices, the algorithm works in several phases; the partial results are joined to compute the final result set. As the last contribution of the thesis, we consider processing k-nearest-neighbor (k-NN) queries over class-constrained objects, with the additional requirement that the result objects are of a specific type. We introduce the MISP index, which first indexes the objects by their (combination of) class belonging, followed by a similarity search sub-index for each subset of objects. The number of such subsets can combinatorially explode; thus, we provide a cost model that analyzes the performance of the MISP index structure under different configurations, with the aim of finding the most efficient one for the dataset being searched.
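For illustration, a small sketch of Spearman's Footrule over top-k rankings is given below. It uses the common convention of assigning rank k + 1 to items missing from one of the two lists; the exact treatment of missing items in the thesis may differ.

```python
def footrule_topk(r1, r2, k=None):
    """Spearman's Footrule between two top-k rankings (lists of item ids, best
    first). Items absent from one ranking get rank k + 1, a common convention
    for top-k lists (details may differ from the thesis)."""
    k = k or max(len(r1), len(r2))
    pos1 = {item: i + 1 for i, item in enumerate(r1)}
    pos2 = {item: i + 1 for i, item in enumerate(r2)}
    items = set(pos1) | set(pos2)
    return sum(abs(pos1.get(it, k + 1) - pos2.get(it, k + 1)) for it in items)

print(footrule_topk(["a", "b", "c"], ["a", "b", "c"]))   # 0, identical rankings
print(footrule_topk(["a", "b", "c"], ["c", "b", "a"]))   # 4, reversed order
print(footrule_topk(["a", "b", "c"], ["a", "b", "d"]))   # 2, one item swapped out
```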
IoT systems consist of hardware/software systems (e.g., sensors) that are embedded in a physical world, networked, and interacting with complex software platforms. The validation of such systems is a challenge and is currently mostly done with prototypes. This paper presents a virtual environment for the simulation, emulation and validation of an IoT platform and its semantic model in real-life scenarios. It is based on a decentralized, bottom-up approach that offers interoperability of IoT devices and the value-added services they want to use across different domains. The framework is demonstrated by a comprehensive case study. The example consists of the complete IoT “Smart Energy” use case with a focus on data privacy through homomorphic encryption. The performance of the network is compared while using partially homomorphic encryption, fully homomorphic encryption, and no encryption at all. As a major result, we found that our framework is capable of simulating large IoT networks and that the overhead introduced by homomorphic encryption is feasible for VICINITY.
Planar force or pressure is a fundamental physical aspect of any people-vs-people and people-vs-environment activities and interactions. It is as significant as the more established linear and angular acceleration (usually acquired by inertial measurement units). There have been several studies involving planar pressure in the discipline of activity recognition, as reviewed in the first chapter. These studies have shown that planar pressure is a promising sensing modality for activity recognition. However, they still occupy only a niche in the discipline, using ad hoc systems and data analysis methods, and mostly they were not followed by further elaborative work. The situation calls for a general framework that can help push planar pressure sensing into the mainstream.
This dissertation systematically investigates using planar pressure distribution sensing technology for ubiquitous and wearable activity recognition purposes. We propose a generic Textile Pressure Mapping (TPM) Framework, which encapsulates (1) design knowledge and guidelines, (2) a multi-layered tool including hardware, software and algorithms, and (3) an ensemble of empirical study examples. Through validation with various empirical studies, the unified TPM framework covers the full scope of activity recognition, including the ambient, object, and wearable subspaces.
The hardware part constructs a general architecture and implementations in the large-scale and mobile directions separately. The software toolkit consists of four heterogeneous tiers: driver, data processing, machine learning, visualization/feedback. The algorithm chapter describes generic data processing techniques and a unified TPM feature set. The TPM framework offers a universal solution for other researchers and developers to evaluate TPM sensing modality in their application scenarios.
The significant findings from the empirical studies have shown that TPM is a versatile sensing modality. Specifically, in the ambient subspace, a sports mat or carpet with TPM sensors embedded underneath can distinguish different sports activities or different people's gait based on the dynamic change of body-print; a pressure sensitive tablecloth can detect various dining actions by the force propagated from the cutlery through the plates to the tabletop. In the object subspace, swirl office chairs with TPM sensors under the cover can be used to detect the seater's real-time posture; TPM can be used to detect emotion-related touch interactions for smart objects, toys or robots. In the wearable subspace, TPM sensors can be used to perform pressure-based mechanomyography to detect muscle and body movement; it can also be tailored to cover the surface of a soccer shoe to distinguish different kicking angles and intensities.
All the empirical evaluations have resulted in accuracies well above the chance level of the corresponding number of classes, e.g., the `swirl chair' study achieves a classification accuracy of 79.5% across 10 posture classes, and in the `soccer shoe' study the accuracy is 98.8% among 17 combinations of angle and intensity.
Topology-Based Characterization and Visual Analysis of Feature Evolution in Large-Scale Simulations (2019)
This manuscript presents a topology-based analysis and visualization framework that enables the effective exploration of feature evolution in large-scale simulations. Such simulations pose additional challenges to the already complex task of feature tracking and visualization, since the vast number of features and the size of the simulation data make it infeasible to naively identify, track, analyze, render, store, and interact with data. The presented methodology addresses these issues via three core contributions. First, the manuscript defines a novel topological abstraction, called the Nested Tracking Graph (NTG), that records the temporal evolution of features that exhibit a nesting hierarchy, such as superlevel set components for multiple levels, or filtered features across multiple thresholds. In contrast to common tracking graphs that are only capable of describing feature evolution at one hierarchy level, NTGs effectively summarize their evolution across all hierarchy levels in one compact visualization. The second core contribution is a view-approximation oriented image database generation approach (VOIDGA) that stores, at simulation runtime, a reduced set of feature images. Instead of storing the features themselves---which is often infeasible due to bandwidth constraints---the images of these databases can be used to approximate the depicted features from any view angle within an acceptable visual error, which requires far less disk space and only introduces a negligible overhead. The final core contribution combines these approaches into a methodology that stores in situ the least amount of information necessary to support flexible post hoc analysis utilizing NTGs and view approximation techniques.
Patients after total hip arthroplasty (THA) suffer from lingering musculoskeletal restrictions. Three-dimensional (3D) gait analysis in combination with machine-learning approaches is used to detect these impairments. In this work, features from the 3D gait kinematics of an inertial measurement unit (IMU) system, spatio-temporal parameters (Set 1) and joint angles (Set 2), are proposed as input for a support vector machine (SVM) model to differentiate impaired and non-impaired gait. The features were divided into two subsets. The IMU-based features were validated against an optical motion capture (OMC) system by means of 20 patients after THA and a healthy control group of 24 subjects. Then the SVM model was trained on both subsets. The validation of the IMU system-based kinematic features revealed root mean squared errors in the joint kinematics from 0.24° to 1.25°. The validity of the spatio-temporal gait parameters (STP) revealed a similarly high accuracy. The SVM models based on IMU data showed an accuracy of 87.2% (Set 1) and 97.0% (Set 2). The current work presents valid IMU-based features, employed in an SVM model, for the classification of the gait of patients after THA and a healthy control group. The study reveals that the features of Set 2 are more significant for the classification problem. The present IMU system proves its potential to provide accurate features for incorporation into a mobile gait-feedback system for patients after THA.
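A hedged sketch of the kind of SVM classification pipeline described above is shown below using scikit-learn; the feature matrix, labels, kernel, and hyperparameters are placeholders, not the study's actual data or settings.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data: rows = subjects, columns = gait features
# (e.g., Set 1: spatio-temporal parameters, Set 2: joint-angle features).
rng = np.random.default_rng(0)
X = rng.normal(size=(44, 12))            # 20 THA patients + 24 controls, 12 features
y = np.array([1] * 20 + [0] * 24)        # 1 = impaired gait, 0 = non-impaired

# RBF-kernel SVM with feature standardization; kernel and C are illustrative choices.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(model, X, y, cv=5)
print(f"cross-validated accuracy: {scores.mean():.2f}")
```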
Visualization is vital to the scientific discovery process.
An interactive high-fidelity rendering provides accelerated insight into complex structures, models and relationships.
However, the efficient mapping of visualization tasks to high performance architectures is often difficult, being subject to a challenging mixture of hardware and software architectural complexities in combination with domain-specific hurdles.
These difficulties are often exacerbated on heterogeneous architectures.
In this thesis, a variety of ray casting-based techniques are developed and investigated with respect to a more efficient usage of heterogeneous HPC systems for distributed visualization, addressing challenges in mesh-free rendering, in-situ compression, task-based workload formulation, and remote visualization at large scale.
A novel direct raytracing scheme for on-the-fly free surface reconstruction of particle-based simulations using an extended anisotropic kernel model is investigated on different state-of-the-art cluster setups.
The versatile system renders up to 170 million particles on 32 distributed compute nodes at close to interactive frame rates at 4K resolution with ambient occlusion.
To address the widening gap between high computational throughput and prohibitively slow I/O subsystems, in situ topological contour tree analysis is combined with a compact image-based data representation to provide an effective and easy-to-control trade-off between storage overhead and visualization fidelity.
Experiments show significant reductions in storage requirements, while preserving flexibility for exploration and analysis.
Driven by an increasingly heterogeneous system landscape, a flexible distributed direct volume rendering and hybrid compositing framework is presented.
Based on a task-based dynamic runtime environment, it enables adaptable performance-oriented deployment on various platform configurations.
Comprehensive benchmarks with respect to task granularity and scaling are conducted to verify the characteristics and potential of the novel task-based system design.
A core challenge of HPC visualization is the physical separation of visualization resources and end-users.
Using more tiles than previously thought reasonable, a distributed, low-latency multi-tile streaming system is demonstrated, being able to sustain a stable 80 Hz when streaming up to 256 synchronized 3840x2160 tiles and achieve 365 Hz at 3840x2160 for sort-first compositing over the internet, thereby enabling lightweight visualization clients and leaving all the heavy lifting to the remote supercomputer.
Private data analytics systems preferably provide required analytic accuracy to analysts and specified privacy to individuals whose data is analyzed. Devising a general system that works for a broad range of datasets and analytic scenarios has proven to be difficult.
Despite the advent of differentially private systems with proven formal privacy guarantees, industry still uses inferior ad-hoc mechanisms that provide better analytic accuracy. Differentially private mechanisms often need to add large amounts of noise to statistical results, which impairs their usability.
In my thesis I follow two approaches to improve the usability of private data analytics systems in general and differentially private systems in particular. First, I revisit ad-hoc mechanisms and explore the possibilities of systems that do not provide Differential Privacy or only a weak version thereof. Based on an attack analysis, I devise a set of new protection mechanisms including Query Based Bookkeeping (QBB). In contrast to previous systems, QBB only requires the history of analysts' queries in order to provide privacy protection. In particular, QBB does not require knowledge about the protected individuals' data.
In my second approach I use the insights gained with QBB to propose UniTraX, the first differentially private analytics system that allows analyzing parts of a protected dataset without affecting the other parts and without giving up on accuracy. I show UniTraX's usability by way of multiple case studies on real-world datasets across different domains. UniTraX allows more queries than previous differentially private data analytics systems at moderate runtime overheads.
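The idea of per-partition privacy accounting can be illustrated with a toy Laplace-mechanism sketch in which each partition of the dataset keeps its own remaining budget, so that queries over one part leave the budget of untouched parts intact. The class, dataset, and parameters below are invented and greatly simplify UniTraX's actual accounting.

```python
import random

class PartitionedBudget:
    """Toy illustration of per-partition privacy accounting: each partition keeps its
    own remaining epsilon, so analyzing one part does not consume the budget of the
    untouched parts (UniTraX's accounting is considerably more sophisticated)."""
    def __init__(self, partitions, total_epsilon):
        self.remaining = {p: total_epsilon for p in partitions}

    def noisy_count(self, data, partition, predicate, epsilon):
        if self.remaining[partition] < epsilon:
            raise RuntimeError(f"privacy budget of partition {partition!r} exhausted")
        self.remaining[partition] -= epsilon
        true_count = sum(1 for row in data[partition] if predicate(row))
        # Laplace(scale = 1/epsilon) noise as a difference of two exponentials;
        # a counting query has sensitivity 1.
        noise = random.expovariate(epsilon) - random.expovariate(epsilon)
        return true_count + noise

# Hypothetical dataset partitioned by region; only 'north' spends budget here.
data = {"north": [{"age": 34}, {"age": 71}], "south": [{"age": 52}]}
budget = PartitionedBudget(data, total_epsilon=1.0)
print(budget.noisy_count(data, "north", lambda r: r["age"] > 40, epsilon=0.1))
print(budget.remaining)   # the budget of 'south' is untouched
```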
The simulation of physical phenomena involving the dynamic behavior of fluids and gases
has numerous applications in various fields of science and engineering. Of particular interest
is the material transport behavior, the tendency of a flow field to displace parts of the
medium. Therefore, many visualization techniques rely on particle trajectories.
Lagrangian Flow Field Representation. In typical Eulerian settings, trajectories are
computed from the simulation output using numerical integration schemes. Accuracy concerns
arise because, due to limitations of storage space and bandwidth, often only a fraction
of the computed simulation time steps is available. Prior work has shown empirically that
a Lagrangian, trajectory-based representation can improve accuracy [Agr+14]. Determining
the parameters of such a representation in advance is difficult; a relationship between the
temporal and spatial resolution and the accuracy of resulting trajectories needs to be established.
We provide an error measure for upper bounds of the error of individual trajectories.
We show how areas at risk for high errors can be identified, thereby making it possible to
prioritize areas in time and space to allocate scarce storage resources.
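For context, the baseline Eulerian setting described above (trajectories obtained by numerically integrating a velocity field that is only available at discrete time steps) can be sketched as follows; the analytic velocity field and step sizes are placeholders, and the widening gap between a finely and a coarsely sampled integration is exactly the kind of error the Lagrangian representation and the error measure address.

```python
import numpy as np

def velocity(p, t):
    """Placeholder analytic velocity field; a real pipeline would interpolate the
    stored simulation time steps here instead."""
    x, y = p
    return np.array([-y + 0.1 * np.sin(t), x])

def rk4_trajectory(p0, t0, t1, dt):
    """Classic fourth-order Runge-Kutta integration of one particle trajectory."""
    n = int(round((t1 - t0) / dt))
    p, t = np.asarray(p0, dtype=float), t0
    path = [p]
    for _ in range(n):
        k1 = velocity(p, t)
        k2 = velocity(p + 0.5 * dt * k1, t + 0.5 * dt)
        k3 = velocity(p + 0.5 * dt * k2, t + 0.5 * dt)
        k4 = velocity(p + dt * k3, t + dt)
        p = p + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += dt
        path.append(p)
    return np.array(path)

# A coarser temporal resolution (larger dt) mimics having fewer stored time steps;
# the end-point error of the coarse trajectory grows accordingly.
fine   = rk4_trajectory((1.0, 0.0), 0.0, 2.0, dt=0.01)
coarse = rk4_trajectory((1.0, 0.0), 0.0, 2.0, dt=0.25)
print(np.linalg.norm(fine[-1] - coarse[-1]))
```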
Comparative Visual Analysis of Flow Field Ensembles. Independent of the representation,
errors of the simulation itself are often caused by inaccurate initial conditions,
limitations of the chosen simulation model, and numerical errors. To gain a better understanding
of the possible outcomes, multiple simulation runs can be calculated, resulting in
sets of simulation output referred to as ensembles. Of particular interest when studying the
material transport behavior of ensembles is the identification of areas where the simulation
runs agree or disagree. We introduce and evaluate an interactive method that enables application
scientists to reliably identify and examine regions of agreement and disagreement,
while taking into account the local transport behavior within individual simulation runs.
Particle-Based Representation and Visualization of Uncertain Flow Data Sets. Unlike
simulation ensembles, where uncertainty of the solution appears in the form of different
simulation runs, moment-based Eulerian multi-phase fluid simulations are probabilistic in
nature. These simulations, used in process engineering to simulate the behavior of bubbles in
liquid media, are aimed toward reducing the need for real-world experiments. The locations
of individual bubbles are not modeled explicitly, but stochastically through the properties of
locally defined bubble populations. Comparisons between simulation results and physical
experiments are difficult. We describe and analyze an approach that generates representative
sets of bubbles for moment-based simulation data. Using our approach, application scientists
can directly, visually compare simulation results and physical experiments.
3D joint kinematics can provide important information about the quality of movements. Optical motion capture systems (OMC) are considered the gold standard in motion analysis. However, in recent years, inertial measurement units (IMU) have become a promising alternative. The aim of this study was to validate IMU-based 3D joint kinematics of the lower extremities during different movements. Twenty-eight healthy subjects participated in this study. They performed bilateral squats (SQ), single-leg squats (SLS) and countermovement jumps (CMJ). The IMU kinematics was calculated using a recently described sensor-fusion algorithm. A marker-based OMC system served as a reference. Only the technical error based on algorithm performance was considered, incorporating OMC data for the calibration, initialization, and a biomechanical model. To evaluate the validity of IMU-based 3D joint kinematics, the root mean squared error (RMSE), range of motion error (ROME), Bland-Altman (BA) analysis as well as the coefficient of multiple correlation (CMC) were calculated. The evaluation was twofold: first, the IMU data was compared to OMC data based on marker clusters; second, based on skin markers attached to anatomical landmarks. The first evaluation revealed means for RMSE and ROME for all joints and tasks below 3°. The more dynamic task, CMJ, revealed error measures approximately 1° higher than the remaining tasks. Mean CMC values ranged from 0.77 to 1 over all joint angles and all tasks. The second evaluation showed an increase in the RMSE of 2.28°–2.58° on average for all joints and tasks. Hip flexion revealed the highest average RMSE in all tasks (4.87°–8.27°). The present study revealed a valid IMU-based approach for the measurement of 3D joint kinematics in functional movements of varying demands. The high validity of the results encourages further development and the extension of the present approach into clinical settings.
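The agreement metrics named above can be computed per joint angle as in the following sketch; the joint-angle curves are synthetic placeholders standing in for time-synchronized IMU and OMC signals.

```python
import numpy as np

def rmse(imu_angle, omc_angle):
    """Root mean squared error between two time-synchronized joint-angle curves (deg)."""
    return float(np.sqrt(np.mean((np.asarray(imu_angle) - np.asarray(omc_angle)) ** 2)))

def rom_error(imu_angle, omc_angle):
    """Range-of-motion error: difference of (max - min) between the two curves (deg)."""
    rom = lambda a: float(np.max(a) - np.min(a))
    return abs(rom(np.asarray(imu_angle)) - rom(np.asarray(omc_angle)))

# Placeholder knee-flexion curves over one squat repetition (degrees).
t = np.linspace(0, 1, 200)
omc = 90 * np.sin(np.pi * t)                                   # reference (optical) curve
imu = omc + np.random.default_rng(1).normal(0, 1.5, t.size)    # IMU curve with noise

print(f"RMSE: {rmse(imu, omc):.2f} deg, ROME: {rom_error(imu, omc):.2f} deg")
```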
Graphs and flow networks are important mathematical concepts that enable the modeling and analysis of a large variety of real-world problems in different domains such as engineering, medicine or computer science. The number, sizes and complexities of those problems have increased continuously over the last decades. This has led to an increased demand for techniques that help domain experts understand their data and its underlying structure to enable an efficient analysis and decision-making process.
To tackle this challenge, this work presents several new techniques that utilize concepts of visual analysis to provide domain scientists with new visualization methodologies and tools. Therefore, this work provides novel concepts and approaches for diverse aspects of the visual analysis such as data transformation, visual mapping, parameter refinement and analysis, model building and visualization as well as user interaction.
The presented techniques form a framework that provides domain scientists with new visual analysis tools and helps them analyze their data and gain insight into the underlying structures. To show the applicability and effectiveness of the presented approaches, this work tackles different applications such as networking, product flow management and vascular systems, while preserving the generality to be applicable to further domains.
The focus of this work is to provide and evaluate a novel method for multifield topology-based analysis and visualization. Through this concept, called Pareto sets, one can identify critical regions in a multifield with arbitrarily many individual fields. It uses ideas found in graph optimization to find common behavior and areas of divergence between multiple optimization objectives. The connections between the latter areas can be reduced into a graph structure, allowing for an abstract visualization of the multifield to support data exploration and understanding.
The research question answered in this dissertation concerns the general capability and expandability of the Pareto set concept in the context of visualization and application, as well as the study of its relations, drawbacks and advantages compared with other topology-based approaches. This question is answered in several steps, including consideration of and comparison with related work, a thorough introduction of the Pareto set itself as well as a framework for efficient implementation, and an attached discussion regarding limitations of the concept and their implications for run time, suitable data, and possible improvements.
Furthermore, this work considers possible simplification approaches, such as integrated single-field simplification methods, but also the use of common structures identified through the Pareto set concept to smooth all individual fields at once. These considerations are especially important for real-world scenarios in order to visualize highly complex data by removing small local structures without destroying information about larger, global trends.
To further emphasize possible improvements and expandability of the Pareto set concept, the thesis studies a variety of different real world applications. For each scenario, this work shows how the definition and visualization of the Pareto set is used and improved for data exploration and analysis based on the scenarios.
In summary, this dissertation provides a complete and sound summary of the Pareto set concept as ground work for future application of multifield data analysis. The possible scenarios include those presented in the application section, but are found in a wide range of research and industrial areas relying on uncertainty analysis, time-varying data, and ensembles of data sets in general.
Wearable activity recognition aims to identify and assess human activities with the help
of computer systems by evaluating signals of sensors which can be attached to the human
body. This provides us with valuable information in several areas: in health care, e.g. fluid
and food intake monitoring; in sports, e.g. training support and monitoring; in entertainment,
e.g. human-computer interface using body movements; in industrial scenarios, e.g.
computer support for detected work tasks. Several challenges exist for wearable activity
recognition: a large number of nonrelevant activities (null class), the evaluation of large
numbers of sensor signals (curse of dimensionality), the ambiguity of sensor signals with respect
to the activities, and finally the high variability of human activity in general.
This thesis develops a new activity recognition strategy, called invariants classification,
which addresses these challenges, especially the variability in human activities. The
core idea is that often even highly variable actions include short, more or less invariant
sub-actions which are due to hard physical constraints. If someone opens a door, the
movement of the hand to the door handle is not fixed. However, the door handle has to
be pushed to open the door. The invariants classification algorithm is structured in four
phases: segmentation, invariant identification, classification, and spotting. The segmentation
divides the continuous sensor data stream into meaningful parts, which are related
to sub-activities. Our segmentation strategy uses the zero crossings of the central difference
quotient of the sensor signals as segment borders. The invariant identification finds
the invariant sub-activities by means of clustering and a selection strategy dependent on
certain features. The classification identifies the segments of a specific activity class, using
models generated from the invariant sub-activities. The models include the invariant
sub-activity signal and features calculated on sensor signals related to the sub-activity. In
the spotting, the classified segments are used to find the entire activity class instances in
the continuous sensor data stream. For this purpose, we use the position of the invariant
sub-activity in the related activity class instance for the estimation of the borders of the
activity instances.
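The segmentation step described above (segment borders at the zero crossings of the central difference quotient) can be sketched as follows; the input signal is a synthetic placeholder for a wearable sensor channel.

```python
import numpy as np

def segment_borders(signal):
    """Place segment borders at the zero crossings of the central difference
    quotient, i.e., where the signal's slope changes sign (local extrema)."""
    x = np.asarray(signal, dtype=float)
    slope = (x[2:] - x[:-2]) / 2.0                     # central difference quotient
    crossings = np.where(np.diff(np.sign(slope)) != 0)[0]
    return crossings + 2                               # map back to indices of the signal

# Toy sensor signal: borders land near the local maxima/minima of the waveform.
t = np.linspace(0, 4 * np.pi, 400)
signal = np.sin(t) + 0.1 * np.sin(7 * t)
borders = segment_borders(signal)
segments = np.split(signal, borders)
print(f"{len(segments)} segments, first borders at samples {borders[:5]}")
```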
In this thesis, we show that our new activity recognition strategy, built on invariant
sub-activities, is beneficial. We tested it on three human activity datasets with wearable
inertial measurement units (IMU). Compared to previous publications on the same
datasets, we achieved improvements in activity recognition for several classes, some by a
large margin. Our segmentation provides a sensible method to separate the sensor data in
relation to the underlying activities. Relying on sub-activities makes us independent of
imprecise labels on the training data. After the identification of invariant sub-activities,
we calculate a value called cluster precision for each sensor signal and each activity class.
This tells us which classes can be easily classified and which sensor channels support
the classification best. Finally, in the training for each activity class, our algorithm selects
suitable signal channels with invariant sub-activities at different points in time and
with different lengths. This makes our strategy a multi-dimensional asynchronous motif
detection with variable motif length.
In this paper we present the results of the project “#Datenspende”, in which, during the German election in 2017, more than 4000 people contributed their search results for keywords connected to the German election campaign.
Analyzing the donated result lists, we prove that the room for personalization of the search results is very small. Thus the opportunity for the effect described in Eli Pariser's filter bubble theory to occur in this data is also very small, to the degree that it is negligible. We achieved these results by applying various similarity measures to the donated result lists. The first approach, using the number of common results as a similarity measure, showed that the space for personalization is less than two results out of ten on average when searching for persons, and at most four when searching for parties. Application of other, more specific measures shows that the space is indeed smaller, so that the presence of filter bubbles is not evident.
Moreover, this project is also a proof of concept, as it enables society to permanently monitor a search engine's degree of personalization for any desired search terms. The general design can also be transferred to intermediaries, if appropriate APIs restrict selective access to contents relevant to the study in order to establish a similar degree of trustworthiness.
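For illustration, the first similarity measure mentioned above (the number of common results between two donated result lists) can be computed as in the following sketch; the URLs are hypothetical and the project's more specific measures are not reproduced here.

```python
def common_results(list_a, list_b):
    """Number of URLs that two donated result lists (e.g., top 10 each) share,
    regardless of position -- the first similarity measure mentioned above."""
    return len(set(list_a) & set(list_b))

# Hypothetical top-10 result lists of two donors for the same query:
donor1 = [f"https://example.org/page{i}" for i in [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]]
donor2 = [f"https://example.org/page{i}" for i in [1, 2, 3, 4, 5, 6, 7, 9, 10, 12]]

shared = common_results(donor1, donor2)
print(f"{shared} of 10 results shared -> room for personalization: {10 - shared}")
```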