Kaiserslautern - Fachbereich Informatik
Doctoral theses published in 2017 (8 documents, all with full text)
Computational simulations run on large supercomputers must balance their outputs against the needs of the scientist and the capability of the machine. Persistent storage is typically expensive and slow, and its performance grows at a slower rate than the processing power of the machine. This forces scientists to be practical about the size and frequency of the simulation outputs that can later be analyzed to understand the simulation states. Flexibility in the trade-off between the flexibility and the accessibility of the simulation outputs is critical to the success of scientists using supercomputers to understand their science. In situ transformation of the simulation state to be persistently stored is the focus of this dissertation.
The extreme size and parallelism of simulations can cause challenges for visualization and data analysis. This is coupled with the need for the analysis algorithms to accept pre-partitioned data, which is not always well suited to existing software infrastructures. The work in this dissertation focuses on improving current workflows and software so that they accept data as it is and efficiently produce smaller, more information-rich data for persistent storage that is easily consumed by end-user scientists. I attack this problem on both a theoretical and a practical basis, managing data from its completely raw form through to information-dense visualizations, and I study methods for managing both the creation and the persistence of data products from large-scale simulations.
The proliferation of sensors in everyday devices, especially smartphones, has led to crowd sensing becoming an important technique in many urban applications, ranging from noise pollution mapping and road condition monitoring to tracking the spread of diseases. However, in order to establish integrated crowd sensing environments on a large scale, some open issues need to be tackled first. On a high level, this thesis concentrates on two of those key issues: (1) efficiently collecting and processing large amounts of sensor data from smartphones in a scalable manner and (2) extracting abstract data models from the collected data sets, thereby enabling the development of complex smart city services based on the extracted knowledge.
In more detail, the first main contribution of this thesis is the development of methods and architectures that facilitate simple and efficient deployment, scalability and adaptability of crowd sensing applications in a broad range of scenarios, while at the same time enabling the integration of incentive mechanisms for the participating general public. An evaluation within a complex, large-scale environment shows that real-world deployments of the proposed data recording architecture are in fact feasible. The second major contribution of this thesis is the development of a novel methodology for using the recorded data to extract abstract data models that correctly represent the inherent core characteristics of the source data. Finally, to bring together the results of the thesis, it is demonstrated how the proposed architecture and the modeling method can be used to implement a complex smart city service by employing a data-driven development approach.
Temporal Data Management and Incremental Data Recomputation with Wide-column Stores and MapReduce
(2017)
In recent years, "Big Data" has become an important topic in academia and industry. To handle the challenges and problems caused by Big Data, new types of data storage systems called "NoSQL stores" (short for "Not only SQL") have emerged.
"Wide-column stores" are one kind of NoSQL store. Compared to relational database systems, wide-column stores introduce a new data model and new IRUD (Insert, Retrieve, Update and Delete) semantics with support for schema flexibility, single-row transactions and data expiration constraints. Moreover, each column stores multiple data versions with associated timestamps. Well-known examples are Google's "Bigtable" and its open-source counterpart "HBase". Recently, such systems have increasingly been used in business intelligence and data warehouse environments to provide decision support, controlling and revision capabilities.
Besides managing current values, data warehouses also require management and processing of historical, time-related data. Data warehouses frequently employ techniques for processing changes in various data sources and incrementally applying such changes to the warehouse to keep it up to date. Although both incremental data warehouse maintenance and temporal data management have been the subject of intensive research in the relational database world, and commercial database products have finally picked up the ability for temporal data processing and management, such capabilities have not been explored systematically for today's wide-column stores.
This thesis helps to address the shortcomings mentioned above. It carefully analyzes the properties of wide-column stores and the applicability of mechanisms for temporal data management and incremental data warehouse maintenance known from relational databases, extends well-known approaches and develops new capabilities for providing equivalent support in wide-column stores.
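The idea that each column keeps several timestamped versions can be illustrated with a minimal sketch. It is not the API of HBase or Bigtable and not the thesis's implementation: `VersionedTable`, `put` and `get` are hypothetical names, and the store is a plain in-memory dictionary that only shows how versioned cells allow reads "as of" an earlier point in time.

```python
from bisect import bisect_right


class VersionedTable:
    """Toy wide-column table: row -> column -> (timestamps, values), sorted by time."""

    def __init__(self):
        self._rows = {}

    def put(self, row, column, value, timestamp):
        # A write adds a new version instead of overwriting the previous one.
        stamps, values = self._rows.setdefault(row, {}).setdefault(column, ([], []))
        idx = bisect_right(stamps, timestamp)
        stamps.insert(idx, timestamp)
        values.insert(idx, value)

    def get(self, row, column, as_of=None):
        # Return the newest value with timestamp <= as_of (the latest if as_of is None).
        stamps, values = self._rows.get(row, {}).get(column, ([], []))
        if not stamps:
            return None
        if as_of is None:
            return values[-1]
        idx = bisect_right(stamps, as_of)
        return values[idx - 1] if idx > 0 else None
```

A temporal query then amounts to passing `as_of`; for example, after writing a value at timestamp 10 and another at timestamp 20, `get(row, column, as_of=15)` returns the first value.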
NoSQL databases are used as an alternative to classical relational database systems in order to master the challenges posed by "Big Data". Big Data is defined by the three Vs: large amounts of data ("Volume") that grow quickly ("Velocity") and have heterogeneous structures ("Variety"). Moreover, NoSQL databases usually offer only very simple query methods. To carry out complex data analyses as well, data processing frameworks such as MapReduce, Spark or Flink are typically used. These, however, are harder to use than SQL or other query languages.
This thesis presents the data transformation language NotaQL. The language pursues three goals. First, it is powerful, easy to learn, and enables complex transformations in a few lines of code. Second, the language is independent of any particular database management system or data model. Data can be transformed from one system into another, and data models can be mapped into one another accordingly. Third, NotaQL scripts can be executed in different ways, be it by means of a data processing framework or by mapping them to another language. Typical data transformations are executed periodically to keep the results up to date as the base data changes. For such transformations, this thesis compares several incremental approaches that allow NotaQL transformations to reuse the previous results and apply the changes made since the last computation to them. The NotaQL platform supports various incremental and non-incremental execution modes and includes an intelligent advisor component so that transformations are always executed in the best possible way. The presented language is optimized for the common NoSQL databases, that is, key-value stores, wide-column stores, document databases and graph databases. The powerful and extensible data model of the language allows the use of arrays, nested objects and relationships between objects. Beyond that, NotaQL can be used not only on NoSQL databases but also on relational databases, file formats, services and data streams. If users reach the limits of the language, couplings to programming languages and existing applications are possible through the development of user-defined functions and engines.
NotaQL can be applied to data transformations of any kind, from Big Data analyses and polyglot-persistence applications to data migrations and data integrations.
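The difference between a full and an incremental execution of a periodic transformation can be sketched for a simple per-key sum. This is plain Python, not NotaQL syntax, and it does not reproduce any of the incremental approaches compared in the thesis; `full_recompute` and `apply_delta` are hypothetical names, and the sketch assumes that inserted and deleted base rows since the last run are known.

```python
def full_recompute(rows):
    """Non-incremental baseline: (sum, count) of `amount` per key over all base rows."""
    result = {}
    for key, amount in rows:
        total, count = result.get(key, (0, 0))
        result[key] = (total + amount, count + 1)
    return result


def apply_delta(previous, inserted=(), deleted=()):
    """Incremental variant: reuse the previous result and apply only the changes."""
    result = dict(previous)
    for sign, changes in ((1, inserted), (-1, deleted)):
        for key, amount in changes:
            total, count = result.get(key, (0, 0))
            result[key] = (total + sign * amount, count + sign)
    # A group disappears once no base row contributes to it any more.
    return {key: agg for key, agg in result.items() if agg[1] > 0}
```

Keeping the row count next to the sum is a design choice of this sketch: it lets a group be removed exactly when its last row is deleted, even if the remaining sum happens to be zero.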
The development of autonomous mobile robots is a major topic of current research. As those robots must be able to react to changing environments and avoid collisions, including collisions with moving obstacles, the fulfilment of safety requirements is an important aspect. Behaviour-based systems (BBS) have proven to meet several of the properties required for these kinds of robots, such as reactivity, extensibility and re-usability of individual components. BBS consist of a number of behavioural components that individually realise simple tasks. Their interconnection allows complex robot behaviour to be achieved, which implies that correct connections are crucial. The resulting networks can become very large, which makes them difficult to verify. This dissertation presents a novel concept for the analysis and verification of complex autonomous robot systems controlled by behaviour-based software architectures, with a special focus on the integration of environmental aspects into the processes.
Several analysis techniques have been investigated and adapted to the special requirements of BBS. These include a structural analysis, which is used to find constraint violations and faults in the network layout. Fault tree analysis is applied to identify root causes of hazards and the relationship of system events. For this, a technique to map the behaviour-based control network to the structure of a fault tree has been developed. Testing and data analysis are used for the detection of failures and their root causes. Here, a new concept that identifies patterns in data recorded during test runs has been introduced.
None of these methods can guarantee failure-free and safe robot behaviour, and none can prove the absence of failures. Therefore, model checking, a formal verification technique that proves a property to be correct for the given system, has been chosen to complement the set of analysis techniques. A novel concept for the integration of environmental influences into the model checking process is proposed. Environmental situations and the sensor processing chain are represented as synchronised automata, similar to the modelling of the behavioural network. Tools supporting the whole verification process, including the creation of formal queries in its environment, have been developed.
During the verification of large behavioural networks, the scalability of the model checking approach becomes a major problem. Several approaches that deal with this problem have been investigated, and the selection of slicing and abstraction methods has been justified. A concept for the application of these methods is provided that reduces the behavioural network to the relevant parts before the actual verification process.
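The general idea of reducing a network to the parts relevant for a property can be sketched as a backward reachability search. This is a generic illustration and not the slicing or abstraction method of the dissertation, which the abstract does not detail; the function name `backward_slice` and the representation of the network as a mapping from each component to the components it receives input from are assumptions.

```python
from collections import deque


def backward_slice(inputs_of, targets):
    """Return every component that can influence any component in `targets`."""
    relevant = set(targets)
    queue = deque(targets)
    while queue:
        node = queue.popleft()
        # Follow connections backwards, from a component to its input providers.
        for pred in inputs_of.get(node, ()):
            if pred not in relevant:
                relevant.add(pred)
                queue.append(pred)
    return relevant
```

Components outside the returned set cannot affect the components named in the property under this connectivity model, so they could be left out of the model handed to the checker.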
All techniques have been applied to the behaviour-based control system of the autonomous outdoor robot RAVON. Its complex network with more than 400 components allows for demonstrating the soundness of the presented concepts. The set of different techniques provides a fundamental basis for a comprehensive analysis and verification of BBS acting in changing environments.
This thesis is concerned with different null-models that are used in network analysis. Whenever it is of interest whether a real-world graph is exceptional regarding a particular measure, graphs from a null-model can be used for comparison. By analyzing an appropriate null-model, a researcher can find out whether the result of the measure on the real-world graph is exceptional or not.
Deciding which null-model to use is hard, and sometimes the difference between the null-models is not even considered. This thesis presents several results. First, undirected graphs are analyzed based on simple global measures. The results for these measures indicate that it is not important which null-model is used; thus, the fastest algorithm for a null-model may be used. Next, local measures are investigated. The fastest algorithm proves to be the most complicated to analyze. The model includes multigraphs, which do not meet the conditions of all the measures; thus, the measures themselves have to be altered to handle multigraphs as well. After careful consideration, the conditions are met, and the analysis shows that the fastest algorithm is not always the best.
The same applies to directed graphs, as is shown in the last part. There, another, more complex measure on graphs is introduced. I continue testing the applicability of several null-models; in the end, a set of equations proves to be fast and good enough as long as conditions regarding the degree sequence are met.
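How a null-model is used for comparison can be sketched with one common choice, the configuration model built by stub matching, together with a z-score. Whether this corresponds to the "fastest algorithm" discussed in the thesis is an assumption; `configuration_model` and `z_score` are hypothetical helper names.

```python
import random
from statistics import mean, pstdev


def configuration_model(degrees, rng):
    """Stub matching: pairs up degree 'stubs' at random.

    The result keeps the degree sequence but may contain self-loops and
    multi-edges, i.e. it is in general a multigraph.
    """
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
    if len(stubs) % 2:
        raise ValueError("degree sum must be even")
    rng.shuffle(stubs)
    return list(zip(stubs[::2], stubs[1::2]))


def z_score(observed, null_values):
    """Distance of the observed value from the null mean, in null standard deviations."""
    sigma = pstdev(null_values)
    return (observed - mean(null_values)) / sigma if sigma else float("nan")
```

In use, a measure would be evaluated on many sampled null graphs and the value on the real-world graph compared via `z_score`. Because the sampled edge lists can contain self-loops and multi-edges, a measure defined only for simple graphs has to be adapted first, as the abstract notes.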
This dissertation describes an indoor localization system based on oscillating magnetic fields and the underlying processing architecture. The system consists of several fixed anchor points, generating the magnetic fields (transmitter), and wearable magnetic field measurement units, whose position should be determined (receiver). The system is evaluated in different environments and application areas. Additionally, various fields of application are discussed and assessed in ubiquitous and pervasive computing and Ambient Assisted Living. The fusion of magnetic field-based distance information and positions derived from LIDAR distance measurements is described and evaluated.
The system architecture consists of three layers: a physical layer; a layer for position and distance estimation between a magnetic field transmitter and a receiver; and a layer that uses several measurements to different transmitters to estimate the overall position of a wearable measurement unit.
Each layer covers different aspects that have to be taken care of when magnetic field information is processed. In particular, the processing algorithms take the properties of the generated magnetic field information into account.
The physical layer covers the magnetic field generation and magnetic field-based information transfer, the synchronization of a transmitter and the receivers, and the description of the locally measured magnetic fields on the receiver side. After this information has been transferred to a central processing unit, the hardware-specific signal levels are transformed to the levels of the theoretical magnetic field models. The values are then used to estimate candidate positions and distances. Due to symmetry effects of the magnetic fields, the receiver position can only be narrowed down to 8 points around the transmitter (one position in each of the octants of the coordinate system). The determined positions have a mean error of 108 cm; the average distance error is 40 cm.
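The eightfold ambiguity can be made concrete with a small sketch. It assumes that the symmetry acts as a sign ambiguity on each coordinate relative to the transmitter, which is one plausible reading of "one position in each of the octants" and not a statement about the actual field model; `octant_candidates` is a hypothetical name.

```python
from itertools import product


def octant_candidates(abs_offset, transmitter=(0.0, 0.0, 0.0)):
    """Mirror a magnitude-only offset (|x|, |y|, |z|) into all eight octants."""
    tx, ty, tz = transmitter
    ax, ay, az = abs_offset
    return [
        (tx + sx * ax, ty + sy * ay, tz + sz * az)
        for sx, sy, sz in product((1, -1), repeat=3)
    ]
```

All eight candidates lie at the same distance from the transmitter, so a single transmitter cannot tell them apart; additional information, such as measurements to other transmitters, is needed to select one.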
On top of this, the distance and position information from different transmitters is fused; this covers clock synchronization of transmitters, triggering and scheduling sequences, and distance- and position-based localization and tracking algorithms. The magnetic field-based indoor localization system has been evaluated in different applications and environments; the mean position error is 60 cm to 70 cm, depending on the environment. A comparison with an RF-based indoor localization system shows the robustness of magnetic fields against RF shadows caused by large metal objects.
We additionally present algorithms for region-of-interest detection that work on raw magnetic field information as well as on transformed position and distance information. Setups in larger areas can distinguish regions that are more than 50 cm apart; small-scale coil setups (3 transmitters in 2 m³) can resolve regions below 20 cm.
Finally, we describe a fusion algorithm for a wearable localization system based on 4 LIDAR distance measurement units and magnetic field-based distance estimation. The magnetic field indoor localization system provides distance proximity information, which is used to resolve ambiguous position estimates of the LIDAR system. In a room of 8 m × 10 m, we achieve a mean error of 8 cm.
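Resolving ambiguous position estimates with a distance measurement can be sketched as a nearest-match selection. This is a simplified illustration under the assumption that the ambiguous estimates are available as a list of candidate points and that one magnetic-field distance to a transmitter at a known position is given; it is not the fusion algorithm of the dissertation, and `resolve_ambiguity` is a hypothetical name.

```python
from math import dist


def resolve_ambiguity(candidates, transmitter, measured_distance):
    """Pick the candidate whose distance to the transmitter best matches the measurement."""
    return min(
        candidates,
        key=lambda p: abs(dist(p, transmitter) - measured_distance),
    )
```

With candidates at (1, 1) and (7, 9), a transmitter at the origin and a measured distance of 1.5 m, the first candidate is selected because its distance of about 1.41 m is the closer match.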