Hadoop ist ein beliebtes Framework für verteilte Berechnungen über große
Datenmengen (Big Data) mittels MapReduce. Hadoop zu verwenden ist einfach: Der
Entwickler definiert die Eingabedatenquelle und implementiert die beiden
Methoden Map und Reduce. Über die verteilte Berechnung und Fehlerbehandlung muss
er sich dabei keine Gedanken machen, das erledigt das Hadoop-Framework.
Allerdings kann die Analyse von Big Data sehr lange dauern und da sich die
Eingabedaten jede Sekunde ändern, ist es vielleicht nicht immer die beste
Idee, die vollständige Berechnung jedes Mal aufs Neue auf die kompletten
Eingabedaten anzuwenden. Es wäre geschickter, sich das Ergebnis der
vorherigen Berechnung zu betrachten und nur die Deltas zu analysieren, also
Daten, die seit der letzten Berechnung hinzugefügt oder gelöscht wurden. In dem Gebiet der
selbstwartbaren materialisierten Sichten in relationalen Datenbanksystemen gibt
es bereits mehrere Ansätze, die sich mit der Lösung dieses Problems
beschäftigen. Eine Strategie liest nur die Deltas und inkrementiert oder
dekrementiert die Ergebnisse der vorherigen Berechnung. Allerdings ist diese
Inkrement-Operation sehr teuer, deshalb ist es manchmal besser, das komplette
alte Ergebnis zu lesen und es mit den Deltas der Eingabedaten zu kombinieren.
In dieser Masterarbeit wird ein neues Framework namens Marimba vorgestellt,
welches sich genau um diese Probleme der inkrementellen Berechnung kümmert. Einen
Map\-Re\-duce-Job in Marimba zu schreiben ist genau so einfach wie einen Hadoop-Job.
Allerdings werden hier keine Mapper- und Reducer-Klasse implementiert, sondern
eine Translator- und Serializer-Klasse. Der Translator ähnelt dem Mapper: Er
bestimmt, wie die Eingabedaten gelesen und daraus Zwischenwerte berechnet
werden. Der Serializer erzeugt die Ausgabe des Jobs. Wie diese Ausgabe berechnet
wird, gibt der Benutzer durch Implementierung einiger Methoden an, um Werte zu
aggregieren und invertieren.
Vier MapReduce-Jobs, darunter auch das Paradebeispiel für MapReduce WordCount,
wurden im Marimba-Framework implementiert. Das Entwickeln von inkrementellen
Map-Reduce-Jobs ist mit dem Framework extrem einfach geworden. Außerdem konnte
mit Performanztests gezeigt werden, dass die inkrementelle Berechnung deutlich
schneller ist als eine vollständige Neuberechnung.
Ein weiterer unter den vier implementierten Jobs berechnet
Wortauftrittswahrscheinlichkeiten in geschriebenen Sätzen. Dies kann
beispielsweise für Spracherkennung verwendet werden. Wenn ein Wort in einer
gesprochenen SMS nicht richtig verstanden wurde, hilft der Algorithmus zu raten,
welches Wort am wahrscheinlichsten an einer bestimmten Stelle stehen könnte,
abhängig von den vorherigen Wörtern im Satz. Damit dieser Algorithmus auch
brauchbare Ergebnisse liefert, ist die Menge und die Qualität der Eingabedaten
wichtig. Durchaus brauchbare Ergebnisse wurden durch die Verarbeitung von
Millionen von Twitter-Feeds, die deutsche Twitter-Nutzer in den letzten Monaten
geschrieben haben, erreicht.
For many decades, the search for language classes that extend the
context-free laguages enough to include various languages that arise in
practice, while still keeping as many of the useful properties that
context-free grammars have - most notably cubic parsing time - has been
one of the major areas of research in formal language theory. In this thesis
we add a new family of classes to this field, namely
position-and-length-dependent context-free grammars. Our classes use the
approach of regulated rewriting, where derivations in a context-free base
grammar are allowed or forbidden based on, e.g., the sequence of rules used
in a derivation or the sentential forms, each rule is applied to. For our
new classes we look at the yield of each rule application, i.e. the
subword of the final word that eventually is derived from the symbols
introduced by the rule application. The position and length of the yield
in the final word define the position and length of the rule application and
each rule is associated a set of positions and lengths where it is allowed
to be applied.
We show that - unless the sets of allowed positions and lengths are really
complex - the languages in our classes can be parsed in the same time as
context-free grammars, using slight adaptations of well-known parsing
algorithms. We also show that they form a proper hierarchy above the
context-free languages and examine their relation to language classes
defined by other types of regulated rewriting.
We complete the treatment of the language classes by introducing pushdown
automata with position counter, an extension of traditional pushdown
automata that recognizes the languages generated by
position-and-length-dependent context-free grammars, and we examine various
closure and decidability properties of our classes. Additionally, we gather
the corresponding results for the subclasses that use right-linear resp.
left-linear base grammars and the corresponding class of automata, finite
automata with position counter.
Finally, as an application of our idea, we introduce length-dependent
stochastic context-free grammars and show how they can be employed to
improve the quality of predictions for RNA secondary structures.
The recognition of day-to-day activities is still a very challenging and important research topic. During recent years, a lot of research has gone into designing and realizing smart environ- ments in different application areas such as health care, maintenance, sports or smart homes. As a result, a large amount of sensor modalities were developed, different types of activity and context recognition services were implemented and the resulting systems were benchmarked using state-of-the-art evaluation techniques. However, so far hardly any of these approaches have found their way into the market and consequently into the homes of real end-users on a large scale. The reason for this is, that almost all systems have one or more of the following characteristics in common: expensive high-end or prototype sensors are used which are not af- fordable or reliable enough for mainstream applications; many systems are deployed in highly instrumented environments or so-called "living labs", which are far from real-life scenarios and are often evaluated only in research labs; almost all systems are based on complex system con- figurations and/or extensive training data sets, which means that a large amount of data must be collected in order to install the system. Furthermore, many systems rely on a user and/or environment dependent training, which makes it even more difficult to install them on a large scale. Besides, a standardized integration procedure for the deployment of services in existing environments and smart homes has still not been defined. As a matter of fact, service providers use their own closed systems, which are not compatible with other systems, services or sensors. It is clear, that these points make it nearly impossible to deploy activity recognition systems in a real daily-life environment, to make them affordable for real users and to deploy them in hundreds or thousands of different homes.
This thesis works towards the solution of the above mentioned problems. Activity and context recognition systems designed for large-scale deployment and real-life scenarios are intro- duced. Systems are based on low-cost, reliable sensors and can be set up, configured and trained with little effort, even by technical laymen. It is because of these characteristics that we call our approach "minimally invasive". As a consequence, large amounts of training data, that are usu- ally required by many state-of-the-art approaches, are not necessary. Furthermore, all systems were integrated unobtrusively in real-world/similar to real-world environments and were evalu- ated under real-life, as well as similar to real-life conditions. The thesis addresses the following topics: First, a sub-room level indoor positioning system is introduced. The system is based on low-cost ceiling cameras and a simple computer vision tracking approach. The problem of user identification is solved by correlating modes of locomotion patterns derived from the trajectory of unidentified objects and on-body motion sensors. Afterwards, the issue of recognizing how and what mainstream household devices have been used for is considered. Based on a low-cost microphone, the water consumption of water-taps can be approximated by analyzing plumbing noise. Besides that, operating modes of mainstream electronic devices were recognized by using rule-based classifiers, electric current features and power measurement sensors. As a next step, the difficulty of spotting subtle, barely distinguishable hand activities and the resulting object interactions, within a data set containing a large amount of background data, is addressed. The problem is solved by introducing an on-body core system which is configured by simple, one-time physical measurements and minimal data collections. The lack of large training sets is compensated by fusing the system with activity and context recognition systems, that are able to reduce the search space observed. Amongst other systems, previously introduced approaches and ideas are revisited in this section. An in-depth evaluation shows the impact of each fusion procedure on the performance and run-time of the system. The approaches introduced are able to provide significantly better results than a state-of-the-art inertial system using large amounts of training data. The idea of using unobtrusive sensors has also been applied to the field of behavior analysis. Integrated smartphone sensors are used to detect behavioral changes of in- dividuals due to medium-term stress periods. Behavioral parameters related to location traces, social interactions and phone usage were analyzed to detect significant behavioral changes of individuals during stressless and stressful time periods. Finally, as a closing part of the the- sis, a standardization approach related to the integration of ambient intelligence systems (as introduced in this thesis) in real-life and large-scale scenarios is shown.
An huge amount of computational models and programming languages have been proposed
for the description of embedded systems. In contrast to traditional sequential programming
languages, they cope directly with the requirements for embedded systems: direct support for
concurrent computations and periodic interaction with the environment are only some of the
features they offer. Synchronous languages are one class of languages for the development of
embedded systems and they follow the fundamental principle that the execution is divided into
a sequence of logical steps. Thereby, each step follows the simplification that the computation
of the outputs is finished directly when the inputs are available. This rigorous abstraction leads
to well-defined deterministic parallel composition in general, and to deterministic abortion
and suspension in imperative synchronous languages in particular. These key features also
allow to translate programs to hardware and software, and also formal verification techniques
like model checking can be easily applied.
Besides the advantages of imperative synchronous languages, also some drawbacks can
be listed. Over-synchronization is an effect being caused by parallel threads which have to
synchronize for each execution step, even if they do not communicate, since the synchronization
is implicitly forced by the control-flow. This thesis considers the idea of clock refinement to
introduce several abstraction layers for communication and synchronization in addition to the
existing single-clock abstraction. Thereby, clocks can be refined by several independent clocks
so that a controlled amount of asynchrony between subsequent synchronization points can be
exploited by compilers. The declarations of clocks form a tree, and clocks can be defined within
the threads of the parallel statement, which allows one to do independent computations based
on these clocks without synchronizing the threads. However, the synchronous abstraction is
kept at each level of the abstraction.
Clock refinement is introduced in this thesis as an extension to the imperative synchronous
language Quartz. Therefore, new program statements are introduced which allow to define
a new clock as a refinement of an existing one and to finish a step based on a certain clock.
Examples are considered to show the impact of the behavior of the new statements to
the already existing statements, before the semantics of this extension is formally defined.
Furthermore, the thesis presents a compile algorithm to translate programs to an intermediate
format, and to translate the intermediate format to a hardware description. The advantages
obtained by the new modeling feature are finally evaluated based on examples.
Regular physical activity is essential to maintain or even improve an individual’s health. There exist various guidelines on how much individuals should do. Therefore, it is important to monitor performed physical activities during people’s daily routine in order to tell how far they meet professional recommendations. This thesis follows the goal to develop a mobile, personalized physical activity monitoring system applicable for everyday life scenarios. From the mentioned recommendations, this thesis concentrates on monitoring aerobic physical activity. Two main objectives are defined in this context. On the one hand, the goal is to estimate the intensity of performed activities: To distinguish activities of light, moderate or vigorous effort. On the other hand, to give a more detailed description of an individual’s daily routine, the goal is to recognize basic aerobic activities (such as walk, run or cycle) and basic postures (lie, sit and stand).
With recent progress in wearable sensing and computing the technological tools largely exist nowadays to create the envisioned physical activity monitoring system. Therefore, the focus of this thesis is on the development of new approaches for physical activity recognition and intensity estimation, which extend the applicability of such systems. In order to make physical activity monitoring feasible in everyday life scenarios, the thesis deals with questions such as 1) how to handle a wide range of e.g.
everyday, household or sport activities and 2) how to handle various potential users. Moreover, this thesis deals with the realistic scenario where either the currently performed activity or the current user is unknown during the development and training
phase of activity monitoring applications. To answer these questions, this thesis proposes and developes novel algorithms, models and evaluation techniques, and performs thorough experiments to prove their validity.
The contributions of this thesis are both of theoretical and of practical value. Addressing the challenge of creating robust activity monitoring systems for everyday life the concept of other activities is introduced, various models are proposed and validated. Another key challenge is that complex activity recognition tasks exceed the potential of existing classification algorithms. Therefore, this thesis introduces a confidence-based extension of the well known AdaBoost.M1 algorithm, called ConfAdaBoost.M1. Thorough experiments show its significant performance improvement compared to commonly used boosting methods. A further major theoretical contribution is the introduction and validation of a new general concept for the personalization of physical activity recognition applications, and the development of a novel algorithm (called Dependent Experts) based on this concept. A major contribution of practical value is the introduction of a new evaluation technique (called leave-one-activity-out) to simulate when performing previously unknown activities in a physical activity monitoring system. Furthermore, the creation and benchmarking of publicly available physical activity monitoring datasets within this thesis are directly benefiting the research community. Finally, the thesis deals with issues related to the implementation of the proposed methods, in order to realize the envisioned mobile system and integrate it into a full healthcare application for aerobic activity monitoring and support in daily life.
Backward compatibility of class libraries ensures that an old implementation of a library can safely be replaced by a new implementation without breaking existing clients.
Formal reasoning about backward compatibility requires an adequate semantic model to compare the behavior of two library implementations.
In the object-oriented setting with inheritance and callbacks, finding such models is difficult as the interface between library implementations and clients are complex.
Furthermore, handling these models in a way to support practical reasoning requires appropriate verification tools.
This thesis proposes a formal model for library implementations and a reasoning approach for backward compatibility that is implemented using an automatic verifier. The first part of the thesis develops a fully abstract trace-based semantics for class libraries of a core sequential object-oriented language. Traces abstract from the control flow (stack) and data representation (heap) of the library implementations. The construction of a most general context is given that abstracts exactly from all possible clients of the library implementation.
Soundness and completeness of the trace semantics as well as the most general context are proven using specialized simulation relations on the operational semantics. The simulation relations also provide a proof method for reasoning about backward compatibility.
The second part of the thesis presents the implementation of the simulation-based proof method for an automatic verifier to check backward compatibility of class libraries written in Java. The approach works for complex library implementations, with recursion and loops, in the setting of unknown program contexts. The verification process relies on a coupling invariant that describes a relation between programs that use the old library implementation and programs that use the new library implementation. The thesis presents a specification language to formulate such coupling invariants. Finally, an application of the developed theory and tool to typical examples from the literature validates the reasoning and verification approach.
Compared to traditional software design, the design of embedded software is even more challenging: In addition to the correct implementation of the systems, one has to consider non-functional constraints such as real-time behavior, reliability, and energy consumption. Moreover, many embedded systems are used in safety-critical applications where errors can lead to enormous damages and even to the loss of human live. For this reason, formal verification is applied in many design flows using different kinds of formal verification methods.
The synchronous model of computation has shown to be well-suited in this context. Its core is the paradigm of perfect synchrony which assumes that the overall system behavior is divided into a sequence of reactions, and all computations within a reaction are completed in zero time. This temporal abstraction simplifies reactive programming in that developers do not have to bother about many low-level details related to timing, synchronization and scheduling. This thesis is dedicated to this design flow, and it presents the author's contributions to it.
Due to tremendous improvements of high-performance computing resources as well
as numerical advances computational simulations became a common tool for modern
engineers. Nowadays, simulation of complex physics is more and more substituting a
large amount of physical experiments. While the vast compute power of large-scale
high-performance systems enabled for simulating more complex numerical equations,
handling the ever increasing amount of data with spatial and temporal resolution
burdens new challenges to scientists. Huge hardware and energy costs desire for
e� cient utilization of high-performance systems. However, increasing complexity of
simulations raises the risk of failing simulations resulting in a single simulation to be
restarted multiple times. Computational Steering is a promising approach to interact
with running simulations which could prevent simulation crashes. The large amount
of data expands gaps in the amount of data that can be calculated and the amount of
data that can be processed. Extreme-scale simulations produce more data that can
even be stored. In this thesis, I propose several methods that enhance the process
of steering, exploring, visualizing, and analyzing ongoing numerical simulations.
There is a growing trend for ever larger wireless sensor networks (WSNs) consisting of thousands or tens of thousands of sensor nodes (e.g., [91, 79]). We believe this trend will continue and thus scalability plays a crucial role in all protocols and mechanisms for WSNs. Another trend in many modern WSN applications is the time sensitivity to information from sensors to sinks. In particular, WSNs are a central part of the vision of cyber-physical systems and as these are basically closed-loop systems many WSN applications will have to operate under stringent timing requirements. Hence, it is crucial to develop algorithms that minimize the worst-case delay in WSNs. In addition, almost all WSNs consist of battery-powered nodes, and thus energy-efficiency clearly remains another premier goal in order to keep network lifetime high. This dissertation presents and evaluates designs for WSNs using multiple sinks to achieve high lifetime and low delay. Firstly, we investigate random and deterministic node placement strategies for large-scale and time-sensitive WSNs. In particular, we focus on tiling-based deterministic node placement strategies and analyze their effects on coverage, lifetime, and delay performance under both exact placement and stochastically disturbed placement. Next, we present sink placement strategies, which constitutes the main contributions of this dissertation. Static sinks will be placed and mobile sinks will be given a trajectory. A proper sink placement strategy can improve the performance of a WSN significantly. In general, the optimal sink placement with lifetime maximization is an NP-hard problem. The problem is even harder if delay is taken into account. In order to achieve both lifetime and delay goals, we focus on the problem of placing multiple (static) sinks such that the maximum worst-case delay is minimized while keeping the energy consumption as low as possible. Different target networks may need a corresponding sink placement strategy under differing levels of apriori assumptions. Therefore, we first develop an algorithm based on the Genetic Algorithm (GA) paradigm for known sensor nodes' locations. For a network where global information is not feasible we introduce a self-organized sink placement (SOSP) strategy. While GA-based sink placement achieves a near-optimal solution, SOSP provides a good sink placement strategy with a lower communication overhead. How to plan the trajectories of many mobile sinks in very large WSNs in order to simultaneously achieve lifetime and delay goals had not been treated so far in the literature. Therefore, we delve into this difficult problem and propose a heuristic framework using multiple orbits for the sinks' trajectories. The framework is designed based on geometric arguments to achieve both, high lifetime and low delay. In simulations, we compare two different instances of our framework, one conceived based on a load-balancing argument and one based on a distance minimization argument, with a set of different competitors spanning from statically placed sinks to battery-state aware strategies. We find our heuristics outperform the competitors in both, lifetime and delay. Furthermore, and probably even more important, the heuristic, while keeping its good delay and lifetime performance, scales well with an increasing number of sinks. In brief, the goal of this dissertation is to show that placing nodes and sinks in conventional WSNs as well as planning trajectories in mobility enabled WSNs carefully really pays off for large-scale and time-sensitive WSNs.
Recent progresses and advances in the field of consumer electronics, driven by display
technologies and also the sector of mobile, hand-held devices, enable new ways in
presenting information to users, as well as new ways of user interaction, therefore
providing a basis for user-centered applications and work environments.
My thesis focuses on how arbitrary display environments can be utilized to improve
both the user experience, regarding perception of information, and also to provide
intuitive interaction possibilities. On the one hand advances in display technologies
provide the basis for new ways of visualizing content and collaborative work, on the
other hand forward-pressing developments in the consumer market, especially the
market of smart phones, offer potential to enhance usability in terms of interaction
and therefore can provide additional benefit for users.
Tiled display setups, combining both large screen real estate and high resolution,
provide new possibilities and chances to visualize large datasets and to facilitate col-
laboration in front of a large screen area. Furthermore these display setups present
several advantages over the traditional single-user-workspace environments: con-
trary to single-user-workspaces, multiple users are able to explore a dataset displayed
on a tiled display system, at the same time, thus allowing new forms of collabora-
tive work. Based on that, face-to-face discussions are enabled, an additional value
is added. Large displays also allow the utilization of the user’s spatial memory, al-
lowing physical navigation without the need of switching between different windows
to explore information.
With Tiled++ I contributed a versatile approach to address the bezel problem. The
bezel problem is one of the Top Ten research challenges in the research field of LCD-
based tiled wall setups. By applying the Tiled++ approach a large high resolution
Focus & Context screen is created, combining high resolution focus areas with low
resolution context information, projected onto the bezel area.
Additionally the field of user interaction poses an important challenge, especially
regarding the utilization of large tiled displays, since traditional keyboard & mouse
interaction devices reached their limits. My focus in this thesis is on Mobile HCI.Devices like mobile phones are utilized to interact with large displays, since they
feature various interaction modalities and preserve user mobility.
Large public displays, as a modernized form of traditional bulletin boards, also en-
able new ways of handling information, displaying content, and user interaction.
Utilized in hot spots, Digital Interactive Public Pinboards can provide an adequate
answer to questions like how to approach pressing issues like disaster and crisis man-
agement for both responders as well as citizens and also new ways of how to handle
information flow (contribution & distribution & accession). My contribution to the
research field of public display environments was the conception and implementa-
tion of an easy-to-use and easy-to-set-up architecture to overcome shortcomings of
current approaches and to cover the needs of aid personnel.
Although being a niche, Virtual Reality (VR) environments can provide additional
value for visualizing specific content. Disciplines like earth sciences & geology, me-
chanical engineering, design, and architecture can benefit from VR environments. In
order to consider the variety of users, I introduce a more intuitive and user friendly
interaction metaphor, the ARC metaphor.
Visualization challenges base on being able to cope with more and more complex
datasets and to bridge the gap between comprehensibility and loss of information.
Furthermore the visualization approach has to be reasonable, which is a crucial
factor when working in interdisciplinary teams, where the standard of knowledge
is diverse. Users have to be able to conceive the visualized content in a fast and
reliable way. My contribution are visualization approaches in the field of supportive
Finally, my work illuminates how the synthesis of visualization, interaction and dis-
play technologies enhance the user experience. I promote a holistic view. The user
is brought back into the focus of attention, provided with a tool-set to support him,
without overextending the abilities of, for example, non-expert users, a crucial factor
in the more and more interdisciplinary field of computer science.