Kaiserslautern - Fachbereich Informatik
Refine
Year of publication
Document Type
- Doctoral Thesis (218) (remove)
Language
- English (218) (remove)
Has Fulltext
- yes (218)
Keywords
- Visualisierung (16)
- Visualization (8)
- Deep Learning (7)
- Computergraphik (5)
- Evaluation (4)
- Artificial Intelligence (3)
- Geoinformationssystem (3)
- Machine Learning (3)
- Mustererkennung (3)
- Optische Zeichenerkennung (3)
Faculty / Organisational entity
Medical cyber-physical systems (MCPS) emerged as an evolution of the relations between connected health systems, healthcare providers, and modern medical devices. Such systems combine independent medical devices at runtime in order to render new patient monitoring/control functionalities, such as physiological closed loops for controlling drug infusion or optimization of alarms. Despite the advances regarding alarm precision, healthcare providers still struggle with alarm flooding caused by the limited risk assessment models. Furthermore, these limitations also impose severe barriers on the adoption of automated supervision through autonomous actions, such as safety interlocks for avoiding overdosage. The literature has focused on the verification of safety parameters to assure the safety of treatment at runtime and thus optimize alarms and automated actions. Such solutions have relied on the definition of actuation ranges based on thresholds for a few monitored parameters. Given the very dynamic nature of the relevant context conditions (e.g., the patient’s condition, treatment details, system configurations, etc.), fixed thresholds are a weak means for assessing the current risk. This thesis presents an approach for enabling dynamic risk assessment for cooperative MCPS based on an adaptive Bayesian Networks (BN) model. The main aim of the approach is to support continuous runtime risk assessment of the current situation based on relevant context and system information. The presented approach comprises (i) a dynamic risk analysis constituent, which corresponds to the elicitation of relevant risk parameters, risk metric building, and risk metric management; and (ii) a runtime risk classification constituent, which aims to analyze the current situation risk, establish risk classes, and identify and deploy mitigation measures. The proposed approach was evaluated and its feasibility proved by means of simulated experiments guided by an international team of medical experts with a focus on the requirements of efficacy, efficiency, and availability of patient treatment.
A prime motivation for using XML to directly represent pieces of information is the ability of supporting ad-hoc or 'schema-later' settings. In such scenarios, modeling data under loose data constraints is essential. Of course, the flexibility of XML comes at a price: the absence of a rigid, regular, and homogeneous structure makes many aspects of data management more challenging. Such malleable data formats can also lead to severe information quality problems, because the risk of storing inconsistent and incorrect data is greatly increased. A prominent example of such problems is the appearance of the so-called fuzzy duplicates, i.e., multiple and non-identical representations of a real-world entity. Similarity joins correlating XML document fragments that are similar can be used as core operators to support the identification of fuzzy duplicates. However, similarity assessment is especially difficult on XML datasets because structure, besides textual information, may exhibit variations in document fragments representing the same real-world entity. Moreover, similarity computation is substantially more expensive for tree-structured objects and, thus, is a serious performance concern. This thesis describes the design and implementation of an effective, flexible, and high-performance XML-based similarity join framework. As main contributions, we present novel structure-conscious similarity functions for XML trees - either considering XML structure in isolation or combined with textual information -, mechanisms to support the selection of relevant information from XML trees and organization of this information into a suitable format for similarity calculation, and efficient algorithms for large-scale identification of similar, set-represented objects. Finally, we validate the applicability of our techniques by integrating our framework into a native XML database management system; in this context we address several issues around the integration of similarity operations into traditional database architectures.
This thesis presents a novel, generic framework for information segmentation in document images.
A document image contains different types of information, for instance, text (machine printed/handwritten), graphics, signatures, and stamps.
It is necessary to segment information in documents so that to process such segmented information only when required in automatic document processing workflows.
The main contribution of this thesis is the conceptualization and implementation of an information segmentation framework that is based on part-based features.
The generic nature of the presented framework makes it applicable to a variety of documents (technical drawings, magazines, administrative, scientific, and academic documents) digitized using different methods (scanners, RGB cameras, and hyper-spectral imaging (HSI) devices).
A highlight of the presented framework is that it does not require large training sets, rather a few training samples (for instance, four pages) lead to high performance, i.e., better than previously existing methods.
In addition, the presented framework is simple and can be adapted quickly to new problem domains.
This thesis is divided into three major parts on the basis of document digitization method (scanned, hyper-spectral imaging, and camera captured) used.
In the area of scanned document images, three specific contributions have been realized.
The first of them is in the domain of signature segmentation in administrative documents.
In some workflows, it is very important to check the document authenticity before processing the actual content.
This can be done based on the available seal of authenticity, e.g., signatures.
However, signature verification systems expect pre-segmented signature image, while signatures are usually a part of document.
To use signature verification systems on document images, it is necessary to first segment signatures in documents.
This thesis shows that the presented framework can be used to segment signatures in administrative documents.
The system based on the presented framework is tested on a publicly available dataset where it outperforms the state-of-the-art methods and successfully segmented all signatures, while less than half of the found signatures are false positives.
This shows that it can be applied for practical use.
The second contribution in the area of scanned document images is segmentation of stamps in administrative documents.
A stamp also serves as a seal for documents authenticity.
However, the location of stamp on the document can be more arbitrary than a signature depending on the person sealing the document.
This thesis shows that a system based on our generic framework is able to extract stamps of any arbitrary shape and color.
The evaluation of the presented system on a publicly available dataset shows that it is also able to segment black stamps (that were not addressed in the past) with a recall and precision of 83% and 73%, respectively.
%Furthermore, to segment colored stamps, this thesis presents a novel feature set which is based on intensity gradient, is able to extract unseen, colored, arbitrary shaped, textual as well as graphical stamps, and outperforms the state-of-the-art methods.
The third contribution in the scanned document images is in the domain of information segmentation in technical drawings (architectural floorplans, maps, circuit diagrams, etc.) containing usually a large amount of graphics and comparatively less textual components. Further, as in technical drawings, text is overlapping with graphics.
Thus, automatic analysis of technical drawings uses text/graphics segmentation as a pre-processing step.
This thesis presents a method based on our generic information segmentation framework that is able to detect the text, which is touching graphical components in architectural floorplans and maps.
Evaluation of the method on a publicly available dataset of architectural floorplans shows that it is able to extract almost all touching text components with precision and recall of 71% and 95%, respectively.
This means that almost all of the touching text components are successfully extracted.
In the area of hyper-spectral document images, two contributions have been realized.
Unlike normal three channels RGB images, hyper-spectral images usually have multiple channels that range from ultraviolet to infrared regions including the visible region.
First, this thesis presents a novel automatic method for signature segmentation from hyper-spectral document images (240 spectral bands between 400 - 900 nm).
The presented method is based on a part-based key point detection technique, which does not use any structural information, but relies only on the spectral response of the document regardless of ink color and intensity.
The presented method is capable of segmenting (overlapping and non-overlapping) signatures from varying backgrounds like, printed text, tables, stamps, logos, etc.
Importantly, the presented method can extract signature pixels and not just the bounding boxes.
This is substantial when signatures are overlapping with text and/or other objects in image. Second, this thesis presents a new dataset comprising of 300 documents scanned using a high-resolution hyper-spectral scanner. Evaluation of the presented signature segmentation method on this hyper-spectral dataset shows that it is able to extract signature pixels with the precision and recall of 100% and 79%, respectively.
Further contributions have been made in the area of camera captured document images. A major problem in the development of Optical Character Recognition (OCR) systems for camera captured document images is the lack of labeled camera captured document images datasets. In the first place, this thesis presents a novel, generic, method for automatic ground truth generation/labeling of document images. The presented method builds large-scale (i.e., millions of images) datasets of labeled camera captured / scanned documents without any human intervention. The method is generic and can be used for automatic ground truth generation of (scanned and/or camera captured) documents in any language, e.g., English, Russian, Arabic, Urdu. The evaluation of the presented method, on two different datasets in English and Russian, shows that 99.98% of the images are correctly labeled in every case.
Another important contribution in the area of camera captured document images is the compilation of a large dataset comprising 1 million word images (10 million character images), captured in a real camera-based acquisition environment, along with the word and character level ground truth. The dataset can be used for training as well as testing of character recognition systems for camera-captured documents. Various benchmark tests are performed to analyze the behavior of different open source OCR systems on camera captured document images. Evaluation results show that the existing OCRs, which already get very high accuracies on scanned documents, fail on camera captured document images.
Using the presented camera-captured dataset, a novel character recognition system is developed which is based on a variant of recurrent neural networks, i.e., Long Short Term Memory (LSTM) that outperforms all of the existing OCR engines on camera captured document images with an accuracy of more than 95%.
Finally, this thesis provides details on various tasks that have been performed in the area closely related to information segmentation. This includes automatic analysis and sketch based retrieval of architectural floor plan images, a novel scheme for online signature verification, and a part-based approach for signature verification. With these contributions, it has been shown that part-based methods can be successfully applied to document image analysis.
Data is the new gold and serves as a key to answer the five W’s (Who, What, Where, When, Why) and How’s of any business. Companies are now mining data more than ever and one of the most important aspects while analyzing this data is to detect anomalous patterns to identify critical patterns and points. To tackle the vital aspects of timeseries analysis, this thesis presents a novel hybrid framework that stands on three pillars: Anomaly Detection, Uncertainty Estimation,
and Interpretability and Explainability.
The first pillar is comprised of contributions in the area of time-series anomaly detection. Deep Anomaly Detection for Time-series (DeepAnT), a novel deep learning-based anomaly detection method, lies at the foundation of the proposed hybrid framework and addresses the inadequacy of traditional anomaly detection methods. To the best of the author’s knowledge, Convolutional Neural Network (CNN) was used for the first time in Deep Anomaly Detection for Time-series (DeepAnT) to robustly detect multiple types of anomalies in the tricky
and continuously changing time-series data. To further improve the anomaly detection performance, a fusion-based method, Fusion of
Statistical and Deep Learning for Anomaly Detection (FuseAD) is proposed. This method aims to combine the strengths of existing wellfounded
statistical methods and powerful data-driven methods.
In the second pillar of this framework, a hybrid approach that combines the high accuracy of the deterministic models with the posterior distribution approximation of Bayesian neural networks is proposed.
In the third pillar of the proposed framework, mechanisms to enable both HOW and WHY parts are presented.
Embedded systems have become ubiquitous in everyday life, and especially in the automotive industry. New applications challenge their design by introducing a new class of problems that are based on a detailed analysis of the environmental situation. Situation analysis systems rely on models and algorithms of the domain of computational geometry. The basic model is usually an Euclidean plane, which contains polygons to represent the objects of the environment. Usual implementations of computational geometry algorithms cannot be directly used for safety-critical systems. First, a strict analysis of their correctness is indispensable and second, nonfunctional requirements with respect to the limited resources must be considered. This thesis proposes a layered approach to a polygon-processing system. On top of rational numbers, a geometry kernel is formalised at first. Subsequently, geometric primitives form a second layer of abstraction that is used for plane sweep and polygon algorithms. These layers do not only divide the whole system into manageable parts but make it possible to model problems and reason about them at the appropriate level of abstraction. This structure is used for the verification as well as the implementation of the developed polygon-processing library.
Nowadays, accounting, charging and billing users' network resource consumption are commonly used for the purpose of facilitating reasonable network usage, controlling congestion, allocating cost, gaining revenue, etc. In traditional IP traffic accounting systems, IP addresses are used to identify the corresponding consumers of the network resources. However, there are some situations in which IP addresses cannot be used to identify users uniquely, for example, in multi-user systems. In these cases, network resource consumption can only be ascribed to the owners of these hosts instead of corresponding real users who have consumed the network resources. Therefore, accurate accountability in these systems is practically impossible. This is a flaw of the traditional IP address based IP traffic accounting technique. This dissertation proposes a user based IP traffic accounting model which can facilitate collecting network resource usage information on the basis of users. With user based IP traffic accounting, IP traffic can be distinguished not only by IP addresses but also by users. In this dissertation, three different schemes, which can achieve the user based IP traffic accounting mechanism, are discussed in detail. The inband scheme utilizes the IP header to convey the user information of the corresponding IP packet. The Accounting Agent residing in the measured host intercepts IP packets passing through it. Then it identifies the users of these IP packets and inserts user information into the IP packets. With this mechanism, a meter located in a key position of the network can intercept the IP packets tagged with user information, extract not only statistic information, but also IP addresses and user information from the IP packets to generate accounting records with user information. The out-of-band scheme is a contrast scheme to the in-band scheme. It also uses an Accounting Agent to intercept IP packets and identify the users of IP traffic. However, the user information is transferred through a separated channel, which is different from the corresponding IP packets' transmission. The Multi-IP scheme provides a different solution for identifying users of IP traffic. It assigns each user in a measured host a unique IP address. Through that, an IP address can be used to identify a user uniquely without ambiguity. This way, traditional IP address based accounting techniques can be applied to achieve the goal of user based IP traffic accounting. In this dissertation, a user based IP traffic accounting prototype system developed according to the out-of-band scheme is also introduced. The application of user based IP traffic accounting model in the distributed computing environment is also discussed.
Computer-based simulation and visualization of acoustics of a virtual scene can aid during the design process of concert halls, lecture rooms, theaters, or living rooms. Because, not only the visual aspect of the room is important, but also its acoustics. In factory floors noise reduction is important since noise is hazardous to health. Despite the obvious dissimilarity between our aural and visual senses, many techniques required for the visualization of photo-realistic images and for the auralization of acoustic environments are quite similar. Both applications can be served by geometric methods such as particle- and ray tracing if we neglect a number of less important effects. By means of the simulation of room acoustics we want to predict the acoustic properties of a virtual model. For auralization, a pulse response filter needs to be assembled for each pair of source and listener positions. The convolution of this filter with an anechoic source signal provides the signal received at the listener position. Hence, the pulse response filter must contain all reverberations (echos) of a unit pulse, including their frequency decompositions due to absorption at different surface materials. For the room acoustic simulation a method named phonon tracing, since it is based on particles, is developed. The approach computes the energy or pressure decomposition for each particle (phonon) sent out from a sound source and uses this in a second pass (phonon collection) to construct the response filters for different listeners. This step can be performed in different precision levels. During the tracing step particle paths and additional information are stored in a so called phonon map. Using this map several sound visualization approaches were developed. From the visualization, the effect of different materials on the spectral energy / pressure distribution can be observed. The first few reflections already show whether certain frequency bands are rapidly absorbed. The absorbing materials can be identified and replaced in the virtual model, improving the overall acoustic quality of the simulated room. Furthermore an insight into the pressure / energy received at the listener position is possible. The phonon tracing algorithm as well as several sound visualization approaches are integrated into a common system utilizing Virtual Reality technologies in order to facilitate the immersion into the virtual scene. The system is a prototype developed within a project at the University of Kaiserslautern and is still a subject of further improvements. It consists of a stereoscopic back-projection system for visual rendering as well as professional audio equipment for auralization purposes.
Three dimensional (3d) point data is used in industry for measurement and reverse engineering. Precise point data is usually acquired with triangulating laser scanners or high precision structured light scanners. Lower precision point data is acquired by real-time structured light devices or by stereo matching with multiple cameras. The basic principle of all these methods is the so-called triangulation of 3d coordinates from two dimensional (2d) camera images.
This dissertation contributes a method for multi-camera stereo matching that uses a system of four synchronized cameras. A GPU based stereo matching method is presented to achieve a high quality reconstruction at interactive frame rates. Good depth resolution is achieved by allowing large disparities between the images. A multi level approach on the GPU allows a fast processing of these large disparities. In reverse engineering, hand-held laser scanners are used for the scanning of complex shaped objects. The operator of the scanner can scan complex regions slower, multiple times, or from multiple angles to achieve a higher point density. Traditionally, computer aided design (CAD) geometry is reconstructed in a separate step after the scanning. Errors or missing parts in the scan prevent a successful reconstruction. The contribution of this dissertation is an on-line algorithm that allows the reconstruction during the scanning of an object. Scanned points are added to the reconstruction and improve it on-line. The operator can detect the areas in the scan where the reconstruction needs additional data.
First, the point data is thinned out using an octree based data structure. Local normals and principal curvatures are estimated for the reduced set of points. These local geometric values are used for segmentation using a region growing approach. Implicit quadrics are fitted to these segments. The canonical form of the quadrics provides the parameters of basic geometric primitives.
An improved approach uses so called accumulated means of local geometric properties to perform segmentation and primitive reconstruction in a single step. Local geometric values can be added and removed on-line to these means to get a stable estimate over a complete segment. By estimating the shape of the segment it is decided which local areas are added to a segment. An accumulated score estimates the probability for a segment to belong to a certain type of geometric primitive. A boundary around the segment is reconstructed using a growing algorithm that ensures that the boundary is closed and avoids self intersections.
Adaptive Extraction and Representation of Geometric Structures from Unorganized 3D Point Sets
(2009)
The primary emphasis of this thesis concerns the extraction and representation of intrinsic properties of three-dimensional (3D) unorganized point clouds. The points establishing a point cloud as it mainly emerges from LiDaR (Light Detection and Ranging) scan devices or by reconstruction from two-dimensional (2D) image series represent discrete samples of real world objects. Depending on the type of scenery the data is generated from the resulting point cloud may exhibit a variety of different structures. Especially, in the case of environmental LiDaR scans the complexity of the corresponding point clouds is relatively high. Hence, finding new techniques allowing the efficient extraction and representation of the underlying structural entities becomes an important research issue of recent interest. This thesis introduces new methods regarding the extraction and visualization of structural features like surfaces and curves (e.g. ridge-lines, creases) from 3D (environmental) point clouds. One main part concerns the extraction of curve-like features from environmental point data sets. It provides a new method supporting a stable feature extraction by incorporating a probability-based point classification scheme that characterizes individual points regarding their affiliation to surface-, curve- and volume-like structures. Another part is concerned with the surface reconstruction from (environmental) point clouds exhibiting objects that are more or less complex. A new method providing multi-resolutional surface representations from regular point clouds is discussed. Following the applied principles of this approach a volumetric surface reconstruction method based on the proposed classification scheme is introduced. It allows the reconstruction of surfaces from highly unstructured and noisy point data sets. Furthermore, contributions in the field of reconstructing 3D point clouds from 2D image series are provided. In addition, a discussion concerning the most important properties of (environmental) point clouds with respect to feature extraction is presented.
If gradient based derivative algorithms are used to improve industrial products by reducing their target functions, the derivatives need to be exact.
The last percent of possible improvement, like the efficiency of a turbine, can only be gained if the derivatives are consistent with the solution process that is used in the simulation software.
It is problematic that the development of the simulation software is an ongoing process which leads to the use of approximated derivatives.
If a derivative computation is implemented manually, it will be inconsistent after some time if it is not updated.
This thesis presents a generalized approach which differentiates the whole simulation software with Algorithmic Differentiation (AD), and guarantees a correct and consistent derivative computation after each change to the software.
For this purpose, the variable tagging technique is developed.
The technique checks at run-time if all dependencies, which are used by the derivative algorithms, are correct.
Since it is also necessary to check the correctness of the implementation, a theorem is developed which describes how AD derivatives can be compared.
This theorem is used to develop further methods that can detect and correct errors.
All methods are designed such that they can be applied in real world applications and are used within industrial configurations.
The process described above yields consistent and correct derivatives but the efficiency can still be improved.
This is done by deriving new derivative algorithms.
A fixed-point iterator approach, with a consistent derivation, yields all state of the art algorithms and produces two new algorithms.
These two new algorithms include all implementation details and therefore they produce consistent derivative results.
For detecting hot spots in the application, the state of the art techniques are presented and extended.
The data management is changed such that the performance of the software is affected only marginally when quantities, like the number of input and output variables or the memory consumption, are computed for the detection.
The hot spots can be treated with techniques like checkpointing or preaccumulation.
How these techniques change the time and memory consumption is analyzed and it is shown how they need to be used in selected AD tools.
As a last step, the used AD tools are analyzed in more detail.
The major implementation strategies for operator overloading AD tools are presented and implementation improvements for existing AD tools are discussed.
The discussion focuses on a minimal memory consumption and makes it possible to compare AD tools on a theoretical level.
The new AD tool CoDiPack is based on these findings and its design and concepts are presented.
The improvements and findings in this thesis make it possible, that an automatic, consistent and correct derivative is generated in an efficient way for industrial applications.
Automated theorem proving is a search problem and, by its undecidability, a very difficult one. The challenge in the development of a practically successful prover is the mapping of the extensively developed theory into a program that runs efficiently on a computer. Starting from a level-based system model for automated theorem provers, in this work we present different techniques that are important for the development of powerful equational theorem provers. The contributions can be divided into three areas: Architecture. We present a novel prover architecture that is based on a set-based compression scheme. With moderate additional computational costs we achieve a substantial reduction of the memory requirements. Further wins are architectural clarity, the easy provision of proof objects, and a new way to parallelize a prover which shows respectable speed-ups in practice. The compact representation paves the way to new applications of automated equational provers in the area of verification systems. Algorithms. To improve the speed of a prover we need efficient solutions for the most time-consuming sub-tasks. We demonstrate improvements of several orders of magnitude for two of the most widely used term orderings, LPO and KBO. Other important contributions are a novel generic unsatisfiability test for ordering constraints and, based on that, a sufficient ground reducibility criterion with an excellent cost-benefit ratio. Redundancy avoidance. The notion of redundancy is of central importance to justify simplifying inferences which are used to prune the search space. In our experience with unfailing completion, the usual notion of redundancy is not strong enough. In the presence of associativity and commutativity, the provers often get stuck enumerating equations that are permutations of each other. By extending and refining the proof ordering, many more equations can be shown redundant. Furthermore, our refinement of the unfailing completion approach allows us to use redundant equations for simplification without the need to consider them for generating inferences. We describe the efficient implementation of several redundancy criteria and experimentally investigate their influence on the proof search. The combination of these techniques results in a considerable improvement of the practical performance of a prover, which we demonstrate with extensive experiments for the automated theorem prover Waldmeister. The progress achieved allows the prover to solve problems that were previously out of reach. This considerably enhances the potential of the prover and opens up the way for new applications.
Stochastic Network Calculus (SNC) emerged from two branches in the late 90s:
the theory of effective bandwidths and its predecessor the Deterministic Network
Calculus (DNC). As such SNC’s goal is to analyze queueing networks and support
their design and control.
In contrast to queueing theory, which strives for similar goals, SNC uses in-
equalities to circumvent complex situations, such as stochastic dependencies or
non-Poisson arrivals. Leaving the objective to compute exact distributions behind,
SNC derives stochastic performance bounds. Such a bound would, for example,
guarantee a system’s maximal queue length that is violated by a known small prob-
ability only.
This work includes several contributions towards the theory of SNC. They are
sorted into four main contributions:
(1) The first chapters give a self-contained introduction to deterministic net-
work calculus and its two branches of stochastic extensions. The focus lies on the
notion of network operations. They allow to derive the performance bounds and
simplifying complex scenarios.
(2) The author created the first open-source tool to automate the steps of cal-
culating and optimizing MGF-based performance bounds. The tool automatically
calculates end-to-end performance bounds, via a symbolic approach. In a second
step, this solution is numerically optimized. A modular design allows the user to
implement their own functions, like traffic models or analysis methods.
(3) The problem of the initial modeling step is addressed with the development
of a statistical network calculus. In many applications the properties of included
elements are mostly unknown. To that end, assumptions about the underlying
processes are made and backed by measurement-based statistical methods. This
thesis presents a way to integrate possible modeling errors into the bounds of SNC.
As a byproduct a dynamic view on the system is obtained that allows SNC to adapt
to non-stationarities.
(4) Probabilistic bounds are fundamentally different from deterministic bounds:
While deterministic bounds hold for all times of the analyzed system, this is not
true for probabilistic bounds. Stochastic bounds, although still valid for every time
t, only hold for one time instance at once. Sample path bounds are only achieved by
using Boole’s inequality. This thesis presents an alternative method, by adapting
the theory of extreme values.
(5) A long standing problem of SNC is the construction of stochastic bounds
for a window flow controller. The corresponding problem for DNC had been solved
over a decade ago, but remained an open problem for SNC. This thesis presents
two methods for a successful application of SNC to the window flow controller.
In recent years, the Internet has become a major source of visual information exchange. Popular social platforms have reported an average of 80 million photo uploads a day. These images, are often accompanied with a user provided text one-liner, called an image caption. Deep Learning techniques have made significant advances towards automatic generation of factual image captions. However, captions generated by humans are much more than mere factual image descriptions. This work takes a step towards enhancing a machine's ability to generate image captions with human-like properties. We name this field as Affective Image Captioning, to differentiate it from the other areas of research focused on generating factual descriptions.
To deepen our understanding of human generated captions, we first perform a large-scale Crowd-Sourcing study on a subset of Yahoo Flickr Creative Commons 100 Million Dataset (YFCC100M). Three thousand random image-caption pairs were evaluated by native English speakers w.r.t different dimensions like focus, intent, emotion, meaning, and visibility. Our findings indicate three important underlying properties of human captions: subjectivity, sentiment, and variability. Based on these results, we develop Deep Learning models to address each of these dimensions.
To address the subjectivity dimension, we propose the Focus-Aspect-Value (FAV) model (along with a new task of aspect-detection) to structure the process of capturing subjectivity. We also introduce a novel dataset, aspects-DB, following this way of modeling. To implement the model, we propose a novel architecture called Tensor Fusion. Our experiments show that Tensor Fusion outperforms the state-of-the-art cross residual networks (XResNet) in aspect-detection.
Towards the sentiment dimension, we propose two models:Concept & Syntax Transition Network (CAST) and Show & Tell with Emotions (STEM). The CAST model uses a graphical structure to generate sentiment. The STEM model uses a neural network to inject adjectives into a neutral caption. Achieving a high score of 93% with human evaluation, these models were selected as the top-3 at the ACMMM Grand Challenge 2016.
To address the last dimension, variability, we take a generative approach called Generative Adversarial Networks (GAN) along with multimodal fusion. Our modified GAN, with two discriminators, is trained using Reinforcement Learning. We also show that it is possible to control the properties of the generated caption-variations with an external signal. Using sentiment as the external signal, we show that we can easily outperform state-of-the-art sentiment caption models.
Software is becoming increasingly concurrent: parallelization, decentralization, and reactivity necessitate asynchronous programming in which processes communicate by posting messages/tasks to others’ message/task buffers. Asynchronous programming has been widely used to build fast servers and routers, embedded systems and sensor networks, and is the basis of Web programming using Javascript. Languages such as Erlang and Scala have adopted asynchronous programming as a fundamental concept with which highly scalable and highly reliable distributed systems are built.
Asynchronous programs are challenging to implement correctly: the loose coupling between asynchronously executed tasks makes the control and data dependencies difficult to follow. Even subtle design and programming mistakes on the programs have the capability to introduce erroneous or divergent behaviors. As asynchronous programs are typically written to provide a reliable, high-performance infrastructure, there is a critical need for analysis techniques to guarantee their correctness.
In this dissertation, I provide scalable verification and testing tools to make asyn- chronous programs more reliable. I show that the combination of counter abstraction and partial order reduction is an effective approach for the verification of asynchronous systems by presenting PROVKEEPER and KUAI, two scalable verifiers for two types of asynchronous systems. I also provide a theoretical result that proves a counter-abstraction based algorithm called expand-enlarge-check, is an asymptotically optimal algorithm for the coverability problem of branching vector addition systems as which many asynchronous programs can be modeled. In addition, I present BBS and LLSPLAT, two testing tools for asynchronous programs that efficiently uncover many subtle memory violation bugs.
Due to its performance, the field of deep learning has gained a lot of attention, with neural networks succeeding in areas like \( \textit{Computer Vision} \) (CV), \( \textit{Neural Language Processing} \) (NLP), and \( \textit{Reinforcement Learning} \) (RL). However, high accuracy comes at a computational cost as larger networks require longer training time and no longer fit onto a single GPU. To reduce training costs, researchers are looking into the dynamics of different optimizers, in order to find ways to make training more efficient. Resource requirements can be limited by reducing model size during training or designing more efficient models that improve accuracy without increasing network size.
This thesis combines eigenvalue computation and high-dimensional loss surface visualization to study different optimizers and deep neural network models. Eigenvectors of different eigenvalues are computed, and the loss landscape and optimizer trajectory are projected onto the plane spanned by those eigenvectors. A new parallelization method for the stochastic Lanczos method is introduced, resulting in faster computation and thus enabling high-resolution videos of the trajectory and second-order information during neural network training. Additionally, the thesis presents the loss landscape between two minima along with the eigenvalue density spectrum at intermediate points for the first time.
Secondly, this thesis presents a regularization method for \( \textit{Generative Adversarial Networks} \) (GANs) that uses second-order information. The gradient during training is modified by subtracting the eigenvector direction of the biggest eigenvalue, preventing the network from falling into the steepest minima and avoiding mode collapse. The thesis also shows the full eigenvalue density spectra of GANs during training.
Thirdly, this thesis introduces ProxSGD, a proximal algorithm for neural network training that guarantees convergence to a stationary point and unifies multiple popular optimizers. Proximal gradients are used to find a closed-form solution to the problem of training neural networks with smooth and non-smooth regularizations, resulting in better sparsity and more efficient optimization. Experiments show that ProxSGD can find sparser networks while reaching the same accuracy as popular optimizers.
Lastly, this thesis unifies sparsity and \( \textit{neural architecture search} \) (NAS) through the framework of group sparsity. Group sparsity is achieved through \( \ell_{2,1} \)-regularization during training, allowing for filter and operation pruning to reduce model size with minimal sacrifice in accuracy. By grouping multiple operations together, group sparsity can be used for NAS as well. This approach is shown to be more robust while still achieving competitive accuracies compared to state-of-the-art methods.
Open distributed systems are a class of distributed systems where (i) only partial information about the environment, in which they are running, is present, (ii) new resources may become available at runtime, and (iii) a subsystem may become aware of other subsystems after some interaction. Modeling and implementing such systems correctly is a complex task due to the openness and the dynamicity aspects. One way to ensure that the resulting systems behave correctly is to utilize formal verification.
Formal verification requires an adequate semantic model of the implementation, a specification of the desired behavior, and a reasoning technique. The actor model is a semantic model that captures the challenging aspects of open distributed systems by utilizing actors as universal primitives to represent system entities and allowing them to create new actors and to communicate by sending directed messages as reply to received messages. To enable compositional reasoning, where the reasoning task is reduced to independent verification of the system parts, semantic entities at a higher level of abstraction than actors are needed.
This thesis proposes an automaton model and combines sound reasoning techniques to compositionally verify implementations of open actor systems. Based on I/O automata, the model allows automata to be created dynamically and captures dynamic changes in communication patterns. Each automaton represents either an actor or a group of actors. The specification of the desired behavior is given constructively as an automaton. As the basis for compositionality, we formalize a component notion based on the static structure of the implementation instead of the dynamic entities (the actors) occurring in the system execution. The reasoning proceeds in two stages. The first stage establishes the connection between the automata representing single actors and their implementation description by means of weakest liberal preconditions. The second stage employs this result as the basis for verifying whether a component specification is satisfied. The verification is done by building a simulation relation from the automaton representing the implementation to the component's automaton. Finally, we validate the compositional verification approach through a number of examples by proving correctness of their actor implementations with respect to system specifications.
An Efficient Automated Machine Learning Framework for Genomics and Proteomics Sequence Analysis
(2023)
Genomics and Proteomics sequence analyses are the scientific studies of understanding the language of Deoxyribonucleic Acid (DNA), Ribonucleic Acid (RNA) and protein biomolecules with an objective of controlling the production of proteins and understanding their core functionalities. It helps to detect chronic diseases in early stages, root causes of clinical changes, key genetic targets for pharmaceutical development and optimization of therapeutics for various age groups. Most Genomics and Proteomics sequence analysis work is performed using typical wet lab experimental approaches that make use of different genetic diagnostic technologies. However, these approaches are costly, time consuming, skill and labor intensive. Hence, these approaches slow down the process of developing an efficient and economical sequence analysis landscape essential to demystify a variety of cellular processes and functioning of biomolecules in living organisms. To empower manual wet lab experiment driven research, many machine learning based approaches have been developed in recent years. However, these approaches cannot be used in practical environment due to their limited performance. Considering the sensitive and inherently demanding nature of Genomics and Proteomics sequence
analysis which can have very far-reaching as well as serious repercussions on account of misdiagnosis, the main
objective of this research is to develop an efficient automated computational framework for Genomics and Proteomics sequence analysis using the predictive and prescriptive analytical powers of Artificial Intelligence (AI) to significantly improve healthcare operations.
The proposed framework is comprised of 3 main components namely sequence encoding, feature engineering and
discrete or continuous value predictor. The sequence encoding module is equipped with a variety of existing and newly developed sequence encoding algorithms that are capable of generating a rich statistical representation of DNA, RNA and protein raw sequences. The feature engineering module has diverse types of feature selection and dimensionality reduction approaches which can be used to generate the most effective feature space. Furthermore, the discrete and/or continuous value predictor module of the proposed framework contains a wide range of existing machine learning and newly developed deep learning regressors and classifiers. To evaluate the integrity and generalizability of the proposed framework, we have performed a large-scale experimentation over diverse types of Genomics and Proteomics sequence analysis tasks (i.e., DNA, RNA and proteins).
In Genomics analysis, Epigenetic modification detection is one of the key component. It helps clinical researchers and practitioners to distinguish normal cellular activities from malfunctioned ones, which can lead to diverse genetic disorders such as metabolic disorders, cancers, etc. To support this analysis, the proposed framework is used to solve the problem of DNA and Histone modification prediction where it has achieved state-of-the-art performance on 27 publicly available benchmark datasets of 17 different species with best accuracy of 97%. RNA sequence analysis is another vital component of Genomics sequence analysis where the identification of different coding and non-coding RNAs as well as their subcellular localization patterns help to demystify the functions of diverse RNAs, root causes of clinical changes, develop precision medicine and optimize therapeutics. To support this analysis, the proposed framework is utilized for non-coding RNA classification and multi-compartment RNA subcellular localization prediction. Where it achieved state-of-the-art performance on 10 publicly available benchmark datasets of Homo sapiens and Mus Musculus species with best accuracy of 98%.
Proteomics sequence analysis is essential to demystify the virus pathogenesis, host immunity responses, the way
proteins affect or are affected by the cell processes, their structure and core functionalities. To support this analysis, the proposed framework is used for host protein-protein and virus-host protein-protein interaction prediction. It has achieved state-of-the-art performance on 2 publicly available protein protein interaction datasets of Homo Sapiens and Mus Musculus species with best accuracy of 96% and 7 viral host protein protein interaction datasets of multiple hosts and viruses with best accuracy of 94%. Considering the performance and practical significance of proposed framework, we believe proposed framework will help researchers in developing cutting-edge practical applications for diverse Genomic and Proteomic sequence analyses tasks (i.e., DNA, RNA and proteins).
Multidisciplinary optimizations (MDOs) encompass optimization problems that combine different disciplines into a single optimization with the aim of converging towards a design that simultaneously fulfills multiple criteria. For example, considering both fluid and structural disciplines to obtain a shape that is not only aerodynamically efficient, but also respects structural constraints. Combined with CAD-based parametrizations, the optimization produces an improved, manufacturable shape. For turbomachinery applications, this method has been successfully applied using gradient-free optimization methods such as genetic algorithms, surrogate modeling, and others. While such algorithms can be easily applied without access to the source code, the number of iterations to converge is dependent on the number of design parameters. This results in high computational costs and limited design spaces. A competitive alternative is offered by gradient-based optimization algorithms combined with adjoint methods, where the computational complexity of the gradient calculation is no longer dependent on the number of design parameters, but rather on the number of outputs. Such methods have been extensively used in single-disciplinary aerodynamic optimizations using adjoint fluid solvers and CAD parametrizations. However, CAD-based MDOs leveraging adjoint methods are just beginning to emerge.
This thesis contributes to this field of research by setting up a CAD-based adjoint MDO framework for turbomachinery design using both fluid and structural disciplines. To achieve this, the von Kármán Institute’s existing CAD-based optimization framework cado is augmented by the development of a FEM-based structural solver which has been differentiated using the algorithmic differentiation tool CoDiPack of TU Kaiserslautern. While most of the code could be differentiated in a black-box fashion, special treatment is required for the iterative linear and eigenvalue solvers to ensure accuracy and reduce memory consumption. As a result, the solver has the capability of computing both stress and vibration gradients at a cost independent on the number of design parameters. For the presented application case of a radial turbine optimization, the full gradient calculation has a computational cost of approximately 3.14 times the cost of a primal run and the peak memory usage of approximately 2.76 times that of a primal run.
The FEM code leverages object-oriented design such that the same base structure can be reused for different purposes with minimal re-differentiation. This is demonstrated by considering a composite material test case where the gradients could be easily calculated with respect to an extended design space that includes material properties. Additionally, the structural solver is reused within a CAD-based mesh deformation, which propagates the structural FEM mesh gradients through to the CAD parameters. This closes the link between the CAD shape and FEM mesh. Finally, the MDO framework is applied by optimizing the aerodynamic efficiency of a radial turbine while respecting structural constraints.
Optical Character Recognition (OCR) system plays an important role in digitization of data acquired as images from a variety of sources. Although the area is very well explored for Latin languages, some of the languages based on Arabic cursive script are not yet explored. It is due to many factors: Most importantly are the unavailability of proper data sets and complexities posed by cursive scripts. The Pashto language is one of such languages which needs considerable exploration towards OCR. In order to develop such an OCR system, this thesis provides a pioneering study that explores deep learning for the Pashto language in the field of OCR.
The Pashto language is spoken by more than $50$ million people across the world, and it is an active medium both for oral as well as written communication. It is associated with rich literary heritage and contains huge written collection. These written materials present contents of simple to complex nature, and layouts from hand-scribed to printed text. The Pashto language presents mainly two types of complexities (i) generic w.r.t. cursive script, (ii) specific w.r.t. Pashto language. Generic complexities are cursiveness, context dependency, and breaker character anomalies, as well as space anomalies. Pashto specific complexities are variations in shape for a single character and shape similarity for some of the additional Pashto characters. Existing research in the area of Arabic OCR did not lead to an end-to-end solution for the mentioned complexities and therefore could not be generalized to build a sophisticated OCR system for Pashto.
The contribution of this thesis spans in three levels, conceptual level, data level, and practical level. In the conceptual level, we have deeply explored the Pashto language and identified those characters, which are responsible for the challenges mentioned above. In the data level, a comprehensive dataset is introduced containing real images of hand-scribed contents. The dataset is manually transcribed and has the most frequent layout patterns associated with the Pashto language. The practical level contribution provides a bridge, in the form of a complete Pashto OCR system, and connects the outcomes of the conceptual and data levels contributions. The practical contribution comprises of skew detection, text-line segmentation, feature extraction, classification, and post-processing. The OCR module is more strengthened by using deep learning paradigm to recognize Pashto cursive script by the framework of Recursive Neural Networks (RNN). Proposed Pashto text recognition is based on Long Short-Term Memory Network (LSTM) and realizes a character recognition rate of $90.78\%$ on Pashto real hand-scribed images. All these contributions are integrated into an application to provide a flexible and generic End-to-End Pashto OCR system.
The impact of this thesis is not only specific to the Pashto language, but it is also beneficial to other cursive languages like Arabic, Urdu, and Persian e.t.c. The main reason is the Pashto character set, which is a superset of Arabic, Persian, and Urdu languages. Therefore, the conceptual contribution of this thesis provides insight and proposes solutions to almost all generic complexities associated with Arabic, Persian, and Urdu languages. For example, an anomaly caused by breaker characters is deeply analyzed, which is shared among 70 languages, mainly use Arabic script. This thesis presents a solution to this issue and is equally beneficial to almost all Arabic like languages.
The scope of this thesis has two important aspects. First, a social impact, i.e., how a society may benefit from it. The main advantages are to bring the historical and almost vanished document to life and to ensure the opportunities to explore, analyze, translate, share, and understand the contents of Pashto language globally. Second, the advancement and exploration of the technical aspects. Because, this thesis empirically explores the recognition and challenges which are solely related to the Pashto language, both regarding character-set and the materials which present such complexities. Furthermore, the conceptual and practical background of this thesis regarding complexities of Pashto language is very beneficial regarding OCR for other cursive languages.
The development of autonomous mobile robots is a major topic of current research. As those robots must be able to react to changing environments and avoid collisions also with moving obstacles, the fulfilment of safety requirements is an important aspect. Behaviour-based systems (BBS) have proven to meet several of the properties required for these kindsof robots, such as reactivity, extensibility and re-usability of individual components. BBS consist of a number of behavioural components that individually realise simple tasks. Their interconnection allows to achieve complex robot behaviour, which implies that correct
connections are crucial. The resulting networks can get very large making them difficult to verify. This dissertation presents a novel concept for the analysis and verification of complex autonomous robot systems controlled by behaviour-based software architectures with special focus on the integration of environmental aspects into the processes.
Several analysis techniques have been investigated and adapted to the special requirements of BBS. These include a structural analysis, which is used to find constraint violations and faults in the network layout. Fault tree analysis is applied to identify root causes of hazards and the relationship of system events. For this, a technique to map the behaviour-based control network to the structure of a fault tree has been developed. Testing and data analysis are used for the detection of failures and their root causes. Here, a new concept that identifies patterns in data recorded during test runs has been introduced.
All of these methods cannot guarantee failure-free and safe robot behaviour and can never prove the absence of failures. Therefore, model checking as formal verification technique that proves a property to be correct for the given system, has been chosen to complement the set of analysis techniques. A novel concept for the integration of environmental influences into the model checking process is proposed. Environmental situations and the sensor processing chain are represented as synchronised automata similar to the modelling of the behavioural network. Tools supporting the whole verification process including the creation of formal queries in its environment have been developed.
During the verification of large behavioural networks, the scalability of the model checking approach appears as a big problem. Several approaches that deal with this problem have been investigated and the selection of slicing and abstraction methods has been justified. A concept for the application of these methods is provided, that reduces the behavioural network to the relevant parts before the actual verification process.
All techniques have been applied to the behaviour-based control system of the autonomous outdoor robot RAVON. Its complex network with more than 400 components allows for demonstrating the soundness of the presented concepts. The set of different techniques provides a fundamental basis for a comprehensive analysis and verification of BBS acting in changing environments.
This thesis is concerned with different null-models that are used in network analysis. Whenever it is of interest whether a real-world graph is exceptional regarding a particular measure, graphs from a null-model can be used to compare the real-world graph to. By analyzing an appropriate null-model, a researcher may find whether the results of the measure on the real-world graph is exceptional or not.
Deciding which null-model to use is hard and sometimes the difference between the null-models is not even considered. In this thesis, there are several results presented: First, based on simple global measures, undirected graphs are analyzed. The results for these measures indicates that it is not important which null-model is used, thus, the fastest algorithm of a null-model may be used. Next, local measures are investigated. The fastest algorithm proves to be the most complicated to analyze. The model includes multigraphs which do not meet the conditions of all the measures, thus, the measures themselves have to be altered to take care of multigraphs as well. After careful consideration, the conditions are met and the analysis shows, that the fastest is not always the best.
The same applies for directed graphs, as is shown in the last part. There, another more complex measure on graphs is introduced. I continue testing the applicability of several null-models; in the end, a set of equations proves to be fast and good enough as long as conditions regarding the degree sequence are met.
In recent decades, there has been increasing interest in analyzing the behavior of complex systems. A popular approach for analyzing such systems is a network analytic approach where the system is represented by a graph structure (Wassermann&Faust 1994, Boccaletti et al. 2006, Brandes&Erlebach 2005, Vespignani 2018): Nodes represent the system’s entities, edges their interactions. A large toolbox of network analytic methods, such as measures for structural properties (Newman 2010), centrality measures (Koschützki et al. 2005), or methods for identifying communities (Fortunato 2010), is readily available to be applied on any network structure. However, it is often overlooked that a network representation of a system and the (technically applicable) methods contain assumptions that need to be met; otherwise, the results are not interpretable or even misleading. The most important assumption of a network representation is the presence of indirect effects: If A has an impact on B, and B has an impact on C, then A has an impact on C (Zweig 2016, Brandes et al. 2013). The presence of indirect effects can be explained by ”something” flowing through the network by moving from node to node. Such network flows (or network processes) may be the propagation of information in social networks, the spread of infections, or entities using the network as infrastructure, such as in transportation networks. Also several network measures, particularly most centrality measures, assume the presence of such a network process, but additionally assume specific properties of the network processes (Borgatti 2005). Then, a centrality value indicates a node’s importance with respect to a process with these properties.
While this has been known for several years, only recently have datasets containing real-world network flows become accessible. In this context, the goal of this dissertation is to provide a better understanding of the actual behavior of real-world network processes, with a particular focus on centrality measures: If real-world network processes turn out to show different properties than those assumed by classic centrality measures, these measures might considerably under- or overestimate the importance of nodes for the actual network flow. To the best of our knowledge, there are only very few works addressing this topic.
The contributions of this thesis are therefore as follows: (i) We investigate in which aspects real-world network flows meet the assumptions contained about them in centrality measures. (ii) Since we find that the real-world flows show considerably different properties than assumed, we test to which extent the found properties can be explained by models, i.e., models based on shortest paths or random walks. (iii) We study whether the deviations from the assumed behavior have an impact on the results of centrality measures.
To this end, we introduce flow-based variants of centrality measures which are either based on the assumed behavior or on the actual behavior of the real-world network flow. This enables systematic evaluation of the impact of each assumption on the resulting rankings of centrality measures.
While–on a large scale–we observe a surprisingly large robustness of the measures against deviations in their assumptions, there are nodes whose importance is rated very differently when the real-world network flow is taken into account. (iv) As a technical contribution, we provide a method for an efficient handling of large sets of flow trajectories by summarizing them into groups of similar trajectories. (v) We furthermore present the results of an interdisciplinary research project in which the trajectories of humans in a network were analyzed in detail. In general, we are convinced that a process-driven perspective on network analysis in which the network process is considered in addition to the network representation, can help to better understand the behavior of complex systems.
Analyzing Centrality Indices in Complex Networks: an Approach Using Fuzzy Aggregation Operators
(2018)
The identification of entities that play an important role in a system is one of the fundamental analyses being performed in network studies. This topic is mainly related to centrality indices, which quantify node centrality with respect to several properties in the represented network. The nodes identified in such an analysis are called central nodes. Although centrality indices are very useful for these analyses, there exist several challenges regarding which one fits best
for a network. In addition, if the usage of only one index for determining central
nodes leads to under- or overestimation of the importance of nodes and is
insufficient for finding important nodes, then the question is how multiple indices
can be used in conjunction in such an evaluation. Thus, in this thesis an approach is proposed that includes multiple indices of nodes, each indicating
an aspect of importance, in the respective evaluation and where all the aspects of a node’s centrality are analyzed in an explorative manner. To achieve this
aim, the proposed idea uses fuzzy operators, including a parameter for generating different types of aggregations over multiple indices. In addition, several preprocessing methods for normalization of those values are proposed and discussed. We investigate whether the choice of different decisions regarding the
aggregation of the values changes the ranking of the nodes or not. It is revealed that (1) there are nodes that remain stable among the top-ranking nodes, which
makes them the most central nodes, and there are nodes that remain stable
among the bottom-ranking nodes, which makes them the least central nodes; and (2) there are nodes that show high sensitivity to the choice of normalization
methods and/or aggregations. We explain both cases and the reasons why the nodes’ rankings are stable or sensitive to the corresponding choices in various networks, such as social networks, communication networks, and air transportation networks.
This thesis discusses several applications of computational topology to the visualization
of scalar fields. Scalar field data come from different measurements and simulations. The
intrinsic properties of this kind of data, which make the visualization of it to a complicated
task, are the large size and presence of noise. Computational topology is a powerful tool
for automatic feature extraction, which allows the user to interpret the information contained
in the dataset in a more efficient way. Utilizing it one can make the main purpose of
scientific visualization, namely extracting knowledge from data, a more convenient task.
Volume rendering is a class of methods designed for realistic visual representation of 3D
scalar fields. It is used in a wide range of applications with different data size, noise
rate and requirements on interactivity and flexibility. At the moment there is no known
technique which can meet the needs of every application domain, therefore development
of methods solving specific problems is required. One of such algorithms, designed for
rendering of noisy data with high frequencies is presented in the first part of this thesis.
The method works with multidimensional transfer functions and is especially suited for
functions exhibiting sharp features. Compared with known methods the presented algorithm
achieves better visual quality with a faster performance in presence of mentioned
features. An improvement on the method utilizing a topological theory, Morse theory, and
a topological construct, Morse-Smale complex, is also presented in this part of the thesis.
The improvement allows for performance speedup at a little precomputation and memory
cost.
The usage of topological methods for feature extraction on a real world dataset often
results in a very large feature space which easily leads to information overflow. Topology
simplification is designed to reduce the number of features and allow a domain expert
to concentrate on the most important ones. In the terms of Morse theory features are
represented by critical points. An importance measure which is usually used for removing
critical points is called homological persistence. Critical points are cancelled pairwise
according to their homological persistence value. In the presence of outlier-like noise
homological persistence has a clear drawback: the outliers get a high importance value
assigned and therefore are not being removed. In the second part of this thesis a new
importance measure is presented which is especially suited for data with outliers. This
importance measure is called scale space persistence. The algorithm for the computation
of this measure is based on the scale space theory known from the area of computer
vision. The development of a critical point in scale space gives information about its
spacial extent, therefore outliers can be distinguished from other critical points. The usage
of the presented importance measure is demonstrated on a real world application, crater
identification on a surface of Mars.
The third part of this work presents a system for general interactive topology analysis
and exploration. The development of such a system is motivated by the fact that topological
methods are often considered to be complicated and hard to understand, because
application of topology for visualization requires deep understanding of the mathematical
background behind it. A domain expert exploring the data using topology for feature
extraction needs an intuitive way to manipulate the exploration process. The presented
system is based on an intuitive notion of a scene graph, where the user can choose and
place the component blocks to achieve an individual result. This way the domain expert
can extract more knowledge from given data independent on the application domain. The
tool gives the possibility for calculation and simplification of the underlying topological
structure, Morse-Smale complex, and also the visualization of parts of it. The system also
includes a simple generic query language to acquire different structures of the topological
structure at different levels of hierarchy.
The fourth part of this dissertation is concentrated on an application of computational
geometry for quality assessment of a triangulated surface. Quality assessment of a triangulation
is called surface interrogation and is aimed for revealing intrinsic irregularities
of a surface. Curvature and continuity are the properties required to design a visually
pleasing geometric object. For example, a surface of a manufactured body usually should
be convex without bumps of wiggles. Conventional rendering methods hide the regions
of interest because of smoothing or interpolation. Two new methods which are presented
here: curvature estimation using local fitting with B´ezier patches and computation of reflection
lines for visual representation of continuity, are specially designed for assessment
problems. The examples and comparisons presented in this part of the thesis prove the
benefits of the introduced algorithms. The methods are also well suited for concurrent visualization
of the results from simulation and surface interrogation to reveal the possible
intrinsic relationship between them.
The proliferation of sensors in everyday devices – especially in smartphones – has led to crowd sensing becoming an important technique in many urban applications ranging from noise pollution mapping or road condition monitoring to tracking the spreading of diseases. However, in order to establish integrated crowd sensing environments on a large scale, some open issues need to be tackled first. On a high level, this thesis concentrates on dealing with two of those key issues: (1) efficiently collecting and processing large amounts of sensor data from smartphones in a scalable manner and (2) extracting abstract data models from those collected data sets thereby enabling the development of complex smart city services based on the extracted knowledge.
Going more into detail, the first main contribution of this thesis is the development of methods and architectures to facilitate simple and efficient deployments, scalability and adaptability of crowd sensing applications in a broad range of scenarios while at the same time enabling the integration of incentivation mechanisms for the participating general public. During an evaluation within a complex, large-scale environment it is shown that real-world deployments of the proposed data recording architecture are in fact feasible. The second major contribution of this thesis is the development of a novel methodology for using the recorded data to extract abstract data models which are representing the inherent core characteristics of the source data correctly. Finally – and in order to bring together the results of the thesis – it is demonstrated how the proposed architecture and the modeling method can be used to implement a complex smart city service by employing a data driven development approach.
The usage of sensors in modern technical systems and consumer products is in a rapid increase. This advancement can be characterized by two major factors, namely, the mass introduction of consumer oriented sensing devices to the market and the sheer amount of sensor data being generated. These characteristics raise subsequent challenges regarding both the consumer sensing devices' reliability and the management and utilization of the generated sensor data. This thesis addresses these challenges through two main contributions. It presents a novel framework that leverages sentiment analysis techniques in order to assess the quality of consumer sensing devices. It also couples semantic technologies with big data technologies to present a new optimized approach for realization and management of semantic sensor data, hence providing a robust means of integration, analysis, and reuse of the generated data. The thesis also presents several applications that show the potential of the contributions in real-life scenarios.
Due to the broad range, growing feature set and fast release pace of new sensor-based products, evaluating these products is very challenging as standard product testing is not practical. As an alternative, an end-to-end aspect-based sentiment summarizer pipeline for evaluation of consumer sensing devices is presented. The pipeline uses product reviews to extract the sentiment at the aspect level and includes several components namely, product name extractor, aspects extractor and a lexicon-based sentiment extractor which handles multiple sentiment analysis challenges such as sentiment shifters, negations, and comparative sentences among others. The proposed summarizer's components generally outperform the state-of-the-art approaches. As a use case, features of the market leading fitness trackers are evaluated and a dynamic visual summarizer is presented to display the evaluation results and to provide personalized product recommendations for potential customers.
The increased usage of sensing devices in the consumer market is accompanied with increased deployment of sensors in various other fields such as industry, agriculture, and energy production systems. This necessitates using efficient and scalable methods for storing and processing of sensor data. Coupling big data technologies with semantic techniques not only helps to achieve the desired storage and processing goals, but also facilitates data integration, data analysis, and the utilization of data in unforeseen future applications through preserving the data generation context. This thesis proposes an efficient and scalable solution for semantification, storage and processing of raw sensor data through ontological modelling of sensor data and a novel encoding scheme that harnesses the split between the statements of the conceptual model of an ontology (TBox) and the individual facts (ABox) along with in-memory processing capabilities of modern big data systems. A sample use case is further introduced where a smartphone is deployed in a transportation bus to collect various sensor data which is then utilized in detecting street anomalies.
In addition to the aforementioned contributions, and to highlight the potential use cases of sensor data publicly available, a recommender system is developed using running route data, used for proximity-based retrieval, to provide personalized suggestions for new routes considering the runner's performance, visual and nature of route preferences.
This thesis aims at enhancing the integration of sensing devices in daily life applications through facilitating the public acquisition of consumer sensing devices. It also aims at achieving better integration and processing of sensor data in order to enable new potential usage scenarios of the raw generated data.
The Symbol Grounding Problem (SGP) is one of the first attempts to proposed a hypothesis about mapping abstract concepts and the real world. For example, the concept "ball" can be represented by an object with a round shape (visual modality) and phonemes /b/ /a/ /l/ (audio modality).
This thesis is inspired by the association learning presented in infant development.
Newborns can associate visual and audio modalities of the same concept that are presented at the same time for vocabulary acquisition task.
The goal of this thesis is to develop a novel framework that combines the constraints of the Symbol Grounding Problem and Neural Networks in a simplified scenario of association learning in infants. The first motivation is that the network output can be considered as numerical symbolic features because the attributes of input samples are already embedded. The second motivation is the association between two samples is predefined before training via the same vectorial representation. This thesis proposes to associate two samples and the vectorial representation during training. Two scenarios are considered: sample pair association and sequence pair association.
Three main contributions are presented in this work.
The first contribution is a novel Symbolic Association Model based on two parallel MLPs.
The association task is defined by learning that two instances that represent one concept.
Moreover, a novel training algorithm is defined by matching the output vectors of the MLPs with a statistical distribution for obtaining the relationship between concepts and vectorial representations.
The second contribution is a novel Symbolic Association Model based on two parallel LSTM networks that are trained on weakly labeled sequences.
The definition of association task is extended to learn that two sequences represent the same series of concepts.
This model uses a training algorithm that is similar to MLP-based approach.
The last contribution is a Classless Association.
The association task is defined by learning based on the relationship of two samples that represents the same unknown concept.
In summary, the contributions of this thesis are to extend Artificial Intelligence and Cognitive Computation research with a new constraint that is cognitive motivated. Moreover, two training algorithms with a new constraint are proposed for two cases: single and sequence associations. Besides, a new training rule with no-labels with promising results is proposed.
Weak memory consistency models capture the outcomes of concurrent
programs that appear in practice and yet cannot be explained by thread
interleavings. Such outcomes pose two major challenges to formal
methods. First, establishing that a memory model satisfies its
intended properties (e.g., supports a certain compilation scheme) is
extremely error-prone: most proposed language models were initially
broken and required multiple iterations to achieve soundness. Second,
weak memory models make verification of concurrent programs much
harder, as a result of which there are no scalable verification
techniques beyond a few that target very simple models.
This thesis presents solutions to both of these problems.
First, it shows that the relevant metatheory of weak memory
models can be effectively decided (sparing years of manual proof
efforts), and presents Kater, a tool that can answer metatheoretic
queries in a matter of seconds. Second, it presents GenMC, the first
(and only) scalable stateless model checker that is parametric in the
choice of the memory model, often improving the prior state of the art
by orders of magnitude.
The main goal of this thesis is twofold. First, the thesis aims at bridging the gap between existing Pattern Recognition (PR) methods of automatic signature verification and the requirements for their application in forensic science. This gap, attributed by various factors ranging from system definition to evaluation, prevents automatic methods from being used by Forensic Handwriting Examiners (FHEs). Second, the thesis presents novel signature verification methods developed particularly considering the implications of forensic casework, and outperforming the state-of-the-art PR methods.
The first goal of the thesis is attributed by four important factors, i.e., data, terminology, output reporting, and how evaluation of automatic systems is carried out today. It is argued that traditionally the signature data used in PR are not actual/close representative of the real world data (especially that available in forensic cases). The systems trained on such data are, therefore, not suitable for forensic environments. This situation can be tackled by providing more realistic data to PR researchers. To this end, various signature and handwriting datasets are gathered in collaboration with FHEs and are made publicly available through the course of this thesis. A special attention is given to disguised signatures--where authentic authors purposefully make their signatures look like a forgery. This genre was at large neglected in PR research previously.
The terminology used, in the two communities - PR and FHEs, differ greatly. In fact, even in PR, there is no standard terminology and people often differ in the usage of various terms particularly related to various types of forged signatures/handwriting. The thesis presents a new terminology that is equally useful for both forensic scientists and PR researchers. The proposed terminology is hoped to increase the general acceptability of automatic signature analysis systems in forensic science.
The outputs reported by general signature verification systems are not acceptable for FHEs and courts as they are either binary (yes/no) or score (raw evidence) based on similarity/difference. The thesis describes that automatic systems should rather report the probability of observing the evidence (e.g., a certain similarity/difference score) given the signature belongs to the acclaimed identity, and the probability of observing the same evidence given the signature does not belong to the acclaimed identity. This will take automatic systems from hard decisions to soft decisions, thereby enabling them to report likelihood ratios that actually represent the evidential value of the score rather than the raw score (evidence).
When automatic systems report soft decisions (as in the form of likelihood ratios), the thesis argues that there must be some methods to evaluate such systems. This thesis presents one such adaptation. The thesis argues that the state-of-the-art evaluation methods, like equal error rate and area under curve, do not address the needs of forensic science. These needs require an assessment of the evidential value of signature verification, rather than a hard/pure classification (accept/reject binary decision). The thesis demonstrates and validates a relatively simple adaptation of the current verification methods based on the Bayesian inference dependent calibration of continuous scores rather than hard classifications (binary and/or score based classification).
The second goal of this thesis is to introduce various local features based techniques which are capable of performing signature verification in forensic cases and reporting results as anticipated by FHEs and courts. This is an important contribution of the thesis because of the following two reasons. First, to the best of author's knowledge, local feature descriptors are for the first time used for development of signature verification systems for forensic environments (particularly considering disguised signatures). Previously, such methods have been heavily used for recognition tasks, rather than verification of writing behaviors, such as character and digit recognition. Second, the proposed methods not only report the more traditional decisions (like scores-usually reported in PR) but also the Bayesian inference based likelihood ratios (suitable for courts and forensic cases).
Furthermore, the thesis also provides a detailed man vs. machine comparison for signature verification tasks. The men, in this comparison, are forensic scientists serving as forensic handwriting examiners and having experience of varying number of years. The machines are the local features based methods proposed in this thesis, along with various other state-of-the-art signature verification systems. The proposed methods clearly outperform the state-of-the-art systems, and sometimes the human experts.
Finally, the thesis details various tasks that have been performed in the areas closely related to signature verification and its application in forensic casework. These include, developing novel local feature based methods for extraction of signatures/handwritten text from document images, hyper-spectral image analysis for extraction of signatures from forensic documents, and analysis of on-line signatures acquired through specialized pens equipped with Accelerometer and Gyroscope. These tasks are important as they enable the thesis to take PR systems one step further close to direct application in forensic cases.
In recent years, the formal methods community has made significant progress towards the development of industrial-strength static analysis tools that can check properties of real-world production code. Such tools can help developers detect potential bugs and security vulnerabilities in critical software before deployment. While the potential benefits of static analysis tools are clear, their usability and effectiveness in mainstream software development workflows often comes into question and can prevent software developers from using these tools to their full potential. In this dissertation, we focus on two major challenges that can limit their ability to be incorporated into software development workflows.
The first challenge is unintentional unsoundness. Static program analyzers are complicated tools, implementing sophisticated algorithms and performance heuristics. This makes them highly susceptible to undetected unintentional soundness issues. These issues in program analyzers can cause false negatives and have disastrous consequences e.g., when analyzing safety critical software. In this dissertation, we present novel techniques to detect unintentional unsoundness bugs in two foundational program analysis tools namely SMT solvers and Datalog engines. These tools are used extensively by the formal methods community, for instance, in software verification, systematic testing, and program synthesis. We implemented these techniques as easy-to-use open source tools that are publicly available on Github. With the proposed techniques, we were able to detect more than 55 unique and confirmed critical soundness bugs in popular and widely used SMT solvers and Datalog engines in only a few months of testing.
The second challenge is finding the right balance between soundness, precision, and perfor- mance. In an ideal world, a static analyzer should be as precise as possible while maintaining soundness and being sufficiently fast. However, to overcome undecidability issues, these tools have to employ a variety of techniques to be practical for example, compromising on the sound- ness of the analysis or approximating code behavior. Static analyzers therefore are not trivial to integrate into any usage scenario with different program sizes, resource constraints and SLAs. Most of the times, these tools also don’t scale to large industrial code bases containing millions of lines of code. This makes it extremely challenging to get the most out of these analyzers and integrate them into everyday development activities, especially for average software develop- ment teams with little to no knowledge or understanding of advanced static analysis techniques. In this dissertation we present an approach to automatically tailor an abstract interpreter to the code under analysis and any given resource constraints. We implemented our technique as an open source framework, which is publicly available on Github. The second contribution of this dissertation in this challenge area is a technique to horizontally scale analysis tools in cloud-based static analysis platforms by splitting the input to the analyzer into partitions and analyzing the partitions independently. The technique was developed in collaboration with Amazon Web Services and is now being used in production in their CodeGuru service.
Undocumented enterprise data can easily pile up in companies in form of datasets and personal information. In absence of a data management strategy, such data becomes rather messy and may not fit for its intended use. Since there is often no documentation available, only a limited number of domain experts are aware of its contents. Therefore, for companies it becomes increasingly difficult to use such data to its full potential. To provide a solution, this PhD thesis investigates the construction of enterprise and personal knowledge graphs by semantically enriching messy data with meaning using semantic technologies. Since real world entities and their interrelations are organized in a graph, knowledge graphs serve as a semantic bridge between domain conceptualization and raw data. Spreadsheets are a prominent example of such enterprise data, since they are widely used by knowledge workers in the industrial sector. Two distinct approaches are investigated to construct knowledge graphs from them: a global extraction & annotation method and a local mapping technique. The latter is further complemented with a predictor of mapping rules on messy data. Different human-in-the-loop strategies are considered to include experts depending on their user group. Since non-technical users usually lack understanding of semantic technologies, they need appropriate tools to be able to give feedback. In case of developers, approaches are proposed to close the technology gap between industry and Semantic Web related concepts. Semantic Web practitioners participate with ontology modeling and linked data applications. Enterprise and personal data is typically confidential which is why it cannot be shared with a research community to discuss its challenges. However, for evaluation and reproducibility reasons publicly available datasets are mandatory. The thesis proposes ways to generate synthetic datasets with the goal to be as authentic as possible. Besides that, for internal evaluations a crawler of personal data on desktops is implemented. There are further contributions related to this thesis in diverse domains. One is about the motivation to support users in their daily work using personal knowledge assistants. Others are the agricultural field and the data science domain which also benefit from knowledge graph approaches. In conclusion, this PhD thesis contributes to the construction of knowledge graphs from especially messy enterprise data, while users from different groups take part in this process in various ways.
In recent years, deep learning has made substantial improvements in various fields like image understanding, Natural Language Processing (NLP), etc. These huge advancements have led to the release of many commercial applications which aim to help users carry out their daily tasks. Personal digital assistants are one such successful application of NLP, having a diverse userbase from all age groups. NLP tasks like Natural Language Understanding (NLU) and Natural Language Generation (NLG) are core components for building these assistants. However, like any other deep learning model, the growth of NLU & NLG models is directly coupled with tremendous amounts of training examples, which are expensive to collect due to annotator costs. Therefore, this work investigates the methodologies to build NLU and NLG systems in a data-constrained setting.
We evaluate the problem of limited training data in multiple scenarios like limited or no data available when building a new system, availability of a few labeled examples when adding a new feature to an existing system, and changes in the distribution of test data during the lifetime of a deployed system.
Motivated by the standard methods to handle data-constrained settings, we propose novel approaches to generate data and exploit latent representations to overcome performance drops emerging from limited training data.We propose a framework to generate high-quality synthetic data when few training examples are available for a newly added feature for dialogue agents. Our interpretation-to-text model uses existing training data for bootstrapping new features and improves the accuracy of downstream tasks of intent classification and slot labeling. Following, we study a few-shot setting and observe that generation systems face a low semantic coverage problem. Hence, we present an unsupervised NLG algorithm that ensures that all relevant semantic information is present in the generated text.
We also study to see if we really need all training examples for learning a generalized model. We propose a data selection method that selects the most informative training examples to train Visual Question Answering (VQA) models without erosion of accuracy. We leverage the already available inter-annotator agreement and design a diagnostic tool, called (EaSe), that leverages the entropy and semantic similarity of answer patterns.
Finally, we discuss two empirical studies to understand the feature space of VQA models and show how language model pre-training and exploiting multimodal embedding space allows for building data constrained models ensuring minimal or no accuracy losses.
Although today’s bipeds are capable of demonstrating impressive locomotion skills, in many aspects, there’s still a big gap compared to the capabilities observed in humans. Partially, this is due to the deployed control paradigms that are mostly based on analytical approaches. The analytical nature of those approaches entails strong model dependencies – regarding the robotic platform as well as the environment – which makes them prone to unknown disturbances. Recently, an increasing number of biologically-inspired control approaches have been presented from which a human-like bipedal gait emerges. Although the control structures only rely on proprioceptive sensory information, the smoothness of the motions and the robustness against external disturbances is impressive. Due to the lack of suitable robotic platforms, until today the controllers have been mostly applied to
simulations.
Therefore, as the first step towards a suitable platform, this thesis presents the Compliant Robotic Leg (CARL) that features mono- as well as biarticular actuation. The design is driven by a set of core-requirements that is primarily derived from the biologically-inspired behavior-based bipedal locomotion control (B4LC) and complemented by further functional aspects from biomechanical research. Throughout the design process, CARL is understood as a unified dynamic system that emerges from the interplay of the mechanics, the electronics, and the control. Thus, having an explicit control approach and the respective gait in mind, the influence of each subsystem on the characteristics of the overall system is considered
carefully.
The result is a planar robotic leg whose three joints are driven by five highly integrated linear SEAs– three mono- and two biarticular actuators – with minimized reflected inertia. The SEAs are encapsulated by FPGA-based embedded nodes that are designed to meet the hard application requirements while enabling the deployment of a full-featured robotic framework. CARL’s foot is implemented using a COTS prosthetic foot; the sensor information is obtained from the deformation of its main structure. Both subsystems are integrated into a leg structure that matches the proportions of a human with a size of 1.7 m.
The functionality of the subsystems, as well as the overall system, is validated experimentally. In particular, the final experiment demonstrates a coordinated walking motion and thereby confirms that CARL can produce the desired behavior – a natural looking, human-like gait is emerging from the interplay of the behavior-based walking control and the mechatronic system. CARL is robust regarding impacts, the redundant actuation system can render the desired joint torques/impedances, and the foot system supports the walking structurally while it provides the necessary sensory information. Considering that there is no movement of the upper trunk, the angle and torque profiles are comparable to the ones found in humans.
Recent progresses and advances in the field of consumer electronics, driven by display
technologies and also the sector of mobile, hand-held devices, enable new ways in
presenting information to users, as well as new ways of user interaction, therefore
providing a basis for user-centered applications and work environments.
My thesis focuses on how arbitrary display environments can be utilized to improve
both the user experience, regarding perception of information, and also to provide
intuitive interaction possibilities. On the one hand advances in display technologies
provide the basis for new ways of visualizing content and collaborative work, on the
other hand forward-pressing developments in the consumer market, especially the
market of smart phones, offer potential to enhance usability in terms of interaction
and therefore can provide additional benefit for users.
Tiled display setups, combining both large screen real estate and high resolution,
provide new possibilities and chances to visualize large datasets and to facilitate col-
laboration in front of a large screen area. Furthermore these display setups present
several advantages over the traditional single-user-workspace environments: con-
trary to single-user-workspaces, multiple users are able to explore a dataset displayed
on a tiled display system, at the same time, thus allowing new forms of collabora-
tive work. Based on that, face-to-face discussions are enabled, an additional value
is added. Large displays also allow the utilization of the user’s spatial memory, al-
lowing physical navigation without the need of switching between different windows
to explore information.
With Tiled++ I contributed a versatile approach to address the bezel problem. The
bezel problem is one of the Top Ten research challenges in the research field of LCD-
based tiled wall setups. By applying the Tiled++ approach a large high resolution
Focus & Context screen is created, combining high resolution focus areas with low
resolution context information, projected onto the bezel area.
Additionally the field of user interaction poses an important challenge, especially
regarding the utilization of large tiled displays, since traditional keyboard & mouse
interaction devices reached their limits. My focus in this thesis is on Mobile HCI.Devices like mobile phones are utilized to interact with large displays, since they
feature various interaction modalities and preserve user mobility.
Large public displays, as a modernized form of traditional bulletin boards, also en-
able new ways of handling information, displaying content, and user interaction.
Utilized in hot spots, Digital Interactive Public Pinboards can provide an adequate
answer to questions like how to approach pressing issues like disaster and crisis man-
agement for both responders as well as citizens and also new ways of how to handle
information flow (contribution & distribution & accession). My contribution to the
research field of public display environments was the conception and implementa-
tion of an easy-to-use and easy-to-set-up architecture to overcome shortcomings of
current approaches and to cover the needs of aid personnel.
Although being a niche, Virtual Reality (VR) environments can provide additional
value for visualizing specific content. Disciplines like earth sciences & geology, me-
chanical engineering, design, and architecture can benefit from VR environments. In
order to consider the variety of users, I introduce a more intuitive and user friendly
interaction metaphor, the ARC metaphor.
Visualization challenges base on being able to cope with more and more complex
datasets and to bridge the gap between comprehensibility and loss of information.
Furthermore the visualization approach has to be reasonable, which is a crucial
factor when working in interdisciplinary teams, where the standard of knowledge
is diverse. Users have to be able to conceive the visualized content in a fast and
reliable way. My contribution are visualization approaches in the field of supportive
visualization.
Finally, my work illuminates how the synthesis of visualization, interaction and dis-
play technologies enhance the user experience. I promote a holistic view. The user
is brought back into the focus of attention, provided with a tool-set to support him,
without overextending the abilities of, for example, non-expert users, a crucial factor
in the more and more interdisciplinary field of computer science.
An huge amount of computational models and programming languages have been proposed
for the description of embedded systems. In contrast to traditional sequential programming
languages, they cope directly with the requirements for embedded systems: direct support for
concurrent computations and periodic interaction with the environment are only some of the
features they offer. Synchronous languages are one class of languages for the development of
embedded systems and they follow the fundamental principle that the execution is divided into
a sequence of logical steps. Thereby, each step follows the simplification that the computation
of the outputs is finished directly when the inputs are available. This rigorous abstraction leads
to well-defined deterministic parallel composition in general, and to deterministic abortion
and suspension in imperative synchronous languages in particular. These key features also
allow to translate programs to hardware and software, and also formal verification techniques
like model checking can be easily applied.
Besides the advantages of imperative synchronous languages, also some drawbacks can
be listed. Over-synchronization is an effect being caused by parallel threads which have to
synchronize for each execution step, even if they do not communicate, since the synchronization
is implicitly forced by the control-flow. This thesis considers the idea of clock refinement to
introduce several abstraction layers for communication and synchronization in addition to the
existing single-clock abstraction. Thereby, clocks can be refined by several independent clocks
so that a controlled amount of asynchrony between subsequent synchronization points can be
exploited by compilers. The declarations of clocks form a tree, and clocks can be defined within
the threads of the parallel statement, which allows one to do independent computations based
on these clocks without synchronizing the threads. However, the synchronous abstraction is
kept at each level of the abstraction.
Clock refinement is introduced in this thesis as an extension to the imperative synchronous
language Quartz. Therefore, new program statements are introduced which allow to define
a new clock as a refinement of an existing one and to finish a step based on a certain clock.
Examples are considered to show the impact of the behavior of the new statements to
the already existing statements, before the semantics of this extension is formally defined.
Furthermore, the thesis presents a compile algorithm to translate programs to an intermediate
format, and to translate the intermediate format to a hardware description. The advantages
obtained by the new modeling feature are finally evaluated based on examples.
Scaling up conventional processor architectures cannot translate the ever-increasing number of transistors into comparable application performance. Although the trend is to shift from single-core to multi-core architectures, utilizing these multiple cores is not a trivial task for many applications due to thread synchronization and weak memory consistency issues. This is especially true for applications in real-time embedded systems since timing analysis becomes more complicated due to contention on shared resources. One inherent reason for the limited use of instruction-level parallelism (ILP) by conventional processors is the use of registers. Therefore, some recent processors bypass register usage by directly communicating values from producer processing units to consumer processing units. In widely used superscalar processors, this direct instruction communication is organized by hardware at runtime, adversely affecting its scalability. The exposed datapath architectures provide a scalable alternative by allowing compilers to move values directly from output ports to the input ports of processing units. Though exposed datapath architectures have already been studied in great detail, they still use registers for executing programs, thus limiting the amount of ILP they can exploit. This limitation stems from a drawback in their execution paradigm, code generator, or both.
This thesis considers a novel exposed datapath architecture named Synchronous Control Asynchronous Dataflow (SCAD) that follows a hybrid control-flow dataflow execution paradigm. The SCAD architecture employs first-in-first-out (FIFO) buffers at the output and input ports of processing units. It is programmed by move instructions that transport values from the head of output buffers to the tail of input buffers. Thus, direct instruction communication is facilitated by the architecture. The processing unit triggers the execution of an operation when operand values are available at the heads of its input buffers. We propose a code generation technique for SCAD processors inspired by classical queue machines that completely eliminates the use of registers. On this basis, we first generate optimal code by using satisfiability (SAT) solvers after establishing that optimal code generation is hard. Heuristics based on a novel buffer interference analysis are then developed to compile larger programs. The experimental results demonstrate the efficacy of the execution paradigm of SCAD using our queue-oriented code generation technique.
Large-scale distributed systems consist of a number of components, take a number of parameter values as input, and behave differently based on a number of non-deterministic events. All these features—components, parameter values, and events—interact in complicated ways, and unanticipated interactions may lead to bugs. Empirically, many bugs in these systems are caused by interactions of only a small number of features. In certain cases, it may be possible to test all interactions of \(k\) features for a small constant \(k\) by executing a family of tests that is exponentially or even doubly-exponentially smaller than the family of all tests. Thus, in such cases we can effectively uncover all bugs that require up to \(k\)-wise interactions of features.
In this thesis we study two occurrences of this phenomenon. First, many bugs in distributed systems are caused by network partition faults. In most cases these bugs occur due to two or three key nodes, such as leaders or replicas, not being able to communicate, or because the leading node finds itself in a block of the partition without quorum. Second, bugs may occur due to unexpected schedules (interleavings) of concurrent events—concurrent exchange of messages and concurrent access to shared resources. Again, many bugs depend only on the relative ordering of a small number of events. We call the smallest number of events whose ordering causes a bug the depth of the bug. We show that in both testing scenarios we can effectively uncover bugs involving small number of nodes or bugs of small depth by executing small families of tests.
We phrase both testing scenarios in terms of an abstract framework of tests, testing goals, and goal coverage. Sets of tests that cover all testing goals are called covering families. We give a general construction that shows that whenever a random test covers a fixed goal with sufficiently high probability, a small randomly chosen set of tests is a covering family with high probability. We then introduce concrete coverage notions relating to network partition faults and bugs of small depth. In case of network partition faults, we show that for the introduced coverage notions we can find a lower bound on the probability that a random test covers a given goal. Our general construction then yields a randomized testing procedure that achieves full coverage—and hence, find bugs—quickly.
In case of coverage notions related to bugs of small depth, if the events in the program form a non-trivial partial order, our general construction may give a suboptimal bound. Thus, we study other ways of constructing covering families. We show that if the events in a concurrent program are partially ordered as a tree, we can explicitly construct a covering family of small size: for balanced trees, our construction is polylogarithmic in the number of events. For the case when the partial order of events does not have a "nice" structure, and the events and their relation to previous events are revealed while the program is running, we give an online construction of covering families. Based on the construction, we develop a randomized scheduler called PCTCP that uniformly samples schedules from a covering family and has a rigorous guarantee of finding bugs of small depth. We experiment with an implementation of PCTCP on two real-world distributed systems—Zookeeper and Cassandra—and show that it can effectively find bugs.
Comparative Uncertainty Visualization for High-Level Analysis of Scalar- and Vector-Valued Ensembles
(2022)
With this thesis, I contribute to the research field of uncertainty visualization, considering parameter dependencies in multi valued fields and the uncertainty of automated data analysis. Like uncertainty visualization in general, both of these fields are becoming more and more important due to increasing computational power, growing importance and availability of complex models and collected data, and progress in artificial intelligence. I contribute in the following application areas:
Uncertain Topology of Scalar Field Ensembles.
The generalization of topology-based visualizations to multi valued data involves many challenges. An example is the comparative visualization of multiple contour trees, complicated by the random nature of prevalent contour tree layout algorithms. I present a novel approach for the comparative visualization of contour trees - the Fuzzy Contour Tree.
Uncertain Topological Features in Time-Dependent Scalar Fields.
Tracking features in time-dependent scalar fields is an active field of research, where most approaches rely on the comparison of consecutive time steps. I created a more holistic visualization for time-varying scalar field topology by adapting Fuzzy Contour Trees to the time-dependent setting.
Uncertain Trajectories in Vector Field Ensembles.
Visitation maps are an intuitive and well-known visualization of uncertain trajectories in vector field ensembles. For large ensembles, visitation maps are not applicable, or only with extensive time requirements. I developed Visitation Graphs, a new representation and data reduction method for vector field ensembles that can be calculated in situ and is an optimal basis for the efficient generation of visitation maps. This is accomplished by bringing forward calculation times to the pre-processing.
Visually Supported Anomaly Detection in Cyber Security.
Numerous cyber attacks and the increasing complexity of networks and their protection necessitate the application of automated data analysis in cyber security. Due to uncertainty in automated anomaly detection, the results need to be communicated to analysts to ensure appropriate reactions. I introduce a visualization system combining device readings and anomaly detection results: the Security in Process System. To further support analysts I developed an application agnostic framework that supports the integration of knowledge assistance and applied it to the Security in Process System. I present this Knowledge Rocks Framework, its application and the results of evaluations for both, the original and the knowledge assisted Security in Process System. For all presented systems, I provide implementation details, illustrations and applications.
In recent years, recommender systems have been widely used for a variety of different kinds of items such as books, movies, and music. However, current recommendation approaches have often been criticized to suffer from overspecialization thus not enough considering a user’s diverse topics of interest. In this thesis we present a novel approach to extracting contextualized user profiles which enable recommendations taking into account a user’s full range of interests. The method applies algorithms from the domain of topic detection and tracking to automatically identify diverse user interests and to represent them with descriptive labels. That way manual annotations of interest topics by the users, e. g., from a predefined domain taxonomy, are no longer required. The approach has been tested in two scenarios: First, we implemented a content-based recommender system for an Enterprise 2.0 resource sharing platform where the contextualized user interest profiles have been used to generate recommendations with a high degree of inter-topic diversity. In an effort to harness the collective intelligence of the users, the resources in the system were described by making use of user-generated metadata. The evaluation experiments show that our approach is likely to capture a multitude of diverse interest topics per user. The labels extracted are specific for these topics and can be used to retrieve relevant on-topic resources. Second, a slightly adapted variation of the algorithm has been used to target music recommendations based on the user’s current mood. In this scenario music artists are described by using freely available Semantic Web data from the Linked Open Data cloud thus not requiring expensive metadata annotations by experts. The evaluation experiments conducted show that many users have a multitude of different preferred music styles. However a correlation between these music styles and music mood categories could not be observed. An integration of our proposed user profiles with existing user model ontologies seems promising for enabling context-sensitive recommendations.
Under the notion of Cyber-Physical Systems an increasingly important research area has
evolved with the aim of improving the connectivity and interoperability of previously
separate system functions. Today, the advanced networking and processing capabilities
of embedded systems make it possible to establish strongly distributed, heterogeneous
systems of systems. In such configurations, the system boundary does not necessarily
end with the hardware, but can also take into account the wider context such as people
and environmental factors. In addition to being open and adaptive to other networked
systems at integration time, such systems need to be able to adapt themselves in accordance
with dynamic changes in their application environments. Considering that many
of the potential application domains are inherently safety-critical, it has to be ensured
that the necessary modifications in the individual system behavior are safe. However,
currently available state-of-the-practice and state-of-the-art approaches for safety assurance
and certification are not applicable to this context.
To provide a feasible solution approach, this thesis introduces a framework that allows
“just-in-time” safety certification for the dynamic adaptation behavior of networked
systems. Dynamic safety contracts (DSCs) are presented as the core solution concept
for monitoring and synthesis of decentralized safety knowledge. Ultimately, this opens
up a path towards standardized service provision concepts as a set of safety-related runtime
evidences. DSCs enable the modular specification of relevant safety features in
networked applications as a series of formalized demand-guarantee dependencies. The
specified safety features can be hierarchically integrated and linked to an interpretation
level for accessing the scope of possible safe behavioral adaptations. In this way, the networked
adaptation behavior can be conditionally certified with respect to the fulfilled
DSC safety features during operation. As long as the continuous evaluation process
provides safe adaptation behavior for a networked application context, safety can be
guaranteed for a networked system mode at runtime. Significant safety-related changes
in the application context, however, can lead to situations in which no safe adaptation
behavior is available for the current system state. In such cases, the remaining DSC
guarantees can be utilized to determine optimal degradation concepts for the dynamic
applications.
For the operationalization of the DSCs approach, suitable specification elements and
mechanisms have been defined. Based on a dedicated GUI-engineering framework it is
shown how DSCs can be systematically developed and transformed into appropriate runtime
representations. Furthermore, a safety engineering backbone is outlined to support
the DSC modeling process in concrete application scenarios. The conducted validation
activities show the feasibility and adequacy of the proposed DSCs approach. In parallel,
limitations and areas of future improvement are pointed out.
In robotics, information is often regarded as a means to an end. The question of how to structure information and how to bridge the semantic gap between different levels of abstraction in a uniform way is still widely regarded as a technical issue. Ignoring these challenges appears to lead robotics into a similar stasis as experienced in the software industry of the late 1960s. From the beginning of the software crisis until today, numerous methods, techniques, and tools for managing the increasing complexity of software systems have evolved. The attempt to transfer several of these ideas towards applications in robotics yielded various control architectures, frameworks, and process models. These attempts mainly provide modularisation schemata which suggest how to decompose a complex system into less complex subsystems. The schematisation of representation and information flow however is mostly ignored. In this work, a set of design schemata is proposed which is embedded into an action/perception-oriented design methodology to promote thorough abstractions between distinct levels of control. Action-oriented design decomposes control systems top-down and sensor data is extracted from the environment as required. This comes with the problem that information is often condensed in a premature fashion. That way, sensor processing is dependent on the control system design resulting in a monolithical system structure with limited options for reusability. In contrast, perception-oriented design constructs control systems bottom-up starting with the extraction of environment information from sensor data. The extracted entities are placed into structures which evolve with the development of the sensor processing algorithms. In consequence, the control system is strictly dependent on the sensor processing algorithms which again results in a monolithic system. In their particular domain, both design approaches have great advantages but fail to create inherently modular systems. The design approach proposed in this work combines the strengths of action orientation and perception orientation into one coherent methodology without inheriting their weaknesses. More precisely, design schemata for representation, translation, and fusion of environmental information are developed which establish thorough abstraction mechanisms between components. The explicit introduction of abstractions particularly supports extensibility and scalability of robot control systems by design.
Shared memory concurrency is the pervasive programming model for multicore architectures
such as x86, Power, and ARM. Depending on the memory organization, each architecture follows
a somewhat different shared memory model. All these models, however, have one common
feature: they allow certain outcomes for concurrent programs that cannot be explained
by interleaving execution. In addition to the complexity due to architectures, compilers like
GCC and LLVM perform various program transformations, which also affect the outcomes of
concurrent programs.
To be able to program these systems correctly and effectively, it is important to define a
formal language-level concurrency model. For efficiency, it is important that the model is
weak enough to allow various compiler optimizations on shared memory accesses as well
as efficient mappings to the architectures. For programmability, the model should be strong
enough to disallow bogus “out-of-thin-air” executions and provide strong guarantees for well-synchronized
programs. Because of these conflicting requirements, defining such a formal
model is very difficult. This is why, despite years of research, major programming languages
such as C/C++ and Java do not yet have completely adequate formal models defining their
concurrency semantics.
In this thesis, we address this challenge and develop a formal concurrency model that is very
good both in terms of compilation efficiency and of programmability. Unlike most previous
approaches, which were defined either operationally or axiomatically on single executions,
our formal model is based on event structures, which represents multiple program executions,
and thus gives us more structure to define the semantics of concurrency.
In more detail, our formalization has two variants: the weaker version, WEAKEST, and the
stronger version, WEAKESTMO. The WEAKEST model simulates the promising semantics proposed
by Kang et al., while WEAKESTMO is incomparable to the promising semantics. Moreover,
WEAKESTMO discards certain questionable behaviors allowed by the promising semantics.
We show that the proposed WEAKESTMO model resolve out-of-thin-air problem, provide
standard data-race-freedom (DRF) guarantees, allow the desirable optimizations, and can be
mapped to the architectures like x86, PowerPC, and ARMv7. Additionally, our models are
flexible enough to leverage existing results from the literature to establish data-race-freedom
(DRF) guarantees and correctness of compilation.
In addition, in order to ensure the correctness of compilation by a major compiler, we developed
a translation validator targeting LLVM’s “opt” transformations of concurrent C/C++
programs. Using the validator, we identified a few subtle compilation bugs, which were reported
and were fixed. Additionally, we observe that LLVM concurrency semantics differs
from that of C11; there are transformations which are justified in C11 but not in LLVM and
vice versa. Considering the subtle aspects of LLVM concurrency, we formalized a fragment
of LLVM’s concurrency semantics and integrated it into our WEAKESTMO model.
This PhD thesis aims at finding a global robot navigation strategy for rugged off-road terrain which is robust against inaccurate self-localization, scalable to large environments, but also cost-efficient, e.g. able to generate navigation paths which optimize a cost measure closely related to terrain traversability. In order to meet this goal, aspects of both metrical and topological navigation techniques are combined. A primarily topological map is extended with the previously lacking capability of cost-efficient path planning and map extension. Further innovations include a multi-dimensional cost measure for topological edges, a method to learn these costs based on live feedback from the robot and a set of extrapolation methods to predict the traversability costs for untraversed edges. The thesis presents two sophisticated new image analysis techniques to optimize cost prediction based on the shape and appearance of surrounding terrain. Experimental results indicate that the proposed global navigation system is indeed able to perform cost-efficient, large scale path planning. At the same time, the need to maintain a fine-grained, global world model which would reduce the scalability of the approach is avoided.
Data-driven and Sparse-to-Dense Concepts in Scene Flow Estimation for Automotive Applications
(2022)
Highly assisted driving and autonomous vehicles require a detailed and accurate perception of the environment. This includes the perception of the 3D geometry of the scene and the 3D motion of other road users. The estimation of both based on images is known as the scene flow problem in computer vision. This thesis deals with a solution to the scene flow problem that is suitable for application in autonomous vehicles. This application imposes strict requirements on accuracy, robustness, and speed. Previous work was lagging behind in at least one of these metrics. To work towards the fulfillment of those requirements, the sparse-to-dense concept for scene flow estimation is introduced in this thesis. The idea can be summarized as follows: First, scene flow is estimated for some points of the scene for which this can be done comparatively easily and reliably. Then, an interpolation is performed to obtain a dense estimate for the entire scene. Because of the separation into two steps, each part can be optimized individually. In a series of experiments, it is shown that the proposed methods achieve competitive results and are preferable to previous techniques in some aspects. As a second contribution, individual components in the sparse-to-dense pipeline are replaced by deep learning modules. These are a highly localized and highly accurate feature descriptor to represent pixels for dense matching, and a network for robust and generic sparse-to-dense interpolation. Compared to end-to-end architectures, the advantage of deep modules is that they can be trained more effciently with data from different domains. The recombination approach applies a similar concept as the sparse-to-dense approach by solving and combining less diffcult, auxiliary sub-problems. 3D geometry and 2D motion are estimated separately, the individual results are combined, and then also interpolated into a dense scene flow. As a final contribution, the thesis proposes a set of monolithic end-to-end networks for scene flow estimation.
Dealing with Dependence in the End-to-End Performance Analysis in Stochastic Network Calculus
(2022)
Communication networks, in particular the Internet, have become a pivotal part of our life. Since their beginnings, a key aspect of their applicability has been the performance. Safety-critical applications, for example, can sometimes only be implemented in a responsible manner if guarantees about their end-to-end delay can be made. A mathematical modeling and performance evaluation of communication networks requires a powerful set of tools that is able to incorporate their increasing complexity.
The stochastic network calculus (SNC) is a versatile, mathematical framework that allows for a calculation of probabilistic end-to-end performance bounds of distributed systems. Its flexibility enables to incorporate a large class of different schedulers as well as different models of traffic processes beyond the assumption of Poisson arrivals that is predominant in queueing theory-based analyses. It originates in the so-called deterministic network analysis (DNC) in the 90's of the 20th century that was introduced to provide deterministic, ``hard'' guarantees that are of relevance, e.g., in the context of real-time systems. While the DNC of today can be used to calculate fast and accurate delay bounds of arbitrary feedforward networks, the SNC is still in a significantly earlier stage. In particular, method-pertinent dependencies, i.e., a phenomenon that occurs when independent flows become stochastically dependent after sharing resources in the network, can be considered a major challenge in the SNC with moment-generating functions (MGFs).
This thesis argues to contribute to the SNC in several ways. First, we show that the ``pay multiplexing only once'' (PMOO) analysis known from the DNC is also possible in the SNC. Not only does it significantly improve end-to-end delay bounds, it also needs to consider less method-pertinent dependencies. Therefore, complexity and runtimes of the calculation are greatly reduced. Second, we introduce the concept of negative dependence to the SNC with MGFs and give numerical evidence that this can further lead to better performance bounds. Third, for the larger problem of end-to-end performance bounds of tree networks, we introduce so-called ''h-mitigators'', a modification in the calculation of MGF output bounds. It is minimally invasive, all existing results and procedures are still applicable, and improves performance bounds. As a fourth contribution, we conduct extensive numerical evaluations to substantiate our claims. Moreover, we made the respective code, the ''SNC MGF toolbox'', publicly available to ensure that the results are reproducible. At last, we conduct different stochastic analyses of a popular fair scheduler, generalized processor sharing (GPS). We give an overview of the state-of-the-art analyses in the SNC and substantiate the comparison through numerical evaluations.
Optical Character Recognition (OCR) is one of the central problems in pattern recognition. Its
applications have played a great role in the digitization of document images collected from het-
erogeneous sources. Many of the well-known scripts have OCR systems with sufficiently high
performance that enables OCR applications in industrial/commercial settings. However, OCR sys-
tems yield very-good results only on a narrow domain and very-specific use cases. Thus, it is still
a challenging task, and there are other exotic languages with indigenous scripts, such as Amharic,
for which no well-developed OCR systems exist.
As many as 100 million people speak Amharic, and it is an official working language of Ethiopia.
Amharic script contains about 317 different alphabets derived from 34 consonants with small changes.
The change involves shortening or elongating one of its main legs or adding small diacritics to the
right, left, top, or bottom of the consonant character. Such modifications lead the characters to have
similar shapes and make the recognition task complex, but this is particularly interesting for charac-
ter recognition research. So far, Amharic script recognition models are developed based on classical
machine learning techniques, and they are very limited in addressing the issues for Amharic OCR.
The motivation of this thesis is, therefore, to explore and tailor contemporary deep learning tech-
niques for the OCR of Amharic.
This thesis addresses the challenges in Amharic OCR through two main contributions. The first
contribution is an algorithmic contribution in which we investigate deep learning approaches that
suit the demand for Amharic OCR. The second is a technical contribution that comprises several
works towards the OCR model development; thus, it introduces a new Amharic database consisting
of collections of images annotated at a character and text-line level. It also presents a novel CNN-
based framework designed by leveraging the grapheme of characters in Fidel-Gebeta (where Fidel-
Gebeta consists of the full set of Amharic characters in matrix structure) and achieves 94.97%
overall character recognition accuracy.
In addition to character level methods, text-line level methods are also investigated and devel-
oped based on sequence-to-sequence learning. These models avoid several of the pre-processing
stages used in prior works by eliminating the need to segment individual characters. In this design,
we use a stack of CNNs, before the Bi-LSTM layers and train from end-to-end. This model out-
performs the LSTM-CTC based network, on average, by a CER of 3.75% with the ADOCR test
set. Motivated by the success of attention, in addressing the problems’ of long sequences in Neural
Machine Translation (NMT), we proposed a novel attention-based methodology by blending the
attention mechanism into CTC objective function. This model performs far better than the existing
techniques with a CER of 1.04% and 0.93% on printed and synthetic text-line images respectively.
Finally, this thesis provides details on various tasks that have been performed for the development
of Amharic OCR. As per our empirical analysis, the majority of the errors are due to poor annotation
of the dataset. As future work, the methods proposed in this thesis should be further investigated
and extended to deal with handwritten and historical Amharic documents.
3D hand pose and shape estimation from a single depth image is a challenging computer vision and graphics problem with many applications such as
human computer interaction and animation of a personalized hand shape in
augmented reality (AR). This problem is challenging due to several factors
for instance high degrees of freedom, view-point variations and varying hand
shapes. Hybrid approaches based on deep learning followed by model fitting
preserve the structure of hand. However, a pre-calibrated hand model limits
the generalization of these approaches. To address this limitation, we proposed a novel hybrid algorithm for simultaneous estimation of 3D hand pose
and bone-lengths of a hand model which allows training on datasets that contain varying hand shapes. On the other hand, direct joint regression methods
achieve high accuracy but they do not incorporate the structure of hand in
the learning process. Therefore, we introduced a novel structure-aware algorithm which learns to estimate 3D hand pose jointly with new structural constraints. These constraints include fingers lengths, distances of joints along
the kinematic chain and fingers inter-distances. Learning these constraints
help to maintain a structural relation between the estimated joint keypoints.
Previous methods addressed the problem of 3D hand pose estimation. We
open a new research topic and proposed the first deep network which jointly
estimates 3D hand shape and pose from a single depth image. Manually annotating real data for shape is laborious and sub-optimal. Hence, we created a
million-scale synthetic dataset with accurate joint annotations and mesh files
of depth maps. However, the performance of this deep network is restricted by
limited representation capacity of the hand model. Therefore, we proposed a
novel regression-based approach in which the 3D dense hand mesh is recovered
from sparse 3D hand pose, and weak-supervision is provided by a depth image synthesizer. The above mentioned approaches regressed 3D hand meshes
from 2D depth images via 2D convolutional neural networks, which leads to
artefacts in the estimations due to perspective distortions in the images. To
overcome this limitation, we proposed a novel voxel-based deep network with
3D convolutions trained in a weakly-supervised manner. Finally, an interesting
application is presented which is in-air signature acquisition and verification
based on deep hand pose estimation. Experiments showed that depth itself is
an important feature, which is sufficient for verification.
In recent years, Augmented Reality has made its way into everyday devices. Most smartphones are AR-enabled, providing applications like pedestrian navigation, Point of Interest highlighting, gaming, and retail. The high-tech industry has been focused on developing smartglasses to present virtual elements directly in front of the viewers’ eyes, allowing more immersive AR experiences. Smartglasses can also be deployed while driving for an enhanced and more safe experience. A 3D registered augmentation of the real world with navigation arrows, lane highlighting, or warnings can decrease the duration of inattentiveness regarding driving due to glancing at other screens. Enabling HMDs’ usage inside cars requires knowing its exact position and orientation (6-DoF pose) in the car. This necessitates sensors either built inside the AR glasses or the car. In a car, the latter option called outside-in tracking is more attractive due to two reasons. First, AR glasses containing different sensor sets exist, hampering finding one single solution for different HMDs. Second, the view from the driver’s perspective combines static interior and dynamic exterior features, complicating finding a reliable set of features. Nowadays, tracking methods utilize Deep Learning for a more generalizable and accurate derivation of the 6-DoF pose. They achieve outstanding results for head and object pose estimation. In this thesis, we present Deep Learning-based in-car 6-DoF AR glasses pose estimation approaches. The goal of the work is an exploration of accurate HMD pose estimation with the help of neural networks. The thesis achieves this by investigating numerous pose estimation techniques. Evaluations on the recorded HMDPose dataset constitute the foundation for this, consisting of infrared images of drivers wearing different HMD models. First, algorithms based on images are derived and evaluated on the dataset. For comparison, we carried out an evaluation on image-based methods considering time information. Further, pose estimation based on point clouds, generated out of infrared images, are analyzed. An investigation of various head pose estimation methods to derive its potential use are conducted. In conclusion, we introduce several highly accurate AR glasses pose estimators. The HMD pose alone achieves better results than the head pose and the combination of the head and HMD. Especially our image-based methods with optional usage of time information can efficiently and accurately regress the AR glasses pose. Our algorithms show excellent estimation results on live data when deployed inside a car, making seamless in-car HMD usage possible in the future.
Faces deliver invaluable information about people. Machine-based perception can be of a great benefit in extracting that underlying information in face images if the problem is properly modeled. Classical image processing algorithms may fail to handle the diverse data available today due to several challenges related to varying capturing locations, and conditions. Advanced machine learning methods and algorithms are now highly beneficial due to the rapid development of powerful hardware, enabling feasible advanced solutions based on data learning and summarization into powerful models. In this thesis, novel solutions are provided to the problems of head orientation estimation and gender prediction. Initially, classical machine learning algorithms were used to address head orientation estimation but were limited by their inability to handle large datasets and poor generalization. To overcome these challenges, a new highly accurate head pose dataset was acquired to tackle the identified problems. Novel trained deep neural networks have been exploited, that use the acquired data and provide novel architectures. The information about head pose is then represented in the network weights, thus, allowing predicting the head orientation angles given a new unseen face. The acquired dataset, named AutoPOSE opens the door for further studies in the field of computer vision and especially, face analysis. The problem of gender prediction has also been explored, but unlike humans who can easily identify gender from a face, computers face difficulties due to facial similarities. Therefore, hand-crafted features are not effective for generalization. To address this, a new deep learning method was developed and evaluated on multiple public datasets, with identified challenges in both still images and videos addressed. Finally, the effect of facial appearance changes due to head orientation variation has been investigated on gender prediction accuracy. A novel orientation-guided feature maps recalibration method is presented, that significantly increased the accuracy of gender prediction.
In conclusion, two problems have been addressed in this thesis, independently and joined together. Existing methods have been enhanced with intelligent pre-processing methods and new approaches have been introduced to tackle existing challenges, that arise from pose, illumination, and occlusion variations. The proposed methods have been extensively evaluated, showing that head orientation and gender prediction can be estimated with high accuracy using machine learning-based methods. Also, the evaluations showed that the use of head orientation information consistently improved the gender prediction accuracy. Scientific contributions have been presented, and the new acquired highly accurate dataset motivates the research community to push the state-of-the-art forward.