### Refine

#### Year of publication

#### Document Type

- Doctoral Thesis (128) (remove)

#### Language

- English (128) (remove)

#### Keywords

- Visualisierung (13)
- Computergraphik (5)
- Visualization (5)
- Evaluation (3)
- Geoinformationssystem (3)
- Mustererkennung (3)
- Robotik (3)
- Semantic Web (3)
- computer graphics (3)
- document analysis (3)
- optical character recognition (3)
- verification (3)
- visualization (3)
- Algorithmus (2)
- Bildverarbeitung (2)
- Computational Fluid Dynamics (2)
- Datenanalyse (2)
- Datenbank (2)
- Endlicher Automat (2)
- Layout (2)
- Merkmalsextraktion (2)
- Optische Zeichenerkennung (2)
- Raumakustik (2)
- Ray casting (2)
- Recommender Systems (2)
- Room acoustics (2)
- Scattered-Data-Interpolation (2)
- Scientific Visualization (2)
- Uncertainty Visualization (2)
- image processing (2)
- layout analysis (2)
- machine learning (2)
- virtual acoustics (2)
- 3D Gene Expression (1)
- 3D Point Data (1)
- ASM (1)
- AUTOSAR (1)
- Ableitungsschätzung (1)
- Abrechnungsmanagement (1)
- Abstraction (1)
- Abstraktion (1)
- Accounting Agent (1)
- Activity Recognition, Wearable Computing, Minimal Training, Unobtrusive Instrumentations (1)
- Activity recognition (1)
- Adaptive Data Structure (1)
- Affine Arithmetic (1)
- Algorithm (1)
- Algorithmic Differentiation (1)
- Artificial Intelligence (1)
- Association (1)
- Automat <Automatentheorie> (1)
- Automatische Differentiation (1)
- Automatisches Beweisverfahren (1)
- Autonomer Roboter (1)
- Befahrbarkeitsanalyse (1)
- Benutzer (1)
- Beschränkte Arithmetik (1)
- Bioinformatik (1)
- Bluetooth (1)
- Boosting (1)
- Caching (1)
- Carbon footprint (1)
- Classification (1)
- Cluster-Analyse (1)
- Code Generation (1)
- Collaboration (1)
- Combinatorial Testing (1)
- Computer Graphic (1)
- Computer Supported Cooperative Work (1)
- Computer graphics (1)
- Computerphysik (1)
- Computersimulation (1)
- Computertomographie (1)
- Computervision (1)
- Cycle Accuracy (1)
- DCE <Programm> (1)
- DFG (1)
- DPN (1)
- Data Modeling (1)
- Dataset (1)
- Decision Support Systems (1)
- Delaunay-Triangulierung (1)
- Derivative Estimation (1)
- Direct Numerical Simulation (1)
- Discrete Event Simulation (DES) (1)
- Distributed Rendering (1)
- Distributed system (1)
- Dreidimensionale Bildverarbeitung (1)
- Dreidimensionale Rekonstruktion (1)
- Dreidimensionale Strömung (1)
- Duplicate Identification (1)
- Duplikaterkennung (1)
- Effizienter Algorithmus (1)
- Empfehlungssysteme (1)
- Ensemble Visualization (1)
- Entwurf (1)
- Evolutionary Algorithm (1)
- Experiment (1)
- Experimentation (1)
- Extrapolation (1)
- Fahrtkostenmodelle (1)
- Fault Tree Analysis (1)
- Feasibility study (1)
- Feature (1)
- Feature Detection (1)
- Feature Extraction (1)
- Fehlerbaumanalyse (1)
- Fließanalyse (1)
- Flow Visualization (1)
- Formale Grammatik (1)
- Formale Sprache (1)
- Formaler Beweis (1)
- Functional Safety (1)
- Funktionale Sicherheit (1)
- Fusion (1)
- Future Internet (1)
- GPU (1)
- Gateway (1)
- Gauß-Filter (1)
- Gefahren- und Risikoanalyse (1)
- Gene expression programming (1)
- Generierung (1)
- Geovisualization (1)
- Google Earth (1)
- Hazard Analysis (1)
- Hitting families (1)
- Hypergraph (1)
- IEEE 802.15.4 (1)
- IP Address (1)
- IP Traffic Accounting (1)
- IP-XACT (1)
- ISO 26262 (1)
- Image Processing (1)
- Implementierung (1)
- Incremental recomputation (1)
- Induktive logische Programmierung (1)
- Information Extraction (1)
- Information Visualization (1)
- Intensity estimation (1)
- Interaction (1)
- Interaktion (1)
- Intermediate Composition (1)
- Interpolation (1)
- Invariante (1)
- Invariante Momente (1)
- Kellerautomat (1)
- Klassifikation (1)
- Knowledge Management (1)
- Knuth-Bendix completion (1)
- Knuth-Bendix-Vervollständigung (1)
- Kompression (1)
- Korrelationsanalyse (1)
- Kurve (1)
- LIDAR (1)
- LIR-Tree (1)
- Large High-Resolution Displays (1)
- Laser Wakefield Particle Accelerator (1)
- Leistungsmessung (1)
- Linked Data (1)
- Linking Data Analysis and Visualization (1)
- Lokalisierung (1)
- Lärmbelastung (1)
- Lärmimmission (1)
- MAC protocols (1)
- MIP-Emissionsspektroskopie (1)
- MIP-Massenspektrometrie (1)
- Machine Learning (1)
- Machine learning (1)
- Magnetfeldbasierter Lokalisierung (1)
- Magnetfelder (1)
- Manufacturing (1)
- MapReduce (1)
- Maschinelle Übersetzung (1)
- Maximum Intensity Projection (1)
- Mensch-Maschine-Kommunikation (1)
- Menschenmenge (1)
- Merkmalsraum (1)
- Mesh-Free (1)
- Meter (1)
- Mikroskopie (1)
- Minimal Cut Set Visualization (1)
- Mobile system (1)
- Mobiler Roboter (1)
- Model Checking (1)
- Modellierung (1)
- Modularisierung (1)
- Molekulare Bioinformatik (1)
- Moment Invariants (1)
- Mood-based Music Recommendations (1)
- Multi-Edge Graph (1)
- Multi-Field (1)
- Multi-Variate Data (1)
- Multifield Data (1)
- Nachhaltigkeit (1)
- Natural Language Processing (1)
- Natural Neighbor (1)
- Natural Neighbor Interpolation (1)
- Natürliche Nachbarn (1)
- Navigation (1)
- Network Architecture (1)
- Network Calculus (1)
- Netzwerk (1)
- Node-Link Diagram (1)
- Numerische Strömungssimulation (1)
- OCR (1)
- Off-road Robotics (1)
- Off-road Robotik (1)
- Online chain partitioning (1)
- Optimierender Compiler (1)
- Optimierung (1)
- Paralleler Hybrid (1)
- Pareto Optimality (1)
- Partial Differential Equations (1)
- Partially ordered sets (1)
- Participatory Sensing (1)
- Pattern Recognition (1)
- Performance (1)
- Personalisation (1)
- Pervasive health (1)
- Physical activity monitoring (1)
- Process Data (1)
- Protocol Composition (1)
- Prototyp (1)
- Prototype (1)
- Prozessvisualisierung (1)
- Quantitative Bildanalyse (1)
- Quartz (1)
- Quicksort (1)
- Random testing (1)
- Ray tracing (1)
- Rechtecksgitter (1)
- Rectilinear Grid (1)
- Redundanzvermeidung (1)
- Representation (1)
- Risikobewertung (1)
- Risk Assessment (1)
- Roboter (1)
- Räumliche Statistik (1)
- SAHARA (1)
- SDL (1)
- SDL extensions (1)
- SPARQL (1)
- SPARQL query learning (1)
- Safety Analysis (1)
- Scalar (1)
- Schema <Informatik> (1)
- Schematisation (1)
- Schematisierung (1)
- Semantic Wikis (1)
- Semantische Modellierung (1)
- Semantisches Datenmodell (1)
- Shared Resource Modeling (1)
- Sicherheitsanalyse (1)
- Similarity Join (1)
- Similarity Joins (1)
- Simulation (1)
- Skalar (1)
- Smart City (1)
- Smartphone (1)
- Smartwatch (1)
- Socio-Semantic Web (1)
- Software Comprehension (1)
- Software Dependencies (1)
- Software Evolution (1)
- Software Maintenance (1)
- Software Measurement (1)
- Software Testing (1)
- Software Visualization (1)
- Software engineering (1)
- Software-Architektur (1)
- Softwareentwicklung (1)
- Softwaremetrie (1)
- Softwarewartung (1)
- Sprachdefinition (1)
- Sprachprofile (1)
- Stimmungsbasierte Musikempfehlungen (1)
- Stokes Equations (1)
- Strukturiertes Gitter (1)
- Strömung (1)
- Surface Reconstruction (1)
- SystemC (1)
- Systemarchitektur (1)
- Systemdesign (1)
- Temporal Decoupling (1)
- Temporal data processing (1)
- Tensor (1)
- Tensorfeld (1)
- Tesselation (1)
- Tetraeder (1)
- Tetraedergi (1)
- Tetrahedral Grid (1)
- Tetrahedral Mesh (1)
- Themenbasierte Empfehlungen von Ressourcen (1)
- Topic-based Resource Recommendations (1)
- Topologie (1)
- Topology (1)
- Topology visualization (1)
- Transaction Level Modeling (TLM) (1)
- Transport Protocol (1)
- Traversability Analysis (1)
- UML (1)
- Ubiquitous system (1)
- Unorganized Data (1)
- Unstrukturiertes Gitter (1)
- Urban sprawl (1)
- UrbanSim (1)
- User Model (1)
- Validierung (1)
- Vector (1)
- Vector Field (1)
- Vektor (1)
- Vektorfelder (1)
- Verifikation (1)
- Verzerrungstensor (1)
- Visual Analytics (1)
- Visual Queries (1)
- Volume rendering (1)
- Volumen-Rendering (1)
- Voronoi diagram (1)
- Voronoi-Diagramm (1)
- Wearable computing (1)
- WiFi (1)
- Wide-column stores (1)
- XDBMS (1)
- XML (1)
- XML query estimation (1)
- XML summary (1)
- Yaroslavskiy-Bentley-Bloch Quicksort (1)
- acoustic modeling (1)
- affective user interface (1)
- affine arithmetic (1)
- analysis of algorithms (1)
- artificial intelligence (1)
- artificial neural network (1)
- associations (1)
- automated theorem proving (1)
- average-case analysis (1)
- behaviour-based system (1)
- binary countdown protocol (1)
- black bursts (1)
- classification (1)
- collaborative information visualization (1)
- collaborative mobile sensing (1)
- collective intelligence (1)
- computational biology (1)
- computational fluid dynamics (1)
- computer-supported cooperative work (1)
- concurrent (1)
- content-and-structure summary (1)
- continuous master theorem (1)
- crowd condition estimation (1)
- crowd density estimation (1)
- crowd scanning (1)
- crowd sensing (1)
- crowdsourcing (1)
- curves and surfaces (1)
- data sets (1)
- data-flow (1)
- dataset (1)
- decidability (1)
- design (1)
- deterministic arbitration (1)
- directed graphs (1)
- distributed (1)
- distributed real-time systems (1)
- distributed tasks (1)
- embedded (1)
- embedding (1)
- emotion visualization (1)
- end-to-end learning (1)
- environmental noise (1)
- evolutionary algorithm (1)
- firewall (1)
- flow visualization (1)
- foundational translation validation (1)
- gaussian filter (1)
- geographic information systems (1)
- geology (1)
- graph drawing algorithm (1)
- graph embedding (1)
- graph layout (1)
- historical documents (1)
- hypergraph (1)
- iB2C (1)
- image analysis (1)
- implementation (1)
- information systems (1)
- interference (1)
- interval arithmetic (1)
- invariant (1)
- language definition (1)
- language modeling (1)
- language profiles (1)
- linked data (1)
- long short-term memory (1)
- long tail (1)
- machine-checkable proof (1)
- magnetic field based localization (1)
- matrix visualization (1)
- message-passing (1)
- metadata (1)
- mixed-signal (1)
- modularisation (1)
- moment (1)
- multicore (1)
- multidimensional datasets (1)
- multinomial regression (1)
- multithreading (1)
- multitype code coupling (1)
- multiway partitioning (1)
- nestable tangibles (1)
- optimization correctness (1)
- oscillating magnetic fields (1)
- out-of-order (1)
- parallel (1)
- participatory sensing (1)
- path cost models (1)
- pattern recognition (1)
- pivot sampling (1)
- point cloud (1)
- proof generating optimizer (1)
- ray casting (1)
- ray tracing (1)
- real-time tasks (1)
- relaxed memory models (1)
- robustness (1)
- secondary structure prediction (1)
- semantic web (1)
- small-multiples node-link visualization (1)
- social media (1)
- software comprehension (1)
- software engineering task (1)
- spatial statistics (1)
- static software structure (1)
- stationary sensing (1)
- statistics (1)
- structural summary (1)
- subjective evaluation (1)
- symbolic simulation (1)
- synchronous (1)
- system architecture (1)
- tabletop (1)
- task sequence (1)
- tensor (1)
- tensorfield (1)
- terrain rendering (1)
- time-varying flow fields (1)
- touch surfaces (1)
- translation contract (1)
- urban planning (1)
- user-centered design (1)
- vector field visualization (1)
- vectorfield (1)
- virtual reality (1)
- visual structure (1)
- weighted finite-state transducers (1)
- wireless signal (1)
- zeitabhängige Strömungen (1)
- Ähnlichkeit (1)
- Übersetzung (1)

#### Faculty / Organisational entity

- Fachbereich Informatik (128) (remove)

Large-scale distributed systems consist of a number of components, take a number of parameter values as input, and behave differently based on a number of non-deterministic events. All these features—components, parameter values, and events—interact in complicated ways, and unanticipated interactions may lead to bugs. Empirically, many bugs in these systems are caused by interactions of only a small number of features. In certain cases, it may be possible to test all interactions of \(k\) features for a small constant \(k\) by executing a family of tests that is exponentially or even doubly-exponentially smaller than the family of all tests. Thus, in such cases we can effectively uncover all bugs that require up to \(k\)-wise interactions of features.
In this thesis we study two occurrences of this phenomenon. First, many bugs in distributed systems are caused by network partition faults. In most cases these bugs occur due to two or three key nodes, such as leaders or replicas, not being able to communicate, or because the leading node finds itself in a block of the partition without quorum. Second, bugs may occur due to unexpected schedules (interleavings) of concurrent events—concurrent exchange of messages and concurrent access to shared resources. Again, many bugs depend only on the relative ordering of a small number of events. We call the smallest number of events whose ordering causes a bug the depth of the bug. We show that in both testing scenarios we can effectively uncover bugs involving small number of nodes or bugs of small depth by executing small families of tests.
We phrase both testing scenarios in terms of an abstract framework of tests, testing goals, and goal coverage. Sets of tests that cover all testing goals are called covering families. We give a general construction that shows that whenever a random test covers a fixed goal with sufficiently high probability, a small randomly chosen set of tests is a covering family with high probability. We then introduce concrete coverage notions relating to network partition faults and bugs of small depth. In case of network partition faults, we show that for the introduced coverage notions we can find a lower bound on the probability that a random test covers a given goal. Our general construction then yields a randomized testing procedure that achieves full coverage—and hence, find bugs—quickly.
In case of coverage notions related to bugs of small depth, if the events in the program form a non-trivial partial order, our general construction may give a suboptimal bound. Thus, we study other ways of constructing covering families. We show that if the events in a concurrent program are partially ordered as a tree, we can explicitly construct a covering family of small size: for balanced trees, our construction is polylogarithmic in the number of events. For the case when the partial order of events does not have a "nice" structure, and the events and their relation to previous events are revealed while the program is running, we give an online construction of covering families. Based on the construction, we develop a randomized scheduler called PCTCP that uniformly samples schedules from a covering family and has a rigorous guarantee of finding bugs of small depth. We experiment with an implementation of PCTCP on two real-world distributed systems—Zookeeper and Cassandra—and show that it can effectively find bugs.

Ranking lists are an essential methodology to succinctly summarize outstanding items, computed over database tables or crowdsourced in dedicated websites. In this thesis, we propose the usage of automatically generated, entity-centric rankings to discover insights in data. We present PALEO, a framework for data exploration through reverse engineering top-k database queries, that is, given a database and a sample top-k input list, our approach, aims at determining an SQL query that returns results similar to the provided input when executed over the database. The core problem consist of finding selection predicates that return the given items, determining the correct ranking criteria, and evaluating the most promising candidate queries first. PALEO operates on subset of the base data, uses data samples, histograms, descriptive statistics, and further proposes models that assess the suitability of candidate queries which facilitate limitation of false positives. Furthermore, this thesis presents COMPETE, a novel approach that models and computes dominance over user-provided input entities, given a database of top-k rankings. The resulting entities are found superior or inferior with tunable degree of dominance over the input set---a very intuitive, yet insightful way to explore pros and cons of entities of interest. Several notions of dominance are defined which differ in computational complexity and strictness of the dominance concept---yet, interdependent through containment relations. COMPETE is able to pick the most promising approach to satisfy a user request at minimal runtime latency, using a probabilistic model that is estimating the result sizes. The individual flavors of dominance are cast into a stack of algorithms over inverted indices and auxiliary structures, enabling pruning techniques to avoid significant data access over large datasets of rankings.

Wearable activity recognition aims to identify and assess human activities with the help
of computer systems by evaluating signals of sensors which can be attached to the human
body. This provides us with valuable information in several areas: in health care, e.g. fluid
and food intake monitoring; in sports, e.g. training support and monitoring; in entertainment,
e.g. human-computer interface using body movements; in industrial scenarios, e.g.
computer support for detected work tasks. Several challenges exist for wearable activity
recognition: a large number of nonrelevant activities (null class), the evaluation of large
numbers of sensor signals (curse of dimensionality), ambiguity of sensor signals compared
to the activities and finally the high variability of human activity in general.
This thesis develops a new activity recognition strategy, called invariants classification,
which addresses these challenges, especially the variability in human activities. The
core idea is that often even highly variable actions include short, more or less invariant
sub-actions which are due to hard physical constraints. If someone opens a door, the
movement of the hand to the door handle is not fixed. However the door handle has to
be pushed to open the door. The invariants classification algorithm is structured in four
phases: segmentation, invariant identification, classification, and spotting. The segmentation
divides the continuous sensor data stream into meaningful parts, which are related
to sub-activities. Our segmentation strategy uses the zero crossings of the central difference
quotient of the sensor signals, as segment borders. The invariant identification finds
the invariant sub-activities by means of clustering and a selection strategy dependent on
certain features. The classification identifies the segments of a specific activity class, using
models generated from the invariant sub-activities. The models include the invariant
sub-activity signal and features calculated on sensor signals related to the sub-activity. In
the spotting, the classified segments are used to find the entire activity class instances in
the continuous sensor data stream. For this purpose, we use the position of the invariant
sub-activity in the related activity class instance for the estimation of the borders of the
activity instances.
In this thesis, we show that our new activity recognition strategy, built on invariant
sub-activities, is beneficial. We tested it on three human activity datasets with wearable
inertial measurement units (IMU). Compared to previous publications on the same
datasets we got improvement in the activity recognition in several classes, some with a
large margin. Our segmentation achieves a sensible method to separate the sensor data in
relation to the underlying activities. Relying on sub-activities makes us independent from
imprecise labels on the training data. After the identification of invariant sub-activities,
we calculate a value called cluster precision for each sensor signal and each class activity.
This tells us which classes can be easily classified and which sensor channels support
the classification best. Finally, in the training for each activity class, our algorithm selects
suitable signal channels with invariant sub-activities on different points in time and
with different length. This makes our strategy a multi-dimensional asynchronous motif
detection with variable motif length.

Graphs and flow networks are important mathematical concepts that enable the modeling and analysis of a large variety of real world problems in different domains such as engineering, medicine or computer science. The number, sizes and complexities of those problems permanently increased during the last decades. This led to an increased demand of techniques that help domain experts in understanding their data and its underlying structure to enable an efficient analysis and decision making process.
To tackle this challenge, this work presents several new techniques that utilize concepts of visual analysis to provide domain scientists with new visualization methodologies and tools. Therefore, this work provides novel concepts and approaches for diverse aspects of the visual analysis such as data transformation, visual mapping, parameter refinement and analysis, model building and visualization as well as user interaction.
The presented techniques form a framework that enriches domain scientists with new visual analysis tools and help them analyze their data and gain insight from the underlying structures. To show the applicability and effectiveness of the presented approaches, this work tackles different applications such as networking, product flow management and vascular systems, while preserving the generality to be applicable to further domains.

The simulation of physical phenomena involving the dynamic behavior of fluids and gases
has numerous applications in various fields of science and engineering. Of particular interest
is the material transport behavior, the tendency of a flow field to displace parts of the
medium. Therefore, many visualization techniques rely on particle trajectories.
Lagrangian Flow Field Representation. In typical Eulerian settings, trajectories are
computed from the simulation output using numerical integration schemes. Accuracy concerns
arise because, due to limitations of storage space and bandwidth, often only a fraction
of the computed simulation time steps are available. Prior work has shown empirically that
a Lagrangian, trajectory-based representation can improve accuracy [Agr+14]. Determining
the parameters of such a representation in advance is difficult; a relationship between the
temporal and spatial resolution and the accuracy of resulting trajectories needs to be established.
We provide an error measure for upper bounds of the error of individual trajectories.
We show how areas at risk for high errors can be identified, thereby making it possible to
prioritize areas in time and space to allocate scarce storage resources.
Comparative Visual Analysis of Flow Field Ensembles. Independent of the representation,
errors of the simulation itself are often caused by inaccurate initial conditions,
limitations of the chosen simulation model, and numerical errors. To gain a better understanding
of the possible outcomes, multiple simulation runs can be calculated, resulting in
sets of simulation output referred to as ensembles. Of particular interest when studying the
material transport behavior of ensembles is the identification of areas where the simulation
runs agree or disagree. We introduce and evaluate an interactive method that enables application
scientists to reliably identify and examine regions of agreement and disagreement,
while taking into account the local transport behavior within individual simulation runs.
Particle-Based Representation and Visualization of Uncertain Flow Data Sets. Unlike
simulation ensembles, where uncertainty of the solution appears in the form of different
simulation runs, moment-based Eulerian multi-phase fluid simulations are probabilistic in
nature. These simulations, used in process engineering to simulate the behavior of bubbles in
liquid media, are aimed toward reducing the need for real-world experiments. The locations
of individual bubbles are not modeled explicitly, but stochastically through the properties of
locally defined bubble populations. Comparisons between simulation results and physical
experiments are difficult. We describe and analyze an approach that generates representative
sets of bubbles for moment-based simulation data. Using our approach, application scientists
can directly, visually compare simulation results and physical experiments.

The focus of this work is to provide and evaluate a novel method for multifield topology-based analysis and visualization. Through this concept, called Pareto sets, one is capable to identify critical regions in a multifield with arbitrary many individual fields. It uses ideas found in graph optimization to find common behavior and areas of divergence between multiple optimization objectives. The connections between the latter areas can be reduced into a graph structure allowing for an abstract visualization of the multifield to support data exploration and understanding.
The research question that is answered in this dissertation is about the general capability and expandability of the Pareto set concept in context of visualization and application. Furthermore, the study of its relations, drawbacks and advantages towards other topological-based approaches. This questions is answered in several steps, including consideration and comparison with related work, a thorough introduction of the Pareto set itself as well as a framework for efficient implementation and an attached discussion regarding limitations of the concept and their implications for run time, suitable data, and possible improvements.
Furthermore, this work considers possible simplification approaches like integrated single-field simplification methods but also using common structures identified through the Pareto set concept to smooth all individual fields at once. These considerations are especially important for real-world scenarios to visualize highly complex data by removing small local structures without destroying information about larger, global trends.
To further emphasize possible improvements and expandability of the Pareto set concept, the thesis studies a variety of different real world applications. For each scenario, this work shows how the definition and visualization of the Pareto set is used and improved for data exploration and analysis based on the scenarios.
In summary, this dissertation provides a complete and sound summary of the Pareto set concept as ground work for future application of multifield data analysis. The possible scenarios include those presented in the application section, but are found in a wide range of research and industrial areas relying on uncertainty analysis, time-varying data, and ensembles of data sets in general.

Novel image processing techniques have been in development for decades, but most
of these techniques are barely used in real world applications. This results in a gap
between image processing research and real-world applications; this thesis aims to
close this gap. In an initial study, the quantification, propagation, and communication
of uncertainty were determined to be key features in gaining acceptance for
new image processing techniques in applications.
This thesis presents a holistic approach based on a novel image processing pipeline,
capable of quantifying, propagating, and communicating image uncertainty. This
work provides an improved image data transformation paradigm, extending image
data using a flexible, high-dimensional uncertainty model. Based on this, a completely
redesigned image processing pipeline is presented. In this pipeline, each
step respects and preserves the underlying image uncertainty, allowing image uncertainty
quantification, image pre-processing, image segmentation, and geometry
extraction. This is communicated by utilizing meaningful visualization methodologies
throughout each computational step.
The presented methods are examined qualitatively by comparing to the Stateof-
the-Art, in addition to user evaluation in different domains. To show the applicability
of the presented approach to real world scenarios, this thesis demonstrates
domain-specific problems and the successful implementation of the presented techniques
in these domains.

The Symbol Grounding Problem (SGP) is one of the first attempts to proposed a hypothesis about mapping abstract concepts and the real world. For example, the concept "ball" can be represented by an object with a round shape (visual modality) and phonemes /b/ /a/ /l/ (audio modality).
This thesis is inspired by the association learning presented in infant development.
Newborns can associate visual and audio modalities of the same concept that are presented at the same time for vocabulary acquisition task.
The goal of this thesis is to develop a novel framework that combines the constraints of the Symbol Grounding Problem and Neural Networks in a simplified scenario of association learning in infants. The first motivation is that the network output can be considered as numerical symbolic features because the attributes of input samples are already embedded. The second motivation is the association between two samples is predefined before training via the same vectorial representation. This thesis proposes to associate two samples and the vectorial representation during training. Two scenarios are considered: sample pair association and sequence pair association.
Three main contributions are presented in this work.
The first contribution is a novel Symbolic Association Model based on two parallel MLPs.
The association task is defined by learning that two instances that represent one concept.
Moreover, a novel training algorithm is defined by matching the output vectors of the MLPs with a statistical distribution for obtaining the relationship between concepts and vectorial representations.
The second contribution is a novel Symbolic Association Model based on two parallel LSTM networks that are trained on weakly labeled sequences.
The definition of association task is extended to learn that two sequences represent the same series of concepts.
This model uses a training algorithm that is similar to MLP-based approach.
The last contribution is a Classless Association.
The association task is defined by learning based on the relationship of two samples that represents the same unknown concept.
In summary, the contributions of this thesis are to extend Artificial Intelligence and Cognitive Computation research with a new constraint that is cognitive motivated. Moreover, two training algorithms with a new constraint are proposed for two cases: single and sequence associations. Besides, a new training rule with no-labels with promising results is proposed.

In recent years, enormous progress has been made in the field of Artificial Intelligence (AI). Especially the introduction of Deep Learning and end-to-end learning, the availability of large datasets and the necessary computational power in form of specialised hardware allowed researchers to build systems with previously unseen performance in areas such as computer vision, machine translation and machine gaming. In parallel, the Semantic Web and its Linked Data movement have published many interlinked RDF datasets, forming the world’s largest, decentralised and publicly available knowledge base.
Despite these scientific successes, all current systems are still narrow AI systems. Each of them is specialised to a specific task and cannot easily be adapted to all other human intelligence tasks, as would be necessary for Artificial General Intelligence (AGI). Furthermore, most of the currently developed systems are not able to learn by making use of freely available knowledge such as provided by the Semantic Web. Autonomous incorporation of new knowledge is however one of the pre-conditions for human-like problem solving.
This work provides a small step towards teaching machines such human-like reasoning on freely available knowledge from the Semantic Web. We investigate how human associations, one of the building blocks of our thinking, can be simulated with Linked Data. The two main results of these investigations are a ground truth dataset of semantic associations and a machine learning algorithm that is able to identify patterns for them in huge knowledge bases.
The ground truth dataset of semantic associations consists of DBpedia entities that are known to be strongly associated by humans. The dataset is published as RDF and can be used for future research.
The developed machine learning algorithm is an evolutionary algorithm that can learn SPARQL queries from a given SPARQL endpoint based on a given list of exemplary source-target entity pairs. The algorithm operates in an end-to-end learning fashion, extracting features in form of graph patterns without the need for human intervention. The learned patterns form a feature space adapted to the given list of examples and can be used to predict target candidates from the SPARQL endpoint for new source nodes. On our semantic association ground truth dataset, our evolutionary graph pattern learner reaches a Recall@10 of > 63 % and an MRR (& MAP) > 43 %, outperforming all baselines. With an achieved Recall@1 of > 34% it even reaches average human top response prediction performance. We also demonstrate how the graph pattern learner can be applied to other interesting areas without modification.

Tables or ranked lists summarize facts about a group of entities in a concise and structured fashion. They are found in all kind of domains and easily comprehensible by humans. Some globally prominent examples of such rankings are the tallest buildings in the World, the richest people in Germany, or most powerful cars. The availability of vast amounts of tables or rankings from open domain allows different ways to explore data. Computing similarity between ranked lists, in order to find those lists where entities are presented in a similar order, carries important analytical insights. This thesis presents a novel query-driven Locality Sensitive Hashing (LSH) method, in order to efficiently find similar top-k rankings for a given input ranking. Experiments show that the proposed method provides a far better performance than inverted-index--based approaches, in particular, it is able to outperform the popular prefix-filtering method. Additionally, an LSH-based probabilistic pruning approach is proposed that optimizes the space utilization of inverted indices, while still maintaining a user-provided recall requirement for the results of the similarity search. Further, this thesis addresses the problem of automatically identifying interesting categorical attributes, in order to explore the entity-centric data by organizing them into meaningful categories. Our approach proposes novel statistical measures, beyond known concepts, like information entropy, in order to capture the distribution of data to train a classifier that can predict which categorical attribute will be perceived suitable by humans for data categorization. We further discuss how the information of useful categories can be applied in PANTHEON and PALEO, two data exploration frameworks developed in our group.