### Refine

#### Year of publication

- 2018 (51)

#### Document Type

- Doctoral Thesis (51)

#### Language

- English (51)

#### Keywords

- Visualization (3)
- Evaluation (2)
- classification (2)
- machine learning (2)
- 1D-CFD (1)
- 2D-CFD (1)
- ADAS (1)
- Addukt (1)
- Algorithmic Differentiation (1)
- Amination (1)
- Artificial Intelligence (1)
- Association (1)
- Automatische Differentiation (1)
- Benzol (1)
- Biomarker (1)
- Bluetooth (1)
- Brandenburg-Lubuskie (1)
- Caching (1)
- Collaboration (1)
- Computational Fluid Dynamics (1)
- Computer Supported Cooperative Work (1)
- Corridors (1)
- Cross-border regions (1)
- Cross-border transport (1)
- DFT calculation (1)
- Decision Support Systems (1)
- Diskrete Fourier-Transformation (1)
- Elastizität (1)
- Environmental inequality (1)
- European Pollutant Release and Transfer Register (E-PRTR) (1)
- European Territorial Cooperation (1)
- European Union (1)
- European Union policy-making (1)
- European integration (1)
- Europeanisation (1)
- Europäische Territoriale Zusammenarbeit (1)
- Evolutionary Algorithm (1)
- FFT (1)
- Flüssig-Flüssig-Extraktion (1)
- Geographic Information System (GIS) (1)
- Geoinformationssystem (1)
- German census (1)
- Greater Region Saar-Lor-Lux+ (1)
- Großregion Saar-Lor-Lux+ (1)
- Harmonische Analyse (1)
- Homogenisierung <Mathematik> (1)
- Hydrodynamics (1)
- Hydrodynamik (1)
- IRMPD (1)
- Industrial air pollution (1)
- Interaction (1)
- Klassifikation (1)
- Leukämie (1)
- Lineare partielle Differentialgleichung (1)
- Linked Data (1)
- Lippmann-Schwinger equation (1)
- Liquid-Liquid Extraction (1)
- Liquid-liquid extraction (1)
- Machine Learning (1)
- Manufacturing (1)
- Mass transfer (1)
- Menschenmenge (1)
- Metabolismus (1)
- Metal-Free (1)
- Mikrostruktur (1)
- Multi-Variate Data (1)
- Netzwerk (1)
- Node-Link Diagram (1)
- Numerische Strömungssimulation (1)
- Oxidant Evolution (1)
- PSPICE (1)
- Performance (1)
- Planning Support Systems (1)
- Policy implementation (1)
- Population balances (1)
- Populationsbilanzen (1)
- Process Data (1)
- Proteine (1)
- Prozessvisualisierung (1)
- Reactive extraction (1)
- Reaktivextraktion (1)
- SM-SQMOM (1)
- SOEP (1)
- SPARQL (1)
- SPARQL query learning (1)
- SQMOM (1)
- Semantic Web (1)
- Serumalbumine (1)
- Simulation (1)
- Smartphone (1)
- Smartwatch (1)
- Soft Spaces (1)
- Spatial regression models (1)
- Steady state (1)
- Stoffaustausch (1)
- Systemdesign (1)
- Thermoset, nanocomposite (1)
- Toxizität (1)
- Trans-European Transport Networks (1)
- Transeuropäische Verkehrsnetze (1)
- Transient state (1)
- Umweltgerechtigkeit (1)
- Visualisierung (1)
- WiFi (1)
- ab initio (1)
- acetate (1)
- algebroid curve (1)
- alpha shape method (1)
- anharmonic CH modes (1)
- anharmonic vibrations (1)
- artificial intelligence (1)
- associations (1)
- asymmetric carboxylate stretch vibrations (1)
- basic carboxylates (1)
- benzene (1)
- biomarker (1)
- biosensors (1)
- canonical ideal (1)
- carboxylate bridge (1)
- carboxylates (1)
- characterization of Structures (1)
- chromium (1)
- collaborative mobile sensing (1)
- collision induced dissociation (1)
- combination band (1)
- composites (1)
- coordinative flexibility (1)
- crash application (1)
- crowd condition estimation (1)
- crowd density estimation (1)
- crowd scanning (1)
- crowd sensing (1)
- cumulative IRMPD (1)
- curve singularity (1)
- cutting simulation (1)
- data sets (1)
- dataset (1)
- duality (1)
- dynamic fracture mechanics (1)
- embedding (1)
- end-to-end learning (1)
- endomorphism ring (1)
- environment perception (1)
- evolutionary algorithm (1)
- fermi resonance (1)
- finite element method (1)
- formate (1)
- fragmentation channel (1)
- good semigroup (1)
- graph embedding (1)
- hexadiendiale (1)
- homogenization (1)
- hybrid structure (1)
- impedance spectroscopy (1)
- infrared spectroscopy (1)
- inverse coordination (1)
- ion-sensitive field-effect transistor (1)
- iron (1)
- leukemia (1)
- linked data (1)
- metabolism (1)
- metal organic frameworks (1)
- micromechanics (1)
- muconaldehyde (1)
- multi-object tracking (1)
- normalization (1)
- overtone (1)
- oxo centered transition metal complexes (1)
- partial hydrolysis (1)
- participatory sensing (1)
- particle finite element method (1)
- pattern (1)
- phase field model (1)
- protein adducts (1)
- protein analysis (1)
- protein conjugate (1)
- pulsed and stirred columns (1)
- pulsierte und gerührte Kolonnen (1)
- quasihomogeneity (1)
- readout system (1)
- sampling (1)
- semantic web (1)
- semigroup of values (1)
- serum albumin (1)
- silicon nanowire (1)
- spatial planning (1)
- stationary sensing (1)
- stationär (1)
- symmetric carboxylate stretch vibrations (1)
- toxicity (1)
- transient (1)
- transition metal complexes (1)
- translation invariant spaces (1)
- transport (1)
- wireless signal (1)

#### Faculty / Organisational entity

- Fachbereich Informatik (15)
- Fachbereich Mathematik (11)
- Fachbereich Maschinenbau und Verfahrenstechnik (9)
- Fachbereich Chemie (5)
- Fachbereich Biologie (3)
- Fachbereich Wirtschaftswissenschaften (3)
- Fachbereich Elektrotechnik und Informationstechnik (2)
- Fachbereich Raum- und Umweltplanung (2)
- Fachbereich Sozialwissenschaften (1)

In modern algebraic geometry, solutions of polynomial equations are studied from a qualitative point of view using highly sophisticated tools such as cohomology, \(D\)-modules and Hodge structures. The latter have been unified in Saito’s far-reaching theory of mixed Hodge modules, which has shown striking applications including vanishing theorems for cohomology. A mixed Hodge module can be seen as a special type of filtered \(D\)-module, which is an algebraic counterpart of a system of linear differential equations. We present the first algorithmic approach to Saito’s theory. To this end, we develop a Gröbner basis theory for a new class of algebras generalizing PBW-algebras.
The category of mixed Hodge modules satisfies Grothendieck’s six-functor formalism. In part these functors rely on an additional natural filtration, the so-called \(V\)-filtration. A key result of this thesis is an algorithm to compute the \(V\)-filtration in the filtered setting. We derive from this algorithm methods for the computation of (extraordinary) direct image functors under open embeddings of complements of pure codimension one subvarieties. As side results we show how to compute vanishing and nearby cycle functors and a quasi-inverse of Kashiwara’s equivalence for mixed Hodge modules.
Describing these functors in terms of local coordinates and taking local sections, we reduce the corresponding computations to algorithms over certain bifiltered algebras. This leads us to introduce the class of so-called PBW-reduction-algebras, a generalization of the class of PBW-algebras. We establish a comprehensive Gröbner basis framework for this generalization, representing the involved filtrations by weight vectors.

This work focuses on continuous-fiber- and long-fiber-reinforced thermoplastic materials. To this end, the multilayered hybrid (MLH) concept was developed and applied to two semi-finished products, the MLH-roving and the MLH-mat. The MLH-roving is a continuous-fiber roving divided evenly into several sublayers by thermoplastic films. It is produced by a novel spreading method, based on a newly derived equation, followed by thermal fixing and multiple folding, which allows different fiber-matrix configurations and intermediate product forms to be realized. The MLH-mat is a glass mat reinforced thermoplastic (GMT)-like material suitable for high fiber contents of up to 45 vol. % and for various matrix polymers, e.g. polypropylene (PP) and polyamide 6 (PA6). It shows high homogeneity in areal density, a random fiber orientation distribution, and the reheating stability required for the molding process. The crash behavior and performance of specimens based on the MLH-roving and the MLH-mat were investigated in dynamic crash tests. The long-fiber-reinforced material (MLH-mat) performed comparably to the continuous-fiber-reinforced material (MLH-roving), and the PA6 grades showed better crash performance than the PP grades.

The gas phase infrared and fragmentation spectra of a systematic group of trimetallic oxo-centered
transition metal complexes are shown and discussed, with formate and acetate bridging ligands and
pyridine and water as axial ligands.
The stability of the complexes, as predicted by appropriate ab initio simulations, is demonstrated to
agree with collision induced dissociation (CID) measurements.
A broad range of DFT calculations are shown. They are used to simulate the geometry, the bonding
situation, relative stability and flexibility of the discussed complexes, and to specify the observed
trends. These simulations correctly predict the trends in the band splitting of the symmetric and
asymmetric carboxylate stretch modes, but fail to account for anharmonic effects observed specifically
in the mid IR range.
The infrared spectra of the different ligands are introduced in a brief literature review. Their changes
in different environments or different bonding situations are discussed and visualized, especially the
interplay between fundamental-, overtone-, and combination bands, as well as Fermi resonances
between them.
A new variation on the infrared multiphoton dissociation (IRMPD) spectroscopy method is proposed
and evaluated. In addition to the commonly considered total fragment yield, the cumulative fragment
yield can be used to plot the wavelength dependent relative abundance of different fragmentation
products. This is shown to include valuable additional information on the excited chromophores, and
their coupling to specific fragmentation channels.
High quality homo- and heterometallic IRMPD spectra of oxo-centered carboxylate complexes of
chromium and iron show the impacts of the influencing factors: the metal centers, the bridging ligands,
their carboxylate stretch modes and CH bend modes, and the terminal ligands.
In all four formate spectra, anharmonic effects are necessary to explain the observed spectra:
combination bands of both carboxylate stretch modes and a Fermi resonance of the fundamental of
the CH stretch mode, and a combination band of the asymmetric carboxylate stretch mode with the
CH bend mode of the formate bridging ligand.
For the water adduct species, partial hydrolysis is proposed to account for the changes in the observed
carboxylic stretch modes.
Appropriate experiments are suggested to verify the mode assignments that are not directly explained
by the ab initio calculations, the available experimental results or other means like deuteration
experiments.

Numerical Godeaux surfaces are minimal surfaces of general type with the smallest possible numerical invariants. It is known that the torsion group of a numerical Godeaux surface is cyclic of order \(m\leq 5\). A full classification has been given for the cases \(m=3,4,5\) by the work of Reid and Miyaoka. In each case, the corresponding moduli space is 8-dimensional and irreducible.
There exist explicit examples of numerical Godeaux surfaces for the orders \(m=1,2\), but a complete classification for these surfaces is still missing.
In this thesis we present a construction method for numerical Godeaux surfaces which is based on homological algebra and computer algebra and which arises from an experimental approach by Schreyer. The main idea is to consider the canonical ring \(R(X)\) of a numerical Godeaux surface \(X\) as a module over some graded polynomial ring \(S\). The ring \(S\) is chosen so that \(R(X)\) is finitely generated as an \(S\)-module and a Gorenstein \(S\)-algebra of codimension 3. We prove that the canonical ring of any numerical Godeaux surface, considered as an \(S\)-module, admits a minimal free resolution whose middle map is alternating. Moreover, we show that a partial converse of this statement is true under some additional conditions.
Afterwards we use these results to construct (canonical rings of) numerical Godeaux surfaces. Hereby, we restrict our study to surfaces whose bicanonical system has no fixed component but 4 distinct base points, in the following referred to as marked numerical Godeaux surfaces.
The particular interest of this thesis lies in marked numerical Godeaux surfaces whose torsion group is trivial. For these surfaces we study the fibration of genus 4 over \(\mathbb{P}^1\) induced by the bicanonical system. Catanese and Pignatelli showed that the general fibre is non-hyperelliptic and that the number \(\tilde{h}\) of hyperelliptic fibres is bounded by 3. The two explicit constructions of numerical Godeaux surfaces with a trivial torsion group due to Barlow and Craighero-Gattazzo, respectively, satisfy \(\tilde{h} = 2\).
With the method from this thesis, we construct an 8-dimensional family of numerical Godeaux surfaces with a trivial torsion group whose general element satisfies \(\tilde{h}=0\).
Furthermore, we establish a criterion for the existence of hyperelliptic fibres in terms of a minimal free resolution of \(R(X)\). Using this criterion, we verify experimentally the existence of a numerical Godeaux surface with \(\tilde{h}=1\).

The growing computational power enables the establishment of the Population Balance Equation (PBE)
to model the steady state and dynamic behavior of multiphase flow unit operations. Accordingly, the two-phase flow
behavior inside liquid-liquid extraction equipment is characterized by different factors. These
factors include: interactions among droplets (breakage and coalescence), different time scales due to the
size distribution of the dispersed phase, and micro time scales of the interphase diffusional mass transfer
process. As a result, the general PBE has no well-known analytical solution, and robust
numerical solution methods with low computational cost are therefore highly desirable.
In this work, the Sectional Quadrature Method of Moments (SQMOM) (Attarakih, M. M., Drumm, C.,
Bart, H.-J. (2009). Solution of the population balance equation using the Sectional Quadrature Method of
Moments (SQMOM). Chem. Eng. Sci. 64, 742-752) is extended to take into account the continuous flow
systems in spatial domain. In this regard, the SQMOM is extended to solve the spatially distributed
nonhomogeneous bivariate PBE to model the hydrodynamics and physical/reactive mass transfer
behavior of liquid-liquid extraction equipment. Based on the extended SQMOM, two different steady
state and dynamic simulation algorithms for hydrodynamics and mass transfer behavior of liquid-liquid
extraction equipment are developed and efficiently implemented. At the steady state modeling level, a
Spatially-Mixed SQMOM (SM-SQMOM) algorithm is developed and successfully implemented in a one-dimensional
physical spatial domain. The integral spatial numerical flux is closed using the mean mass
droplet diameter based on the One Primary and One Secondary Particle Method (OPOSPM, which is the
simplest case of the SQMOM). On the other hand, the hydrodynamics integral source terms are closed
using the analytical Two-Equal Weight Quadrature (TEqWQ). To avoid the numerical solution of the
droplet rise velocity, an analytical solution based on the algebraic velocity model is derived for the
particular case of unit velocity exponent appearing in the droplet swarm model. In addition to this, the
source term due to mass transport is closed using OPOSPM. The resulting system of ordinary differential
equations with respect to space is solved using the MATLAB adaptive Runge–Kutta method (ODE45). At
the dynamic modeling level, the SQMOM is extended to a one-dimensional physical spatial domain and
resolved using the finite volume method. To close the mathematical model, the required quadrature nodes
and weights are calculated using the analytical solution based on the Two Unequal Weights Quadrature
(TUEWQ) formula. By applying the finite volume method to the spatial domain, a semi-discrete ordinary
differential equation system is obtained and solved. Both steady state and dynamic algorithms are
extensively validated at analytical, numerical, and experimental levels. At the numerical level, the
predictions of both algorithms are validated using the extended fixed pivot technique as implemented in
PPBLab software (Attarakih, M., Alzyod, S., Abu-Khader, M., Bart, H.-J. (2012). PPBLAB: A new
multivariate population balance environment for particulate system modeling and simulation. Procedia
Eng. 42, pp. 144-562). At the experimental validation level, the extended SQMOM is successfully used
to model the steady state hydrodynamics and physical and reactive mass transfer behavior of agitated
liquid-liquid extraction columns under different operating conditions. In this regard, both models are
found efficient and able to follow liquid extraction column behavior during column scale-up, where three
column diameters were investigated (DN32, DN80, and DN150). To shed more light on the local
interactions among the contacted phases, a reduced coupled PBE and CFD framework is used to model
the hydrodynamic behavior of pulsed sieve plate columns. In this regard, OPOSPM is utilized and
implemented in FLUENT 18.2 commercial software as a special case of the SQMOM. The droplet-droplet
interactions (breakage and coalescence) are taken into account using OPOSPM, while the required
information about the velocity field and energy dissipation is calculated by the CFD model. In addition to this,
the proposed coupled OPOSPM-CFD framework is extended to include the mass transfer. The
proposed framework is numerically tested and the results are compared with the published experimental
data. The required breakage and coalescence parameters to perform the 2D-CFD simulation are estimated
using PPBLab software, where a 1D-CFD simulation using a multi-sectional grid is performed. A very
good agreement is obtained at the experimental and the numerical validation levels.
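The abstract above closes the spatial flux with the mean mass droplet diameter. As a rough illustration only, the sketch below computes that diameter from the zeroth and third moments of a discrete droplet size distribution, using the common definition d30 = (m3 / m0)^(1/3). The function names and the sample data are hypothetical, and the thesis's actual SQMOM and OPOSPM closures are not reproduced here.

```python
import math


def moment(diameters, number_densities, order):
    """Return the moment of the given order: sum of N_i * d_i**order."""
    return sum(n * d ** order for d, n in zip(diameters, number_densities))


def mean_mass_diameter(diameters, number_densities):
    """Mean mass droplet diameter d30 = (m3 / m0) ** (1/3)."""
    m0 = moment(diameters, number_densities, 0)  # total number concentration
    m3 = moment(diameters, number_densities, 3)  # proportional to total volume
    return (m3 / m0) ** (1.0 / 3.0)


def volume_fraction(diameters, number_densities):
    """Dispersed-phase volume fraction for spherical droplets: (pi / 6) * m3."""
    return math.pi / 6.0 * moment(diameters, number_densities, 3)


# Hypothetical two-class distribution: diameters in m, number densities in 1/m^3.
sizes = [1.0e-3, 2.0e-3]
counts = [1.0e6, 1.0e6]
d30 = mean_mass_diameter(sizes, counts)
alpha = volume_fraction(sizes, counts)
```

For spherical droplets the same diameter can be recovered from the volume fraction and the total number concentration as (6 * alpha / (pi * N))^(1/3), which is why two transported quantities are enough to carry this mean size.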

The Symbol Grounding Problem (SGP) is one of the first attempts to propose a hypothesis about the mapping between abstract concepts and the real world. For example, the concept "ball" can be represented by an object with a round shape (visual modality) and phonemes /b/ /a/ /l/ (audio modality).
This thesis is inspired by the association learning present in infant development.
Newborns can associate visual and audio modalities of the same concept that are presented at the same time for the vocabulary acquisition task.
The goal of this thesis is to develop a novel framework that combines the constraints of the Symbol Grounding Problem and Neural Networks in a simplified scenario of association learning in infants. The first motivation is that the network output can be considered as numerical symbolic features because the attributes of input samples are already embedded. The second motivation is that the association between two samples is usually predefined before training via the same vectorial representation. This thesis instead proposes to learn the association between two samples and the vectorial representation during training. Two scenarios are considered: sample pair association and sequence pair association.
Three main contributions are presented in this work.
The first contribution is a novel Symbolic Association Model based on two parallel MLPs.
The association task is defined as learning that two instances represent the same concept.
Moreover, a novel training algorithm is defined by matching the output vectors of the MLPs with a statistical distribution for obtaining the relationship between concepts and vectorial representations.
The second contribution is a novel Symbolic Association Model based on two parallel LSTM networks that are trained on weakly labeled sequences.
The definition of the association task is extended to learning that two sequences represent the same series of concepts.
This model uses a training algorithm that is similar to the MLP-based approach.
The last contribution is a Classless Association.
The association task is defined as learning based on the relationship of two samples that represent the same unknown concept.
In summary, the contributions of this thesis extend Artificial Intelligence and Cognitive Computation research with a new constraint that is cognitively motivated. Moreover, two training algorithms with this new constraint are proposed for two cases: single and sequence associations. In addition, a new label-free training rule with promising results is proposed.

In recent years, enormous progress has been made in the field of Artificial Intelligence (AI). In particular, the introduction of Deep Learning and end-to-end learning, the availability of large datasets and the necessary computational power in the form of specialised hardware have allowed researchers to build systems with previously unseen performance in areas such as computer vision, machine translation and machine gaming. In parallel, the Semantic Web and its Linked Data movement have published many interlinked RDF datasets, forming the world’s largest, decentralised and publicly available knowledge base.
Despite these scientific successes, all current systems are still narrow AI systems. Each of them is specialised to a specific task and cannot easily be adapted to all other human intelligence tasks, as would be necessary for Artificial General Intelligence (AGI). Furthermore, most of the currently developed systems are not able to learn by making use of freely available knowledge such as provided by the Semantic Web. Autonomous incorporation of new knowledge is however one of the pre-conditions for human-like problem solving.
This work provides a small step towards teaching machines such human-like reasoning on freely available knowledge from the Semantic Web. We investigate how human associations, one of the building blocks of our thinking, can be simulated with Linked Data. The two main results of these investigations are a ground truth dataset of semantic associations and a machine learning algorithm that is able to identify patterns for them in huge knowledge bases.
The ground truth dataset of semantic associations consists of DBpedia entities that are known to be strongly associated by humans. The dataset is published as RDF and can be used for future research.
The developed machine learning algorithm is an evolutionary algorithm that can learn SPARQL queries from a given SPARQL endpoint based on a given list of exemplary source-target entity pairs. The algorithm operates in an end-to-end learning fashion, extracting features in the form of graph patterns without the need for human intervention. The learned patterns form a feature space adapted to the given list of examples and can be used to predict target candidates from the SPARQL endpoint for new source nodes. On our semantic association ground truth dataset, our evolutionary graph pattern learner reaches a Recall@10 of > 63 % and an MRR (& MAP) > 43 %, outperforming all baselines. With an achieved Recall@1 of > 34 % it even reaches average human top response prediction performance. We also demonstrate how the graph pattern learner can be applied to other interesting areas without modification.
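The evaluation figures quoted above (Recall@k and MRR) can be illustrated with a minimal sketch. It assumes that each source entity has exactly one ground-truth target and that the learner returns a ranked candidate list per source. The function names and the toy data are hypothetical and are not taken from the thesis.

```python
def recall_at_k(ranked_predictions, targets, k):
    """Fraction of queries whose true target appears among the top k candidates."""
    hits = sum(
        1 for ranked, target in zip(ranked_predictions, targets)
        if target in ranked[:k]
    )
    return hits / len(targets)


def mean_reciprocal_rank(ranked_predictions, targets):
    """Mean of 1 / rank of the true target; a missing target contributes 0."""
    total = 0.0
    for ranked, target in zip(ranked_predictions, targets):
        if target in ranked:
            total += 1.0 / (ranked.index(target) + 1)  # ranks start at 1
    return total / len(targets)


# Toy data: three source entities, each with a ranked list of candidate targets.
predictions = [["b", "a", "c"], ["x", "y", "z"], ["q", "r", "s"]]
truth = ["a", "x", "t"]

r1 = recall_at_k(predictions, truth, 1)
r2 = recall_at_k(predictions, truth, 2)
mrr = mean_reciprocal_rank(predictions, truth)
```

With a single relevant target per query, average precision reduces to the reciprocal rank of that target, which is consistent with the abstract reporting MRR and MAP as one figure.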

Though environmental inequality research has gained extensive interest in the United States, it has received far less attention in Europe and Germany. The main objective of this book is to extend the research on environmental inequality in Germany. This book aims to shed more light on the question of whether minorities in Germany are affected by a disproportionately high burden of environmental pollution, and to increase the general knowledge about the causal mechanisms, which contribute to the unequal distribution of environmental hazards across the population.
To improve our knowledge about environmental inequality in Germany, this book extends previous research in several ways. First, to evaluate the extent of environmental inequality, this book relies on two different data sources. On the one hand, it uses household-level survey data and self-reports about the impairment through air pollution. On the other hand, it combines aggregated census data and objective register-based measures of industrial air pollution by using geographic information systems (GIS). Consequently, this book offers the first analysis of environmental inequality on the national level that uses objective measures of air pollution in Germany. Second, to evaluate the causes of environmental inequality, this book applies a panel data analysis on the household level, thereby offering the first longitudinal analysis of selective migration processes outside the United States. Third, it compares the level of environmental inequality between German metropolitan areas and evaluates to what extent the theoretical arguments of environmental inequality can explain differing levels of environmental inequality across the country. By doing so, this book not only investigates the impact of indicators derived by the standard strand of theoretical reasoning but also includes structural characteristics of the urban space.
All studies presented in this book confirm the disproportionate exposure of minorities to environmental pollution. Minorities live in more polluted areas in Germany but also in more polluted parts of the communities, and this disadvantage is most severe in metropolitan regions. Though this book finds evidence for selective migration processes contributing to the disproportionate exposure of minorities to environmental pollution, it also stresses the importance of urban conditions. Especially cities with centrally located industrial facilities yield a high level of environmental inequality. This poses the question of whether environmental inequality might be the result of two independent processes: 1) urban infrastructure confines residential choices of minorities to the urban core, and 2) urban infrastructure facilitates centrally located industries. In combination, both processes lead to a disproportionate burden of minority households.

Tables or ranked lists summarize facts about a group of entities in a concise and structured fashion. They are found in all kinds of domains and are easily comprehensible by humans. Some globally prominent examples of such rankings are the tallest buildings in the world, the richest people in Germany, or the most powerful cars. The availability of vast amounts of tables or rankings from the open domain allows different ways to explore data. Computing similarity between ranked lists, in order to find those lists where entities are presented in a similar order, carries important analytical insights. This thesis presents a novel query-driven Locality Sensitive Hashing (LSH) method to efficiently find similar top-k rankings for a given input ranking. Experiments show that the proposed method performs far better than inverted-index-based approaches; in particular, it outperforms the popular prefix-filtering method. Additionally, an LSH-based probabilistic pruning approach is proposed that optimizes the space utilization of inverted indices, while still maintaining a user-provided recall requirement for the results of the similarity search. Further, this thesis addresses the problem of automatically identifying interesting categorical attributes, in order to explore entity-centric data by organizing them into meaningful categories. Our approach proposes novel statistical measures, beyond known concepts like information entropy, to capture the distribution of data and to train a classifier that can predict which categorical attribute will be perceived as suitable by humans for data categorization. We further discuss how the information about useful categories can be applied in PANTHEON and PALEO, two data exploration frameworks developed in our group.
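To make the notion of similarity between top-k rankings concrete, the sketch below computes a Spearman footrule distance adapted to top-k lists, in which an item missing from one list is assigned the artificial rank k. This is one common choice, assumed here for illustration. The abstract does not state which distance the thesis uses, the function name is hypothetical, and the LSH and inverted-index machinery is not shown.

```python
def footrule_topk(ranking_a, ranking_b):
    """Spearman footrule distance between two top-k lists of equal length.

    Ranks are 0-based; an item absent from a list receives the rank k,
    i.e. it is treated as sitting just below the end of that list.
    """
    if len(ranking_a) != len(ranking_b):
        raise ValueError("both rankings must have the same length k")
    k = len(ranking_a)
    rank_a = {item: pos for pos, item in enumerate(ranking_a)}
    rank_b = {item: pos for pos, item in enumerate(ranking_b)}
    items = set(rank_a) | set(rank_b)
    return sum(abs(rank_a.get(i, k) - rank_b.get(i, k)) for i in items)


# Toy rankings of hypothetical entities.
same = footrule_topk(["A", "B", "C"], ["A", "B", "C"])
swapped = footrule_topk(["A", "B", "C"], ["B", "A", "D"])
disjoint = footrule_topk(["A", "B", "C"], ["X", "Y", "Z"])
```

Under this convention the distance is 0 for identical lists and reaches its maximum, k * (k + 1), when the two lists share no items, so a similarity search can express its threshold as a fraction of that maximum.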

Computational problems that involve dynamic data, such as physics simulations and program development environments, have been an important
subject of study in programming languages. Recent advances in self-adjusting
computation have made progress towards achieving efficient incremental computation by providing algorithmic language abstractions to express computations that respond automatically to dynamic changes in their inputs. Self-adjusting programs have been shown to be efficient for a broad range of problems via an explicit programming style, where the programmer uses specific
primitives to identify, create and operate on data that can change over time.
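The explicit style described above can be illustrated with a deliberately simplified sketch: changeable inputs live in cells, derived values record which inputs they read, and an input change invalidates only the values that depend on it. The class names `Cell` and `Derived` are hypothetical. This dirty-flag scheme stands in for the dynamic dependency graphs and change propagation of real self-adjusting computation libraries and is not the technique developed in the dissertation.

```python
class Cell:
    """A changeable input value that tracks the derived values reading it."""

    def __init__(self, value):
        self._value = value
        self._readers = []

    def get(self):
        return self._value

    def set(self, value):
        # Only a real change invalidates dependents.
        if value != self._value:
            self._value = value
            for reader in self._readers:
                reader.invalidate()


class Derived:
    """A value computed from cells or other derived values, cached until stale."""

    def __init__(self, fn, *inputs):
        self._fn = fn
        self._inputs = inputs
        self._readers = []
        self._dirty = True
        self._value = None
        self.evaluations = 0  # counts how often fn actually ran
        for source in inputs:
            source._readers.append(self)

    def invalidate(self):
        # Propagate staleness downstream once; already-dirty nodes stop the walk.
        if not self._dirty:
            self._dirty = True
            for reader in self._readers:
                reader.invalidate()

    def get(self):
        if self._dirty:
            self._value = self._fn(*(source.get() for source in self._inputs))
            self.evaluations += 1
            self._dirty = False
        return self._value


a, b, c = Cell(1), Cell(2), Cell(10)
total = Derived(lambda x, y: x + y, a, b)
scaled = Derived(lambda s, factor: s * factor, total, c)

first = scaled.get()    # computes total, then scaled
c.set(5)                # only scaled depends on c
second = scaled.get()   # total is reused from its cache
```

After `c.set(5)`, only `scaled` is recomputed while `total` is served from its cache, which is the kind of saving that incremental computation aims for on large inputs.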
This dissertation presents implicit self-adjusting computation, a type-directed technique for translating purely functional programs into self-adjusting
programs. In this implicit approach, the programmer annotates the (top-level) input types of the programs to be translated. Type inference finds
all other types, and a type-directed translation rewrites the source program
into an explicitly self-adjusting target program. The type system is related to
information-flow type systems and enjoys decidable type inference via constraint solving. We prove that the translation outputs well-typed self-adjusting
programs and preserves the source program’s input-output behavior, guaranteeing that translated programs respond correctly to all changes to their
data. Using a cost semantics, we also prove that the translation preserves the
asymptotic complexity of the source program.
As a second contribution, we present two techniques to facilitate the processing of large and dynamic data in self-adjusting computation. First, we
present a type system for precise dependency tracking that minimizes the
time and space for storing dependency metadata. The type system improves
the scalability of self-adjusting computation by eliminating an important assumption of prior work that can lead to recording spurious dependencies.
We present a type-directed translation algorithm that generates correct self-adjusting programs without relying on this assumption. Second, we show a
probabilistic-chunking technique to further decrease space usage by controlling the fundamental space-time tradeoff in self-adjusting computation.
We implement implicit self-adjusting computation as an extension to Standard ML with compiler and runtime support. Using the compiler, we are able
to incrementalize an interesting set of applications, including standard list
and matrix benchmarks, ray tracer, PageRank, sparse graph connectivity, and
social circle counts. Our experiments show that our compiler incrementalizes existing code with only trivial amounts of annotation, and the resulting
programs bring asymptotic improvements to large datasets from real-world
applications, leading to orders of magnitude speedups in practice.