## Doctoral Thesis

Nowadays a large part of communication is taking place on social media platforms such as Twitter, Facebook, Instagram, or YouTube, where messages often include multimedia contents (e.g., images, GIFs or videos). Since such messages are in digital form, computers can in principle process them in order to make our lives more convenient and help us overcome arising issues. However, these goals require the ability to capture what these messages mean to us, that is, how we interpret them from our own subjective points of view. Thus, the main goal of this dissertation is to advance a machine's ability to interpret social media contents in a more natural, subjective way.
To this end, three research questions are addressed. The first question aims at answering "How to model human interpretation for machine learning?" We describe a way of modeling interpretation which allows for analyzing single or multiple ways of interpretation of both humans and computer models within the same theoretic framework. In a comprehensive survey we collect various possibilities for such a computational analysis. Particularly interesting are machine learning approaches where a single neural network learns multiple ways of interpretation. For example, a neural network can be trained to predict user-specific movie ratings from movie features and user ID, and can then be analyzed to understand how users rate movies. This is a promising direction, as neural networks are capable of learning complex patterns. However, how analysis results depend on network architecture is a largely unexplored topic. For the example of movie ratings, we show that the way of combining information for prediction can affect both prediction performance and what the network learns about the various ways of interpretation (corresponding to users).
Since some application-specific details for dealing with human interpretation only become visible when going deeper into particular use cases, the other two research questions of this dissertation are concerned with two selected application domains: subjective visual interpretation and gang violence prevention. The first application study deals with subjectivity that comes from personal attitudes and aims at answering "How can we predict the subjective image interpretation one would expect from the general public on photo-sharing platforms such as Flickr?" The predictions in this case take the form of subjective concepts or phrases. Our study on gang violence prevention is more community-centered and considers the question "How can we automatically detect tweets of gang members which could potentially lead to violence?" There, the psychosocial codes aggression, loss, and substance use serve as a proxy to estimate the subjective implications of online messages.
In these two distinct application domains, we develop novel machine learning models for predicting subjective interpretations of images or tweets with images, respectively. In the process of building these detection tools, we also create three different datasets which we share with the research community. Furthermore, we see that some domains such as Chicago gangs require special care due to high vulnerability of involved users. This motivated us to establish and describe an in-depth collaboration between social work researchers and computer scientists. As machine learning is incorporating more and more subjective components and gaining societal impact, we have good reason to believe that similar collaborations between the humanities and computer science will become increasingly necessary to advance the field in an ethical way.

Biological clocks exist across all life forms and serve to coordinate organismal physiology with periodic environmental changes. The underlying mechanism of these clocks is predominantly based on cellular transcription-translation feedback loops in which clock proteins mediate the periodic expression of numerous genes. However, recent studies point to the existence of a conserved timekeeping mechanism that is independent of cellular transcription and translation and is instead based on cellular metabolism. The existence of these metabolic clocks was inferred from the observation of circadian and ultradian oscillations in the level of hyperoxidized peroxiredoxin proteins. Peroxiredoxins are enzymes found almost ubiquitously throughout life. Originally identified as H2O2 scavengers, peroxiredoxins have more recently been shown to transfer oxidation to, and thereby regulate, a wide range of cellular proteins. Thus, it is conceivable that peroxiredoxins, using H2O2 as the primary signaling molecule, have the potential to integrate and coordinate much of cellular physiology and behavior with metabolic changes. Nonetheless, it remained unclear whether peroxiredoxins are passive reporters of metabolic clock activity or active determinants of cellular timekeeping. Budding yeast possess an ultradian metabolic clock termed the Yeast Metabolic Cycle (YMC). The most obvious feature of the YMC is a high-amplitude oscillation in oxygen consumption. Like circadian clocks, the YMC temporally compartmentalizes cellular processes (e.g. metabolism) and coordinates cellular programs such as gene expression and cell division. The YMC also exhibits oscillations in the level of hyperoxidized peroxiredoxin proteins.
In this study, I used the YMC clock model to investigate the role of peroxiredoxins in cellular timekeeping, as well as the coordination of cell division with the metabolic clock. I observed that cytosolic 2-Cys peroxiredoxins are essential for robust metabolic clock function. I provide direct evidence for oscillations in cytosolic H2O2 levels, as well as cyclical changes in oxidation state of a peroxiredoxin and a model peroxiredoxin target protein during the YMC. I noted two distinct metabolic states during the YMC: low oxygen consumption (LOC) and high oxygen consumption (HOC). I demonstrate that thiol-disulfide oxidation and reduction are necessary for switching between LOC and HOC. Specifically, a thiol reductant promotes switching to HOC, whilst a thiol oxidant prevents switching to HOC, forcing cells to remain in LOC. Transient peroxiredoxin inactivation triggered rapid and premature switching from LOC to HOC. Furthermore, I show that cell division is normally synchronized with the YMC and that deletion of typical 2-Cys peroxiredoxins leads to complete uncoupling of cell division from metabolic cycling. Moreover, metabolic oscillations are crucial for regulating cell cycle entry and exit. Intriguingly, switching to HOC is crucial for initiating cell cycle entry whilst switching to LOC is crucial for cell cycle completion and exit. Consequently, forcing cells to remain in HOC by application of a thiol reductant leads to multiple rounds of cell cycle entry despite failure to complete the preceding cell cycle. On the other hand, forcing cells to remain in LOC by treating with a thiol oxidant prevents initiation of cell cycle entry.
In conclusion, I propose that peroxiredoxins, by controlling metabolic cycles that are in turn crucial for regulating progression through the cell cycle, play a central role in the coordination of cellular metabolism with cell division. This proposition thus positions peroxiredoxins as active players in the cellular timekeeping mechanism.

In this thesis we study a variant of the quadrature problem for stochastic differential equations (SDEs), namely the approximation of expectations \(\mathrm{E}(f(X))\), where \(X = (X(t))_{t \in [0,1]}\) is the solution of an SDE and \(f \colon C([0,1],\mathbb{R}^r) \to \mathbb{R}\) is a functional, mapping each realization of \(X\) into the real numbers. The distinctive feature in this work is that we consider randomized (Monte Carlo) algorithms with random bits as their only source of randomness, whereas the algorithms commonly studied in the literature are allowed to sample from the uniform distribution on the unit interval, i.e., they do have access to random numbers from \([0,1]\).
By assumption, all further operations like, e.g., arithmetic operations, evaluations of elementary functions, and oracle calls to evaluate \(f\) are considered within the real number model of computation, i.e., they are carried out exactly.
In the following, we provide a detailed description of the quadrature problem, namely we are interested in the approximation of
\begin{align*}
S(f) = \mathrm{E}(f(X))
\end{align*}
for \(X\) being the \(r\)-dimensional solution of an autonomous SDE of the form
\begin{align*}
\mathrm{d}X(t) = a(X(t)) \, \mathrm{d}t + b(X(t)) \, \mathrm{d}W(t), \quad t \in [0,1],
\end{align*}
with deterministic initial value
\begin{align*}
X(0) = x_0 \in \mathbb{R}^r,
\end{align*}
and driven by a \(d\)-dimensional standard Brownian motion \(W\). Furthermore, the drift coefficient \(a \colon \mathbb{R}^r \to \mathbb{R}^r\) and the diffusion coefficient \(b \colon \mathbb{R}^r \to \mathbb{R}^{r \times d}\) are assumed to be globally Lipschitz continuous.
For the function classes
\begin{align*}
F_{\infty} = \bigl\{f \colon C([0,1],\mathbb{R}^r) \to \mathbb{R} \colon |f(x) - f(y)| \leq \|x-y\|_{\sup}\bigr\}
\end{align*}
and
\begin{align*}
F_p = \bigl\{f \colon C([0,1],\mathbb{R}^r) \to \mathbb{R} \colon |f(x) - f(y)| \leq \|x-y\|_{L_p}\bigr\}, \quad 1 \leq p < \infty,
\end{align*}
we have established the following.
\[\]
\(\textit{Theorem 1.}\)
There exists a random bit multilevel Monte Carlo (MLMC) algorithm \(M\) using
\[
L = L(\varepsilon,F) = \begin{cases}\lceil{\log_2(\varepsilon^{-2})}\rceil, &\text{if} \ F = F_p,\\
\lceil{\log_2(\varepsilon^{-2}) + \log_2(\log_2(\varepsilon^{-1}))}\rceil, &\text{if} \ F = F_\infty
\end{cases}
\]
and replication numbers
\[
N_\ell = N_\ell(\varepsilon,F) = \begin{cases}
\lceil{(L+1) \cdot 2^{-\ell} \cdot \varepsilon^{-2}}\rceil, & \text{if} \ F = F_p,\\
\lceil{(L+1) \cdot 2^{-\ell} \cdot \max(\ell,1) \cdot \varepsilon^{-2}}\rceil, & \text{if} \ F = F_\infty
\end{cases}
\]
for \(\ell = 0,\ldots,L\), for which there exists a positive constant \(c\) such that
\begin{align*}
\mathrm{error}(M,F) = \sup_{f \in F} \bigl(\mathrm{E}(S(f) - M(f))^2\bigr)^{1/2} \leq c \cdot \varepsilon
\end{align*}
and
\begin{align*}
\mathrm{cost}(M,F) = \sup_{f \in F} \mathrm{E}(\mathrm{cost}(M,f)) \leq c \cdot \varepsilon^{-2} \cdot \begin{cases}
(\ln(\varepsilon^{-1}))^2, &\text{if} \ F=F_p,\\
(\ln(\varepsilon^{-1}))^3, &\text{if} \ F=F_\infty
\end{cases}
\end{align*}
for every \(\varepsilon \in {]0,1/2[}\).
\[\]
Hence, in terms of the \(\varepsilon\)-complexity
\begin{align*}
\mathrm{comp}(\varepsilon,F) = \inf\bigl\{\mathrm{cost}(M,F) \colon M \ \text{is a random bit MC algorithm}, \mathrm{error}(M,F) \leq \varepsilon\bigr\}
\end{align*}
we have established the upper bound
\begin{align*}
\mathrm{comp}(\varepsilon,F) \leq c \cdot \varepsilon^{-2} \cdot \begin{cases}
(\ln(\varepsilon^{-1}))^2, &\text{if} \ F=F_p,\\
(\ln(\varepsilon^{-1}))^3, &\text{if} \ F=F_\infty
\end{cases}
\end{align*}
for some positive constant \(c\). That is, we have shown the same weak asymptotic upper bound as in the case of random numbers from \([0,1]\). Hence, in this sense, random bits are almost as powerful as random numbers for our computational problem.
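The parameter choices in Theorem 1 can be made concrete with a small sketch. The function name `mlmc_parameters` and the flag `sup_norm_class` are hypothetical, and reading the \(F_\infty\) level as \(\lceil \log_2(\varepsilon^{-2}) + \log_2(\log_2(\varepsilon^{-1})) \rceil\) is an assumption about the intended parenthesization. The code only evaluates the formulas for \(L\) and \(N_\ell\); it does not implement the algorithm \(M\) itself.

```python
import math


def mlmc_parameters(eps, sup_norm_class=False):
    """Evaluate the maximal level L and replication numbers N_0..N_L.

    sup_norm_class=False corresponds to F = F_p, True to F = F_infty.
    Illustrative only: names and the F_infty reading are assumptions.
    """
    if not 0.0 < eps < 0.5:
        raise ValueError("eps must lie in the open interval (0, 1/2)")

    # Maximal level: log2(eps^-2), plus a log-log correction for F_infty.
    log_term = math.log2(eps ** -2)
    if sup_norm_class:
        log_term += math.log2(math.log2(1.0 / eps))
    max_level = math.ceil(log_term)

    # Replication numbers decay like 2^-level; F_infty adds a factor max(level, 1).
    replications = []
    for level in range(max_level + 1):
        n = (max_level + 1) * 2.0 ** (-level) * eps ** -2
        if sup_norm_class:
            n *= max(level, 1)
        replications.append(math.ceil(n))
    return max_level, replications
```

For example, with \(\varepsilon = 1/4\) and \(F = F_p\) the formulas give \(L = 4\) and replication numbers 80, 40, 20, 10, 5.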
Moreover, we present numerical results for a non-analyzed adaptive random bit MLMC Euler algorithm, in the particular cases of the Brownian motion, the geometric Brownian motion, the Ornstein-Uhlenbeck SDE and the Cox-Ingersoll-Ross SDE. We also provide a numerical comparison to the corresponding adaptive random number MLMC Euler method.
A key challenge in the analysis of the algorithm in Theorem 1 is the approximation of probability distributions by means of random bits. This problem is very closely related to the quantization problem, i.e., the optimal approximation of a given probability measure (on a separable Hilbert space) by means of a probability measure with finite support size.
Though we have shown that the random bit approximation of the standard normal distribution is 'harder' than the corresponding quantization problem (lower weak rate of convergence), we have been able to establish the same weak rate of convergence as for the corresponding quantization problem in the case of the distribution of a Brownian bridge on \(L_2([0,1])\), the distribution of the solution of a scalar SDE on \(L_2([0,1])\), and the distribution of a centered Gaussian random element in a separable Hilbert space.
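To give a rough idea of what "random bits as the only source of randomness" can look like in an Euler scheme, the following sketch replaces each Gaussian increment by a centered, scaled sum of \(q\) random bits, which has mean zero and variance one. This binomial construction, the names `bit_normal` and `euler_random_bits`, and the parameter `q` are assumptions for illustration; the abstract does not state which random bit approximation the thesis uses. The sketch is restricted to a scalar SDE.

```python
import math
import random


def bit_normal(q, rng):
    """Approximate a standard normal sample using exactly q random bits.

    The number of ones among q fair bits is Binomial(q, 1/2); centering
    and scaling gives mean 0 and variance 1 (assumed construction).
    """
    ones = bin(rng.getrandbits(q)).count("1")
    return (2 * ones - q) / math.sqrt(q)


def euler_random_bits(a, b, x0, n_steps, q, rng):
    """Euler scheme on [0, 1] for dX = a(X) dt + b(X) dW (scalar case),
    with Brownian increments built from random bits only."""
    h = 1.0 / n_steps
    x = x0
    for _ in range(n_steps):
        x = x + a(x) * h + b(x) * math.sqrt(h) * bit_normal(q, rng)
    return x
```

A call such as `euler_random_bits(lambda x: -x, lambda x: 0.3, 1.0, 64, 16, random.Random(0))` simulates one endpoint of an Ornstein-Uhlenbeck-type SDE and consumes 64 · 16 random bits.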

Activity recognition has remained a large field in computer science over the last two decades. Research questions from 15 years ago have led to solutions that today support our daily lives. Specifically, the success of smartphones and of more recent smart devices (e.g., smartwatches) is rooted in applications that leverage activity analysis and location tracking (fitness applications and maps). Today we can track our physical health and fitness and support our physical needs by merely owning (and using) a smartphone. Still, the quality of our lives does not rely solely on fitness and physical health but also, increasingly, on our mental well-being. Since we have learned how practical and easy it is to have many functions, including health support, on just one device, it would be especially helpful if we could also use the smartphone to support our mental and cognitive health when needed.
The ultimate goal of this work is to use sensor-assisted location and motion analysis to support various aspects of medically valid cognitive assessments.
In this regard, this thesis builds on Hypothesis 3: Sensors in our ubiquitous environment can collect information about our cognitive state, and it is possible to extract that information. In addition, these data can be used to derive complex cognitive states and to predict possible pathological changes in humans. After all, not only is it possible to determine the cognitive state through sensors but also to assist people in difficult situations through these sensors.
Thus, in the first part, this thesis focuses on the detection of mental state and state changes.
The primary purpose is to evaluate possible starting points for sensor systems in order to enable a clinically accurate assessment of mental states. These assessments must work on the condition that a developed system must be able to function within the given limits of a real clinical environment.
Despite the limitations and challenges of real-life deployments, it was possible to develop methods for determining the cognitive state and well-being of the residents. The analysis of the location data provides a correct classification of cognitive state with an average accuracy of 70% to 90%.
Methods to determine the state of bipolar patients provide an accuracy of 70-80% for the detection of different cognitive states (seven classes in total) using single sensors and 76% for merging data from different sensors. Methods for detecting the occurrence of state changes, a highlight of this work, even achieved a precision and recall of 95%.
The comparison of these results with currently used standard methods in psychiatric care even shows a clear advantage of the sensor-based method. The accuracy of the sensor-based analysis is 60% higher than the accuracy of the currently used methods.
The second part of this thesis introduces methods to support people’s actions in stressful situations on the one hand and analyzes the interaction between people during high-pressure activities on the other.
A simple, acceleration based, smartwatch instant feedback application was used to help laypeople to learn to perform CPR (cardiopulmonary resuscitation) in an emergency on the fly.
The evaluation of this application in a study with 43 laypersons showed an instant improvement in the CPR performance of 50%. An investigation of whether training with such an instant feedback device can support improved learning and lead to more permanent effects for gaining skills was able to confirm this theory.
Last but not least, with the main interest shifting from the individual to a group of people at the end of this work, the question "How can we determine the interaction between individuals within a group of people?" was answered by developing a methodology to detect unvoiced collaboration in random ad hoc groups. An evaluation with data retrieved from video footage provides an accuracy exceeding 95% in the best case, and even with artificially introduced error rates of 20%, a precision of 70% and a recall of 90% can still be achieved.
All scenarios in this thesis address different practical issues of today’s health care. The methods developed are based on real-life datasets and real-world studies.

Spin-crossover and valence tautomeric complexes are of tremendous interest in the fields of molecular electronics, electronic storage devices, and information processing. Herein, the synthesis and characterization of spin-crossover and valence tautomeric cobalt dioxolene complexes are reported. All the synthesized complexes contain N,N'-di-tert-butyl-2,11-diaza[3.3](2,6)pyridinophane (L-N4tBu2) as the ancillary ligand; they differ only in the co-ligand, for which various dioxolene ligands have been used. The mononuclear cobalt dioxolene complexes have been synthesized using the dideprotonated form of the dioxolene ligand 4,5-dichlorocatechol (H2DCCat) as co-ligand, and the cobalt bis(dioxolene) complexes have been synthesized using the dideprotonated form of 3,3'-dihydroxy-diphenoquinone-(4,4') (H2(SQ-SQ)) as co-ligand.
Analytically pure samples of the complexes [Co(L-N4tBu2)(DCCat)] (1), [Co(L-N4tBu2)(DCCat)](BPh4) (2b), [Co2(L-N4tBu2)2(SQ-SQ)](BPh4)2·4 DMF (3b), and [Co2(L-N4tBu2)2(Cat-SQ)](BF4)2·Et2O (3d) have been synthesized and characterized by X-ray crystallography as well as magnetic and electrochemical measurements. The complexes have also been investigated by UV/Vis/NIR, IR, and NMR spectroscopic measurements.
The complex [Co(L-N4tBu2)(DCCat)] (1) shows a temperature-invariant high-spin cobalt(II) catecholate state. One-electron oxidation of 1 yielded the complex [Co(L-N4tBu2)(DCCat)](BPh4) (2b). The solid-state properties of 2b are best described by the low-spin cobalt(III) catecholate state, whereas the solution-state properties of 2b are best described by a valence tautomeric transition from the low-spin cobalt(III) catecholate to the low-spin cobalt(II) semiquinonate state.
For the cobalt bis(dioxolene) complexes, it is found that spin-crossover at the two cobalt(II) centers is accompanied by a change in the electronic state of the coordinated bis(dioxolene) unit from the singlet open-shell biradicaloid to the singlet closed-shell quinonoid form in complex 3b. Following a synthetic method similar to that used for 3b, but performing the metathesis reaction with sodium tetrafluoroborate rather than sodium tetraphenylborate, resulted in the formation of the complex [Co2(L-N4tBu2)2(Cat-SQ)](BF4)2·Et2O (3d). The solid-state properties of this complex are best described by a temperature-induced valence tautomeric transition at the low-spin cobalt(III) center, which is accompanied by a spin-crossover process at the cobalt(II) center. Thus, the electronic state of complex 3d changes from LS-CoIII-Cat-SQ-CoII-LS to HS-CoII-(SQ-SQ)CS-CoII-HS upon a change in temperature.
Temperature-induced electronic configuration changes of the (SQ-SQ)CS2- ligands from open-shell biradicaloid to closed-shell quinonoid configurations are not observed for the nickel-, copper- and zinc bis(dioxolene) complexes 4a, 5a and 6b, respectively. For these complexes, the metal ions are bridged by (SQ-SQ)CS2- ligand and the paramagnetic metal ions are very weakly antiferromagnetically coupled.

More than ten years ago, ER-ANT1 was shown to act as an ATP/ADP antiporter and to exist in the endoplasmic reticulum (ER) of higher plants. Because structurally different transporters generally mediate energy provision to the ER, the physiological function of ER-ANT1 was not directly evident.
Interestingly, mutant plants lacking ER-ANT1 exhibit a photorespiratory phenotype. Although many research efforts were undertaken, the possible connection between the transporter and photorespiration also remained elusive. Here, a forward genetic approach was used to decipher the role of ER-ANT1 in the plant context and its association to photorespiration.
This strategy identified that additional absence of a putative HAD-type phosphatase partially restored the photorespiratory phenotype. Localisation studies revealed that the corresponding protein is targeted to the chloroplast. Moreover, biochemical analyses demonstrate that the HAD-type phosphatase is specific for pyridoxal phosphate. These observations, together with transcriptional and metabolic data of corresponding single (ER-ANT1) and double (ER-ANT1, phosphatase) loss-of-function mutant plants revealed an unexpected connection of ER-ANT1 to vitamin B6 metabolism.
Finally, a scenario is proposed that explains how ER-ANT1 may influence B6 vitamer phosphorylation, thereby affecting photorespiration and causing several other physiological alterations observed in the corresponding loss-of-function mutant plants.

Destructive diseases of the lung like lung cancer or fibrosis are still often lethal. Also in case of fibrosis in the liver, the only possible cure is transplantation.
In this thesis, we investigate 3D synchrotron radiation-based micro computed tomography (SR\( \mu \)CT) images of capillary blood vessels in mouse lungs and livers. The specimens show so-called compensatory lung growth as well as different states of pulmonary and hepatic fibrosis.
During compensatory lung growth, after part of the lung has been resected, the remaining part compensates for this loss by extending into the empty space. This process is accompanied by active vessel growth.
In general, the human lung cannot compensate for such a loss. Thus, understanding this process in mice is important to improve treatment options in case of diseases like lung cancer.
In case of fibrosis, the formation of scars within the organ's tissue forces the capillary vessels to grow to ensure blood supply.
Thus, the process of fibrosis as well as compensatory lung growth can be accessed by considering the capillary architecture.
As preparation of 2D microscopic images is faster, easier, and cheaper compared to SR\( \mu \)CT images, they currently form the basis of medical investigation. Yet, characteristics like direction and shape of objects can only properly be analyzed using 3D imaging techniques. Hence, analyzing SR\( \mu \)CT data provides valuable additional information.
For the fibrotic specimens, we apply image analysis methods well known from materials science. We measure the vessel diameter using the granulometry distribution function and describe the inter-vessel distance by the spherical contact distribution. Moreover, we estimate the directional distribution of the capillary structure. All features turn out to be useful to characterize fibrosis based on the deformation of capillary vessels.
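As a toy illustration of the spherical contact distribution, the sketch below estimates \(H(r) = P(\mathrm{dist}(x, X) \leq r \mid x \notin X)\) on a small 2D binary image, where \(X\) is the foreground (here standing in for vessel voxels). It is a brute-force estimate under stated assumptions: the function name is hypothetical, edge effects are ignored, and the thesis works on segmented 3D SR\( \mu \)CT data with methods not detailed in the abstract.

```python
import math


def spherical_contact_distribution(image, radii):
    """Empirical spherical contact distribution of a 2D binary image.

    image: list of rows with truthy entries marking foreground pixels.
    radii: radii r at which to evaluate H(r).
    Returns the fraction of background pixels whose Euclidean distance
    to the nearest foreground pixel is at most r, for each r.
    """
    foreground = [(i, j) for i, row in enumerate(image)
                  for j, value in enumerate(row) if value]
    background = [(i, j) for i, row in enumerate(image)
                  for j, value in enumerate(row) if not value]
    if not foreground or not background:
        raise ValueError("image needs both foreground and background pixels")

    # Distance from every background pixel to its nearest foreground pixel.
    distances = [min(math.hypot(i - p, j - q) for p, q in foreground)
                 for i, j in background]
    return [sum(d <= r for d in distances) / len(distances) for r in radii]
```

Large values of \(H(r)\) at small \(r\) indicate that background points are close to the structure, i.e. small inter-vessel distances.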
It is already known that the most efficient mechanism of vessel growth forms small torus-shaped holes within the capillary structure, so-called intussusceptive pillars. Analyzing their location and number strongly contributes to the characterization of vessel growth. Hence, this is of great interest for all three applications. This thesis provides the first algorithm to detect intussusceptive pillars in SR\( \mu \)CT images. After segmentation of the raw image data, our algorithm works automatically and allows for a quantitative evaluation of a large amount of data.
The analysis of SR\( \mu \)CT data using our pillar algorithm as well as the granulometry, spherical contact distribution, and directional analysis extends the current state-of-the-art in medical studies. Although it is not possible to replace certain 3D features by 2D features without losing information, our results could be used to examine 2D features approximating the 3D findings reasonably well.

Function of two redox sensing kinases from the methanogenic archaeon Methanosarcina acetivorans
(2019)

MsmS is a heme-based redox sensor kinase in Methanosarcina acetivorans consisting of alternating PAS and GAF domains connected to a C-terminal kinase domain. In addition to MsmS, M. acetivorans possesses a second kinase, MA0863, with high sequence similarity. Interestingly, MA0863 possesses an amber codon in its second GAF domain, encoding the amino acid pyrrolysine. Thus far, no function of this residue has been resolved. In order to examine the heme iron coordination in both proteins, an improved method for the production of heme proteins was established using the Escherichia coli strain Nissle 1917. This method enables the complete reconstitution of a recombinant hemoprotein during protein production, thereby resulting in a native heme coordination. Analysis of the full-length MsmS and MA0863 confirmed a covalently bound heme cofactor, which is connected to one conserved cysteine residue in each protein. In order to identify the coordinating amino acid residues of the heme iron, UV/vis spectra of different variants were measured. These studies revealed His702 in MsmS and the corresponding His666 in MA0863 as the proximal heme ligands. MsmS has previously been described as a heme-based redox sensor. In order to examine whether the same is true for MA0863, redox-dependent kinase assays were performed. MA0863 indeed displays redox-dependent autophosphorylation activity, which is independent of heme ligands and only observed under oxidizing conditions. Interestingly, autophosphorylation was shown to be independent of the heme cofactor and to rely instead on thiol oxidation. Therefore, MA0863 was renamed RdmS (redox dependent methyltransferase-associated sensor). In order to identify the phosphorylation site of RdmS, thin layer chromatography was performed, identifying a tyrosine as the putative phosphorylation site. This observation is in agreement with the lack of the so-called H-box found in typical histidine kinases.
Due to their genomic localization, MsmS and RdmS were postulated to form two-component systems (TCS) with the vicinally encoded regulator proteins MsrG and MsrF. Therefore, protein-protein interaction studies using the bacterial adenylate cyclase two-hybrid system were performed, suggesting an interaction of RdmS and MsmS with the three regulators MsrG/F/C. Due to these multiple interactions, these signal transduction pathways should be considered multicomponent systems rather than two-component systems.

Ranking lists are an essential means to succinctly summarize outstanding items, computed over database tables or crowdsourced on dedicated websites. In this thesis, we propose the usage of automatically generated, entity-centric rankings to discover insights in data. We present PALEO, a framework for data exploration through reverse engineering top-k database queries: given a database and a sample top-k input list, our approach aims at determining an SQL query that returns results similar to the provided input when executed over the database. The core problem consists of finding selection predicates that return the given items, determining the correct ranking criteria, and evaluating the most promising candidate queries first. PALEO operates on a subset of the base data, uses data samples, histograms, and descriptive statistics, and further proposes models that assess the suitability of candidate queries, which helps limit false positives. Furthermore, this thesis presents COMPETE, a novel approach that models and computes dominance over user-provided input entities, given a database of top-k rankings. The resulting entities are found superior or inferior with a tunable degree of dominance over the input set, a very intuitive yet insightful way to explore the pros and cons of entities of interest. Several notions of dominance are defined, which differ in computational complexity and in the strictness of the dominance concept, yet are interdependent through containment relations. COMPETE is able to pick the most promising approach to satisfy a user request at minimal runtime latency, using a probabilistic model that estimates the result sizes. The individual flavors of dominance are cast into a stack of algorithms over inverted indices and auxiliary structures, enabling pruning techniques to avoid significant data access over large datasets of rankings.
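The reverse-engineering task behind PALEO can be illustrated with a deliberately naive sketch. It enumerates single equality predicates and single ranking columns over a small in-memory table and keeps every combination that reproduces the input list exactly. This is not PALEO's method: the function name, the row format, and the exhaustive exact-match search are assumptions for illustration, whereas the thesis describes working on data samples, histograms, and suitability models and accepting similar rather than identical results.

```python
def reverse_engineer_topk(rows, name_key, numeric_cols, categorical_cols, target):
    """Brute-force search for (predicate, ranking column, order) triples
    whose top-k result over `rows` equals the list `target`.

    rows: list of dicts; name_key: column holding the entity name.
    A predicate is None (no filter) or a (column, value) equality test.
    """
    k = len(target)
    predicates = [None]
    for col in categorical_cols:
        for value in sorted({row[col] for row in rows}):
            predicates.append((col, value))

    candidates = []
    for pred in predicates:
        # Apply the selection predicate (WHERE col = value), if any.
        subset = [r for r in rows if pred is None or r[pred[0]] == pred[1]]
        for col in numeric_cols:
            for descending in (True, False):
                # ORDER BY col [DESC] LIMIT k
                ranked = sorted(subset, key=lambda r: r[col], reverse=descending)[:k]
                if [r[name_key] for r in ranked] == target:
                    candidates.append((pred, col, descending))
    return candidates
```

Even on tiny tables, several queries may explain the same input list, which is one reason candidate queries need to be assessed and ranked rather than merely enumerated.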

Wine and alcoholic fermentations are complex and fascinating ecosystems. Wine aroma is shaped by the wine’s chemical composition, in which both microbes and grape constituents play crucial roles. Activities of the microbial community impact the sensory properties of the final product; therefore, the characterisation of microbial diversity is essential in understanding and predicting the sensory properties of wine. Characterisation has been challenging with traditional approaches, in which microbes are isolated and therefore analyzed outside their natural environment. This causes a bias in the observed microbial composition structure. In addition, true community interactions cannot be studied using isolates. Furthermore, the multiplex ties between wine chemical and sensory compositions remain elusive due to their multivariate and nonlinear nature. Therefore, the sensorial outcome arising from different microbial communities has remained inconclusive.
In this thesis, microbial diversity during Riesling wine fermentations is investigated with the aim of understanding the roles of microbial communities during fermentations and their links to sensory properties. With the advancement of high-throughput tools based on ‘omic methods, such as next-generation sequencing (NGS) technologies, it is now possible to study microbial communities and their functions without isolation by culturing. This developing field and its potential for the wine community are reviewed in Chapter 1. The standardisation of methods remains challenging in the field. DNA extraction is a key step in capturing the microbial diversity in samples for generating NGS data; therefore, DNA extraction methods are evaluated in Chapter 2. In Chapter 3, machine learning is used to guide the mining of raw data generated by untargeted GC-MS analysis. This step is crucial in order to take full advantage of the large scope of data generated by ‘omic methods. These chapters lay a solid foundation for Chapters 4 and 5, where microbial community structures and their outputs (chemical and sensory compositions) are studied using approaches and tools based on multiple ‘omic methods.
The results of this thesis show, first, that by using novel statistical approaches it is possible to extract meaningful information from heterogeneous biological, chemical and sensorial data. Secondly, the results suggest that the variation in wine aroma might be related to microbial interactions taking place not only inside a single community but also between communities, such as vineyard and winery communities. Therefore, the true sensory expression of terroir might be masked by the interaction between two microbial communities, although more work is needed to uncover this potential relationship. Such potential interaction mechanisms were uncovered between non-Saccharomyces yeast and bacteria in this work, and unexpected novel bacterial growth was observed during alcoholic fermentation. This suggests new layers in the understanding of wine fermentations. In the future, multi-omic approaches could be applied to identify biological pathways leading to specific wine aromas as well as to investigate the effects of specific winemaking conditions. These results are relevant not just for the wine industry but also for other industries where complex microbial networks are important. As such, the approaches presented in this thesis might find wide use in the food industry.