Doctoral Thesis
One of the main tasks of molecular biology is to understand the mechanisms of molecular biological processes. This raises the problem of constructing regulatory networks and, with it, of finding key regulators. To do so, it is important to represent the data in a way that reveals distinct patterns within large groups. On the one hand, there is a wealth of experimentally determined kinetic information about changes in the presence of molecules in the observed system. On the other hand, there is evidence, documented over the years, of the involvement of molecules in different biological processes. Both sources of information have drawbacks: experimental data reflect only a fleeting molecular state of each individual organism and are therefore often highly variable and noisy; functional groups were determined as generalizations of the known roles of molecules in biological processes and can therefore be incomplete and only partially relevant to particular experimental conditions and individual organisms. Our goal is to obtain an overview of the experimentally observed molecules and to extract knowledge from both sources while avoiding the constraints of noise and generalization bias. The resulting optimal representation of the experimental data would then help to pinpoint potential regulators.
The proposed method is called the Signature Topology (ST) approach, as it uses the functional topology as the source of prior knowledge and creates a specific signature for the given experimental data. The ST approach is based on a knowledge- and data-driven machine learning algorithm that is implemented via dynamic programming. Because it draws on both prior knowledge and learning from the data, the proposed approach combines supervised and unsupervised machine learning. The resulting network structure copes with data abundance, avoids an over-detailed description that may lead to misinterpretation, and is able to pick out elements with minor behavior patterns.
The method is tested with artificial data and applied to real-world mass-spectrometry proteome data and NGS transcriptome data of Chlamydomonas reinhardtii. The proposed approach helps to identify potential regulatory genes whose roles are not explicitly provided in the functional ontology used. Moreover, it successfully reduces data complexity while preserving all individual molecular information reported in the literature and stored in the functional ontology. When the proposed approach analyzes different experimental data with the same ontology, the resulting networks are uniform and can therefore be compared. This makes it possible to compare a great variety of experimental conditions, from different organisms to different system levels.
Knowledge discovery from today's large and complex collections of scientific datasets is a challenging task. With the ability to measure and simulate more processes at increasingly finer spatial and temporal scales, the growing number of data dimensions and data objects presents tremendous challenges for data analysis and for effective data exploration methods and tools. Researchers are overwhelmed with data, and standard tools are often insufficient to enable effective data analysis and knowledge discovery. The main objective of this thesis is to provide important new capabilities to accelerate scientific knowledge discovery from large, complex, and multivariate scientific data. The research covered in this thesis addresses these scientific challenges using a combination of scientific visualization, information visualization, automated data analysis, and other enabling technologies, such as efficient data management. The effectiveness of the proposed analysis methods is demonstrated via applications in two distinct scientific research fields, namely developmental biology and high-energy physics.
Advances in microscopy, image analysis, and embryo registration enable, for the first time, the measurement of gene expression at cellular resolution for entire organisms. Analysis of high-dimensional spatial gene expression datasets is a challenging task. By integrating data clustering and visualization, analysis of complex, time-varying, spatial gene expression patterns and their formation becomes possible. The analysis framework MATLAB has been integrated with the visualization, making advanced analysis tools accessible to biologists and enabling bioinformatics researchers to directly integrate their analysis with the visualization.
Laser wakefield particle accelerators (LWFAs) promise to be a new compact source of high-energy particles and radiation, with wide applications ranging from medicine to physics. To gain insight into the complex physical processes of particle acceleration, physicists model LWFAs computationally. The datasets produced by LWFA simulations are (i) extremely large, (ii) of varying spatial and temporal resolution, (iii) heterogeneous, and (iv) high-dimensional, making analysis and knowledge discovery from complex LWFA simulation data a challenging task. To address these challenges, this thesis describes the integration of the visualization system VisIt and the state-of-the-art index/query system FastBit, enabling interactive visual exploration of extremely large three-dimensional particle datasets. Researchers are especially interested in beams of high-energy particles formed during the course of a simulation. This thesis describes novel methods for the automatic detection and analysis of particle beams, enabling a more accurate and efficient data analysis process. By integrating these automated analysis methods with visualization, this research enables more accurate, efficient, and effective analysis of LWFA simulation data than previously possible.
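As an illustration of the kind of index-accelerated selection that an index/query system makes possible, the following minimal sketch builds a binned bitmap index over one particle attribute and answers a threshold query with it. It does not use the FastBit API and is not the thesis implementation; the function names `build_binned_bitmaps` and `query_greater_than`, the bin edges, and the attribute values are hypothetical.

```python
import numpy as np

def build_binned_bitmaps(values, edges):
    """Build one boolean bitmap per bin (hypothetical helper).

    bitmaps[b][i] is True when values[i] falls into bin b, where the bins
    are delimited by the sorted array `edges` (len(edges) + 1 bins in total).
    """
    bin_ids = np.digitize(values, edges)  # bin index in 0..len(edges)
    return [bin_ids == b for b in range(len(edges) + 1)]

def query_greater_than(values, edges, bitmaps, threshold):
    """Return a boolean mask of entries with values > threshold.

    Bins that lie entirely above the threshold are answered from the
    bitmaps alone; only the single bin containing the threshold needs a
    check against the raw values (the "candidate check").
    """
    t_bin = int(np.digitize(threshold, edges))
    mask = np.zeros(len(values), dtype=bool)
    for b in range(t_bin + 1, len(bitmaps)):
        mask |= bitmaps[b]  # fully qualifying bins
    candidates = np.flatnonzero(bitmaps[t_bin])
    mask[candidates[values[candidates] > threshold]] = True
    return mask

# Usage with made-up data: select "high-momentum" particles.
rng = np.random.default_rng(42)
momentum = rng.uniform(0.0, 100.0, size=10_000)   # hypothetical attribute
edges = np.linspace(10.0, 90.0, 9)                 # hypothetical bin edges
bitmaps = build_binned_bitmaps(momentum, edges)
selected = query_greater_than(momentum, edges, bitmaps, 75.0)
```

The appeal of this design is that most of the query is answered by combining precomputed bitmaps, so the raw data is touched only for the one bin that straddles the threshold.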
Adaptive Extraction and Representation of Geometric Structures from Unorganized 3D Point Sets
(2009)
This thesis focuses on the extraction and representation of intrinsic properties of unorganized three-dimensional (3D) point clouds. The points of a point cloud, which typically originates from LiDAR (Light Detection and Ranging) scanning devices or from reconstruction from two-dimensional (2D) image series, represent discrete samples of real-world objects. Depending on the type of scene from which the data is generated, the resulting point cloud may exhibit a variety of different structures. In the case of environmental LiDAR scans in particular, the complexity of the corresponding point clouds is relatively high. Hence, finding new techniques for the efficient extraction and representation of the underlying structural entities has become an important research issue. This thesis introduces new methods for the extraction and visualization of structural features such as surfaces and curves (e.g. ridge-lines, creases) from 3D (environmental) point clouds. One main part concerns the extraction of curve-like features from environmental point data sets. It provides a new method that supports stable feature extraction by incorporating a probability-based point classification scheme, which characterizes individual points by their affiliation to surface-, curve- and volume-like structures. Another part is concerned with surface reconstruction from (environmental) point clouds that exhibit objects of varying complexity. A new method providing multi-resolution surface representations from regular point clouds is discussed. Following the principles of this approach, a volumetric surface reconstruction method based on the proposed classification scheme is introduced. It allows the reconstruction of surfaces from highly unstructured and noisy point data sets. Furthermore, contributions to the reconstruction of 3D point clouds from 2D image series are provided. In addition, the most important properties of (environmental) point clouds with respect to feature extraction are discussed.
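The abstract does not specify how points are scored for their affiliation to curve-, surface- and volume-like structures. As a rough illustration only, the sketch below uses a common eigenvalue-based heuristic on the local covariance of a point's nearest neighbors; it is an assumption, not the thesis's classification scheme, and the function name `dimensionality_probabilities` and the neighborhood size `k` are hypothetical.

```python
import numpy as np

def dimensionality_probabilities(points, query_index, k=20):
    """Score one point as curve-, surface- or volume-like (illustrative).

    Takes the k nearest neighbors of points[query_index] (brute force),
    computes the eigenvalues of their 3x3 covariance matrix, and turns the
    square roots s0 >= s1 >= s2 into three scores that sum to 1:
      curve   = (s0 - s1) / s0   (one dominant direction)
      surface = (s1 - s2) / s0   (two dominant directions)
      volume  =  s2 / s0         (no dominant direction)
    """
    p = points[query_index]
    dist2 = np.sum((points - p) ** 2, axis=1)
    neighbors = points[np.argsort(dist2)[:k]]
    cov = np.cov(neighbors.T)                      # 3x3 covariance matrix
    evals = np.linalg.eigvalsh(cov)[::-1]          # descending order
    s = np.sqrt(np.clip(evals, 0.0, None))         # guard tiny negatives
    if s[0] == 0.0:                                # degenerate neighborhood
        return (1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0)
    curve = (s[0] - s[1]) / s[0]
    surface = (s[1] - s[2]) / s[0]
    volume = s[2] / s[0]
    return (curve, surface, volume)

# Usage with a synthetic planar patch: the surface score should dominate.
gx, gy = np.meshgrid(np.arange(11.0), np.arange(11.0))
plane = np.column_stack([gx.ravel(), gy.ravel(), np.zeros(gx.size)])
curve_p, surface_p, volume_p = dimensionality_probabilities(plane, 60, k=25)
```

A brute-force neighbor search keeps the sketch self-contained; for large scans a spatial index such as a k-d tree would normally replace it.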