Knowledge discovery from today’s large and complex collections of scientific datasets is a challenging task. As more processes can be measured and simulated at ever finer spatial and temporal scales, the growing number of data dimensions and data objects poses tremendous challenges for data analysis and for the methods and tools used to explore the data. Researchers are overwhelmed with data, and standard tools are often insufficient for effective data analysis and knowledge discovery. The main objective of this thesis is to provide important new capabilities that accelerate scientific knowledge discovery from large, complex, multivariate scientific data. The research addresses these challenges using a combination of scientific visualization, information visualization, automated data analysis, and other enabling technologies, such as efficient data management. The effectiveness of the proposed analysis methods is demonstrated through applications in two distinct scientific fields: developmental biology and high-energy physics.

Advances in microscopy, image analysis, and embryo registration now allow gene expression to be measured, for the first time, at cellular resolution in entire organisms. Analyzing the resulting high-dimensional spatial gene expression datasets is challenging. Integrating data clustering with visualization makes it possible to analyze complex, time-varying spatial gene expression patterns and how they form. In addition, the visualization has been integrated with the MATLAB analysis framework, making advanced analysis tools accessible to biologists and enabling bioinformatics researchers to couple their own analyses directly with the visualization.

Laser wakefield particle accelerators (LWFAs) promise to be a new, compact source of high-energy particles and radiation, with applications ranging from medicine to physics.
To gain insight into the complex physical processes of particle acceleration, physicists model LWFAs computationally. The datasets produced by LWFA simulations are (i) extremely large, (ii) of varying spatial and temporal resolution, (iii) heterogeneous, and (iv) high-dimensional, which makes analysis and knowledge discovery from these data challenging. To address these challenges, this thesis describes the integration of the visualization system VisIt with the state-of-the-art index/query system FastBit, enabling interactive visual exploration of extremely large three-dimensional particle datasets. Researchers are especially interested in the beams of high-energy particles that form during a simulation. This thesis therefore also describes novel methods for automatically detecting and analyzing particle beams. Integrated with visualization, these automated methods enable more accurate, efficient, and effective analysis of LWFA simulation data than was previously possible.
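FastBit's interactive query speed comes from bitmap indexes, which in FastBit's published design are binned and compressed with Word-Aligned Hybrid (WAH) encoding. The Python sketch below illustrates only the core idea and is not FastBit's API: an uncompressed, binned bitmap index over a single particle attribute that answers a threshold query such as "momentum > 3.0" by OR-ing whole bins and checking raw values only in the bin that straddles the threshold. The class `BinnedBitmapIndex`, the helper `rows_of`, the bin edges, and the momentum values are all hypothetical.

```python
from bisect import bisect_right


class BinnedBitmapIndex:
    """Hypothetical binned bitmap index over one particle attribute.

    Each bin owns a bitmap (stored here as a Python int) whose bit r is set
    when row r falls in that bin. Bitmaps are left uncompressed for clarity.
    """

    def __init__(self, values, edges):
        # edges are sorted bin boundaries; bin i covers [edges[i], edges[i + 1])
        self.values = list(values)
        self.edges = list(edges)
        self.bitmaps = [0] * (len(self.edges) - 1)
        for row, v in enumerate(self.values):
            b = bisect_right(self.edges, v) - 1
            if not 0 <= b < len(self.bitmaps):
                raise ValueError(f"value {v} lies outside the bin edges")
            self.bitmaps[b] |= 1 << row

    def greater_than(self, threshold):
        """Return a bitmap of the rows whose value is > threshold."""
        result = 0
        for i, bitmap in enumerate(self.bitmaps):
            lo, hi = self.edges[i], self.edges[i + 1]
            if lo > threshold:
                # Every value in this bin exceeds the threshold: take the whole bitmap.
                result |= bitmap
            elif hi > threshold:
                # Boundary bin: only some rows qualify, so check the raw values
                # of these candidate rows one by one (a "candidate check").
                rows = bitmap
                while rows:
                    lowest = rows & -rows          # isolate the lowest set bit
                    row = lowest.bit_length() - 1  # its row number
                    if self.values[row] > threshold:
                        result |= lowest
                    rows ^= lowest                 # clear that bit
            # Bins with hi <= threshold cannot contain hits and are skipped.
        return result


def rows_of(bitmap):
    """Decode a bitmap into the sorted list of row numbers it contains."""
    rows, row = [], 0
    while bitmap:
        if bitmap & 1:
            rows.append(row)
        bitmap >>= 1
        row += 1
    return rows


# Hypothetical momentum values (arbitrary units) for six particles.
px = [0.5, 3.2, 1.1, 7.8, 2.9, 5.0]
index = BinnedBitmapIndex(px, edges=[0, 2, 4, 6, 8])
print(rows_of(index.greater_than(3.0)))  # [1, 3, 5]
```

In this scheme, rows in bins lying entirely above the threshold are never read individually; only the single boundary bin needs a candidate check against the raw data.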
In urban planning, sophisticated simulation models are key tools for estimating future population growth and for measuring the impact of planning decisions on urban development and the environment. Simulated population projections usually result in large, macro-scale, multivariate geospatial data sets, in which millions of records must be processed, stored, and visualized to help planners explore and analyze complex population patterns. We introduce a database-driven framework for visualizing geospatial multidimensional simulation data based on the output of UrbanSim, a software system for the analysis and planning of urban development. The framework is extensible and aims to integrate empirical-stochastic methods and urban simulation models with techniques from information visualization and cartography.

First, we develop an empirical model that estimates residential building types from demographic household characteristics. Predicted dwelling types are important for analyzing future material use, for calculating carbon footprints, and for simultaneously visualizing land use, density, and other significant parameters in 3D space. Our model uses multinomial logistic regression to derive building types at different scales: the estimated regression coefficients are applied to UrbanSim output to predict residential building types.

The simulation results and the estimated building types are managed in an object-relational geodatabase. From the database, density, building types, and significant demographic variables are visually encoded as scalable, georeferenced 3D geometries and displayed on top of aerial photographs, combined into a single visual synthesis in Google Earth. Users access the geodatabase and choose visualization parameters through a web-based user interface. The geometries are encoded as ready-to-visualize data sets in KML (Keyhole Markup Language), the XML-based format read by Google Earth.
The goal is to enhance human cognition by displaying abstract representations of multidimensional data sets in a realistic context and thus to support decision making in planning processes.
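The building-type prediction step can be illustrated with a minimal multinomial logistic regression sketch. It assumes a reference-category parameterization, two hypothetical household predictors (persons per household and a has-children flag), three hypothetical building types, and made-up coefficients; none of these are the fitted model described above. Each household record is scored per building type, the scores are converted to probabilities with the softmax, and the most probable type is assigned. Summing probabilities over the households of a zone is one possible way to move to a coarser scale, assumed here for illustration.

```python
import math

# Hypothetical coefficients: (intercept, persons per household, has_children).
# "single_family" is the reference category, so its coefficients are fixed at 0.
COEFFICIENTS = {
    "single_family": (0.0, 0.0, 0.0),
    "townhouse": (0.5, -0.2, 0.0),
    "apartment": (2.0, -0.8, -1.5),
}


def type_probabilities(features, coefficients=COEFFICIENTS):
    """Multinomial logit: P(k | x) = exp(beta_k . x) / sum_j exp(beta_j . x)."""
    x = (1.0, *features)  # prepend 1 for the intercept term
    scores = {k: sum(b * xi for b, xi in zip(beta, x)) for k, beta in coefficients.items()}
    top = max(scores.values())  # subtract the max score for numerical stability
    weights = {k: math.exp(s - top) for k, s in scores.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


def predict_type(features, coefficients=COEFFICIENTS):
    """Assign the building type with the highest predicted probability."""
    probs = type_probabilities(features, coefficients)
    return max(probs, key=probs.get)


def expected_counts(households, coefficients=COEFFICIENTS):
    """Hypothetical zone-level aggregation: expected dwellings per type."""
    totals = dict.fromkeys(coefficients, 0.0)
    for features in households:
        for k, p in type_probabilities(features, coefficients).items():
            totals[k] += p
    return totals


# Hypothetical household records for one zone: (persons, has_children)
zone = [(1, 0), (4, 1), (2, 0)]
print([predict_type(h) for h in zone])  # ['apartment', 'single_family', 'apartment']
print(expected_counts(zone))
```

With the reference category fixed at zero, each non-reference coefficient vector gives the log-odds of that type relative to single-family housing; for the one-person household, for example, log(P(apartment) / P(single_family)) = 2.0 - 0.8 = 1.2.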