## 62-XX STATISTICS

Multiphase materials combine properties of several materials, which makes them interesting for high-performing components. This thesis considers a certain set of multiphase materials, namely silicon-carbide (SiC) particle-reinforced aluminium (Al) metal matrix composites and their modelling based on stochastic geometry models.
Stochastic modelling can be used for the generation of virtual material samples: once we have fitted a model to the material statistics, we can obtain independent three-dimensional “samples” of the material under investigation without the need for any further imaging. Additionally, by changing the model parameters, we can easily simulate a new material composition.
The materials under investigation have a rather complicated microstructure, as the system of SiC particles has many degrees of freedom: size, shape, orientation and spatial distribution. Based on FIB-SEM images, which yield three-dimensional image data, we extract the SiC particle structure using methods of image analysis. Then we model the SiC particles by anisotropically rescaled cells of a random Laguerre tessellation that was fitted to the shapes of isotropically rescaled particles. We fit a log-normal distribution to the volumes of the SiC particles. Additionally, we propose models for the Al grain structure and the aluminium-copper (\(\mathrm{Al}_2\mathrm{Cu}\)) precipitates occurring on the grain boundaries and on SiC-Al phase boundaries.
Finally, we show how the parameters of the volume distribution can be estimated from two-dimensional SEM images. This estimation is applied to two samples with different mean SiC particle diameters and to a random section through the model. The stereological estimates are in acceptable agreement with the parameters estimated from three-dimensional image data as well as with the parameters of the model.
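As a minimal illustration of the log-normal fitting step, the following sketch estimates the two parameters from a sample of particle volumes by maximum likelihood. The abstract does not state which fitting method the thesis uses, so the choice of maximum likelihood, the function name `fit_lognormal` and the simulated volumes are assumptions made for illustration only. The sketch works with three-dimensional volumes and does not cover the stereological estimation from two-dimensional sections.

```python
import numpy as np

def fit_lognormal(volumes):
    """Maximum likelihood estimates (mu, sigma) of a log-normal distribution.

    If V is log-normal, log(V) is normal, so the estimates are the mean and
    the (1/n) standard deviation of the log-volumes.
    """
    logs = np.log(np.asarray(volumes, dtype=float))
    return logs.mean(), logs.std(ddof=0)

# Simulated stand-in for measured particle volumes (hypothetical data).
rng = np.random.default_rng(0)
sample = rng.lognormal(mean=1.0, sigma=0.5, size=20000)

mu_hat, sigma_hat = fit_lognormal(sample)
print(f"mu = {mu_hat:.3f}, sigma = {sigma_hat:.3f}")
```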

A popular model for the locations of fibres or grains in composite materials
is the inhomogeneous Poisson process in dimension 3. Its local intensity function
may be estimated non-parametrically by local smoothing, e.g. by kernel
estimates. They crucially depend on the choice of bandwidths as tuning parameters
controlling the smoothness of the resulting function estimate. In this
thesis, we propose a fast algorithm for learning suitable global and local bandwidths
from the data. It is well known that intensity estimation is closely
related to probability density estimation. As a by-product of our study, we
show that the difference is asymptotically negligible regarding the choice of
good bandwidths, and, hence, we focus on density estimation.
There are quite a number of data-driven bandwidth selection methods for
kernel density estimates. Cross-validation is a popular one and is frequently proposed
to estimate the optimal bandwidth. However, if the sample size is very
large, it becomes computationally expensive. In materials science, in particular,
it is very common to have several thousand up to several million points.
Another type of bandwidth selection is a solve-the-equation plug-in approach
which involves replacing the unknown quantities in the asymptotically optimal
bandwidth formula by their estimates.
In this thesis, we develop such an iterative fast plug-in algorithm for estimating
the optimal global and local bandwidths for density and intensity estimation, with a focus on 2- and 3-dimensional data. It is based on detailed
asymptotics of the estimators of the intensity function, of its second
derivatives and of the integrals of second derivatives which appear in the formulae
for asymptotically optimal bandwidths. These asymptotics are used to determine
the exact number of iteration steps and some tuning parameters. In
both the global and the local case, fewer than 10 iterations suffice. Simulation studies
show that the intensity estimated with local bandwidths captures
the variation of the local intensity better than that estimated with a global bandwidth. Finally, the
algorithm is applied to two real data sets from test bodies of fibre-reinforced
high-performance concrete, clearly showing some inhomogeneity of the fibre
intensity.
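For orientation, the textbook form of the quantities behind a global plug-in bandwidth can be written as follows. The notation is ours, not necessarily that of the thesis, and the formulas assume a single scalar bandwidth \(h\) and a symmetric kernel \(K\) on \(\mathbb{R}^d\) with \(\int z z^{\top} K(z)\,dz = \mu_2(K) I_d\) and \(R(K)=\int K(z)^2\,dz\).

\[
\hat f_h(x) = \frac{1}{n h^{d}} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right),
\qquad
\mathrm{AMISE}(h) = \frac{R(K)}{n h^{d}} + \frac{h^{4}}{4}\,\mu_2(K)^2 \int \bigl(\nabla^2 f(x)\bigr)^2 dx ,
\]

\[
h_{\mathrm{opt}} = \left( \frac{d\,R(K)}{n\,\mu_2(K)^2 \int \bigl(\nabla^2 f(x)\bigr)^2 dx} \right)^{1/(d+4)} .
\]

Here \(\nabla^2 f\) denotes the Laplacian of the density. The only unknown in \(h_{\mathrm{opt}}\) is the integral of the squared second derivatives. A plug-in method replaces it by an estimate, which in turn needs a pilot bandwidth, and this is what makes an iterative scheme natural.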

In this paper, we demonstrate the power of functional data models for the statistical analysis of stimulus-response experiments. Treating the responses as curves is a natural way to look at this kind of data and makes use of the full information available. In particular, we focus on detecting a change in the mean of the response in a series of stimulus-response curves, where we also take dependence in time into account.

The thesis studies change points in absolute time for censored survival data with some contributions to the more common analysis of change points with respect to survival time. We first introduce the notions and estimates of survival analysis, in particular the hazard function and censoring mechanisms. Then, we discuss change point models for survival data. In the literature, change points with respect to survival time are usually studied. Typical examples are piecewise constant and piecewise linear hazard functions. For such models, we propose a new algorithm for the numerical calculation of maximum likelihood estimates based on a cross-entropy approach, which in our simulations outperforms the common Nelder-Mead algorithm.
Our original motivation was the study of censored survival data (e.g., after diagnosis of breast cancer) over several decades. We wanted to investigate whether the hazard functions differ between various time periods due to, e.g., progress in cancer treatment. This is a change point problem in the spirit of classical change point analysis. Horváth (1998) proposed a suitable change point test based on estimates of the cumulative hazard function. As an alternative, we propose similar tests based on nonparametric estimates of the hazard function. For one class of tests related to kernel probability density estimates, we fully develop the asymptotic theory for the change point tests. For the other class of estimates, which are versions of the Watson-Leadbetter estimate with censoring taken into account and which are related to the Nelson-Aalen estimate, we discuss some steps towards developing the full asymptotic theory. We close by applying the change point tests to simulated and real data, in particular to the breast cancer survival data from the SEER study.
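The Nelson-Aalen estimate of the cumulative hazard mentioned above can be sketched in a few lines. This is only the basic estimator for right-censored data, not one of the change point tests of the thesis. The function name `nelson_aalen` and the toy data are illustrative assumptions.

```python
import numpy as np

def nelson_aalen(times, events):
    """Nelson-Aalen estimate of the cumulative hazard for right-censored data.

    times  : observed times (event or censoring time)
    events : 1 if the event was observed, 0 if the observation was censored
    Returns the distinct event times and the estimate at those times.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    event_times = np.unique(times[events])
    # Number at risk just before t: all subjects with observed time >= t.
    at_risk = np.array([(times >= t).sum() for t in event_times])
    # Number of observed events exactly at t.
    deaths = np.array([((times == t) & events).sum() for t in event_times])
    return event_times, np.cumsum(deaths / at_risk)

# Toy data: six subjects, two of them censored (event indicator 0).
times = [2.0, 3.0, 3.0, 5.0, 7.0, 8.0]
events = [1, 1, 0, 1, 0, 1]
t, H = nelson_aalen(times, events)
for ti, Hi in zip(t, H):
    print(f"t = {ti:.1f}  H(t) = {Hi:.4f}")
```

The increments are the ratios of observed events to subjects at risk, here 1/6, 1/5, 1/3 and 1 at the four event times.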

In change-point analysis, the point of interest is to decide whether the observations follow one model
or whether there is at least one time point at which the model has changed. This results in two
subfields, the testing for a change and the estimation of the time of change. This thesis considers
both parts, but restricts testing and estimation to at most one change-point.
A well-known example is based on independent observations having one change in the mean.
Based on the likelihood ratio test, a test statistic with an asymptotic Gumbel distribution was
derived for this model. Since the corresponding convergence rate is known to be
very slow, modifications of the test using a weight function have been considered. Those tests
perform better. We focus on this class of test statistics.
The first part gives a detailed introduction to the techniques for analysing test statistics and
estimators. To this end, we consider the multivariate mean change model and focus on the effects
of the weight function. In the case of change-point estimators, we can distinguish between
the assumption of a fixed size of change (fixed alternative) and the assumption that the size
of the change converges to 0 (local alternative). The fixed case, in particular, is rarely analysed
in the literature. We show how to get from the proof for the fixed alternative to the
proof for the local alternative. Finally, we give a simulation study for heavy-tailed multivariate
observations.
The main part of this thesis focuses on two points: first, analysing test statistics and, secondly,
analysing the corresponding change-point estimators. In both cases, we first consider a
change in the mean for independent observations but relax the moment condition. Based on
a robust estimator for the mean, we derive a new type of change-point test having a randomized
weight function. Secondly, we analyse non-linear autoregressive models with unknown
regression function. Based on neural networks, test statistics and estimators are derived for
correctly specified as well as for misspecified situations. This part extends the literature as
we analyse test statistics and estimators not only based on the sample residuals. In both
sections, the section on tests and the one on the change-point estimator, we end by giving
regularity conditions on the model as well as the parameter estimator.
Finally, a simulation study for the case of the neural network based test and estimator is
given. We discuss the behaviour under correct specification and under misspecification and apply the neural
network based test and estimator to two data sets.
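For the classical setting of one change in the mean of independent observations, a weighted CUSUM statistic and the corresponding change-point estimator can be sketched as follows. This is a generic textbook version, not the robust or neural network based procedures of the thesis. The weight \((t(1-t))^{\gamma}\), the default \(\gamma = 0.25\), the crude variance estimate and the simulated data are assumptions made for illustration. No critical values are computed.

```python
import numpy as np

def weighted_cusum(x, gamma=0.25):
    """Weighted CUSUM statistic and argmax change-point estimator.

    For k = 1, ..., n-1 the statistic is
        |S_k - (k/n) S_n| / (sigma * sqrt(n) * (k/n * (1 - k/n))**gamma),
    where S_k is the k-th partial sum. gamma = 0 gives the unweighted CUSUM,
    gamma = 0.5 the fully standardised (likelihood ratio type) statistic.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    k = np.arange(1, n)                       # candidate change points
    cusum = np.cumsum(x)[:-1] - k / n * x.sum()
    weight = (k / n * (1 - k / n)) ** gamma
    sigma = x.std(ddof=1)                     # crude: inflated if a change exists
    stats = np.abs(cusum) / (sigma * np.sqrt(n) * weight)
    return stats.max(), int(k[np.argmax(stats)])

# Simulated series with a mean shift of size 2 after observation 200.
rng = np.random.default_rng(42)
x = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(2.0, 1.0, 200)])
stat, k_hat = weighted_cusum(x)
print(f"test statistic = {stat:.2f}, estimated change point = {k_hat}")
```

The location of the maximum does not depend on the variance estimate, so the estimator is unaffected by the crude choice of `sigma`. Only the size of the test statistic is.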

Functional data analysis is a branch of statistics that deals with observations \(X_1,\dots,X_n\) which are curves. We are interested in particular in time series of dependent curves and, specifically, consider the functional autoregressive process of order one (FAR(1)), which is defined as \(X_{n+1}=\Psi(X_{n})+\epsilon_{n+1}\) with independent innovations \(\epsilon_n\). Estimates \(\hat{\Psi}\) for the autoregressive operator \(\Psi\) have been studied extensively during the last two decades, and their asymptotic properties are well understood. Particularly difficult and different from scalar- or vector-valued autoregressions are the weak convergence properties which also form the basis of the bootstrap theory.
Although the asymptotics for \(\hat{\Psi}(X_{n})\) are still tractable, they are only useful for large enough samples. In applications, however, often only small samples are available, so that an alternative method for approximating the distribution of \(\hat{\Psi}(X_{n})\) is welcome. As a motivation, we discuss a real-data example where we investigate a change-point detection problem for a stimulus-response dataset obtained from the animal physiology group at the Technical University of Kaiserslautern.
To get an alternative for asymptotic approximations, we employ the naive or residual-based bootstrap procedure. In this thesis, we prove theoretically and show via simulations that the bootstrap provides asymptotically valid and practically useful approximations of the distributions of certain functions of the data. Such results may be used to calculate approximate confidence bands or critical bounds for tests.
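The FAR(1) recursion and a residual-based bootstrap can be sketched on discretised curves as follows. Everything specific here is an assumption made for illustration: the kernel `psi`, the sine-basis innovations, the grid size, and the estimator of \(\Psi\), for which we use a regression of functional principal component scores truncated at `p` components. The thesis may use a different estimator and a different resampling scheme.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 50, 200                        # grid points per curve, number of steps
grid = np.linspace(0.0, 1.0, m)

# Hypothetical kernel psi(t, s) of the integral operator Psi (norm below 1).
psi = 0.6 * np.exp(-(grid[:, None] - grid[None, :]) ** 2)

def simulate_far1(kernel, innovations, x0):
    """X_{k+1} = Psi(X_k) + eps_{k+1}, with the integral as a Riemann sum."""
    curves = [x0]
    for eps in innovations:
        curves.append(kernel @ curves[-1] / m + eps)
    return np.array(curves)

def smooth_noise(size):
    """Smooth random innovation curves from five sine basis functions."""
    basis = np.array([np.sin((j + 1) * np.pi * grid) for j in range(5)])
    return rng.normal(size=(size, 5)) @ basis

def estimate_operator(curves, p=3):
    """Kernel of an estimate of Psi based on the first p principal components."""
    x = curves - curves.mean(axis=0)
    cov = x.T @ x / (len(x) * m)              # discretised covariance operator
    _, eigvec = np.linalg.eigh(cov)
    v = eigvec[:, -p:] * np.sqrt(m)           # eigenfunctions with L2 norm 1
    scores = x @ v / m                        # principal component scores
    # Least squares regression of the scores at step k+1 on those at step k.
    b, *_ = np.linalg.lstsq(scores[:-1], scores[1:], rcond=None)
    return v @ b.T @ v.T

def residual_bootstrap(curves, kernel_hat, n_boot=20):
    """Resample centred residuals and rebuild series with the fitted operator."""
    mean = curves.mean(axis=0)
    x = curves - mean
    resid = x[1:] - (kernel_hat @ x[:-1].T / m).T
    resid = resid - resid.mean(axis=0)
    samples = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(resid), size=len(resid))
        samples.append(simulate_far1(kernel_hat, resid[idx], x[0]) + mean)
    return np.array(samples)

curves = simulate_far1(psi, smooth_noise(n), np.zeros(m))
psi_hat = estimate_operator(curves)
boot = residual_bootstrap(curves, psi_hat)
print(boot.shape)
```

Each bootstrap series can then be passed through the same statistic as the original data, which yields an empirical approximation of its distribution.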

It is well known that the microscopic structure of a material strongly influences its
macroscopic properties. Moreover, advances in imaging technologies make it possible
to capture the complexity of structures at ever smaller scales. Therefore, more
sophisticated image analysis techniques are needed.
This thesis provides tools to geometrically characterize different types of three-dimensional
structures with applications to industrial production and to materials science. Our goal is to
enhance methods that allow the extraction of geometric features from images and the automatic
processing of the information.
In particular, we investigate which characteristics are sufficient and necessary to infer
the desired information, such as particle classification for technical cleanliness and
the fitting of stochastic models in materials science.
In automotive production lines, dirt particles collect on the surface of mechanical
components. Residual dirt might reduce the performance and durability of assembled products.
Geometric characterization of these particles makes it possible to assess their potential danger.
While the current standards are based on 2D microscopic images, we extend the characterization
to 3D.
In particular, we provide a collection of parameters that exhaustively describe size and shape
of three-dimensional objects and can be efficiently estimated from binary images. Furthermore,
we show that only a few features are sufficient to classify particles according to the standards
of technical cleanliness.
In the context of materials science, we consider two types of microstructures: fiber systems
and foams.
Stochastic geometry provides the foundations for versatile models able to capture the
geometry observed in the samples. To allow automatic model fitting, we need rules stating which
parameters of the model yield the best-fitting characteristics. However, the validity of such
rules strongly depends on the properties of the structures and on the choice of the model.
For instance, isotropic orientation distribution yields the best theoretical results for Boolean
models and Poisson processes of cylinders with circular cross sections. Nevertheless, fiber
systems in composites are often anisotropic.
Starting from analytical results from the literature, we derive formulae for anisotropic
Poisson processes of cylinders with polygonal cross sections that can be directly used in
applications. We apply this procedure to a sample of medium density fiber board. Even
if the image resolution does not allow reliable estimation of the characteristics of single fibers,
we can fit Boolean models and Poisson cylinder processes. In particular, we show the complete
model fitting and validation procedure for cylinders with circular and square cross sections.
Different problems arise when modeling cellular materials. Motivated by the physics of foams,
random Laguerre tessellations are a good choice to model the pore system of foams.
Considering tessellations generated by systems of non-overlapping spheres makes it possible to control the
cell size distribution, but comes at the cost of losing an analytical description of the model.
Nevertheless, automatic model fitting can still be obtained by approximating the characteristics
of the tessellation depending on the parameters of the model. We investigate how to improve
the choice of the model parameters. Angles between facets and between edges have not been considered
so far. We show that the distributions of angles in Laguerre tessellations
depend on the model parameters. Thus, including the moments of the angles still allows automatic
model fitting. Moreover, we propose an algorithm to estimate angles from images of real foams.
We observe that angles are matched well in random Laguerre tessellations even when they are not
used to choose the model parameters. Then, we concentrate on the edge length distribution.
Laguerre tessellations contain many more short edges than real foams. To deal with this problem,
we consider relaxed models. Relaxation refers to topological and structural modifications
of a tessellation in order to make it comply with Plateau's laws of mechanical equilibrium. We inspect
samples of different types of foams, closed and open cell foams, polymeric and metallic. By comparing
the geometric characteristics of the model and of the relaxed tessellations, we conclude that whether
the relaxation improves the edge length distribution strongly depends on the type of foam.
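To make the notion of a Laguerre tessellation concrete, the following sketch assigns points to Laguerre cells generated by a set of spheres. A point belongs to the cell of the sphere with centre \(c_i\) and radius \(r_i\) that minimises the power distance \(\lVert x - c_i\rVert^2 - r_i^2\). The function name `laguerre_labels` and the two-sphere example are illustrative assumptions, and the sketch does not cover model fitting or relaxation.

```python
import numpy as np

def laguerre_labels(points, centres, radii):
    """Index of the Laguerre cell containing each point.

    The cell of sphere i consists of all x for which the power distance
    |x - c_i|^2 - r_i^2 is minimal. Equal radii give the Voronoi tessellation.
    """
    points = np.asarray(points, dtype=float)
    centres = np.asarray(centres, dtype=float)
    radii = np.asarray(radii, dtype=float)
    sq_dist = ((points[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return (sq_dist - radii[None, :] ** 2).argmin(axis=1)

centres = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
point = [[0.6, 0.0, 0.0]]             # geometrically closer to the second centre

voronoi = laguerre_labels(point, centres, [0.0, 0.0])
laguerre = laguerre_labels(point, centres, [0.5, 0.0])
print(voronoi[0], laguerre[0])
```

With equal radii the point falls into the cell of the nearer centre. Giving the first sphere a radius of 0.5 shifts the cell boundary towards the second centre, so the same point now belongs to the first cell. This is the mechanism by which the sphere radii control the cell size distribution.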

We discuss some first steps towards experimental design for neural network regression which, at present, is too complex to treat fully in general. We encounter two difficulties: the nonlinearity of the models together with the high parameter dimension on the one hand, and the common misspecification of the models on the other hand.
Regarding the first problem, we restrict our consideration to neural networks with one or two neurons in the hidden layer and a univariate input variable. We prove some results regarding locally D-optimal designs, and present a numerical study using the concept of maximin optimal designs.
Regarding the second problem, we examine the effects of misspecification on optimal experimental designs.
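The local D-criterion for a network with a single hidden neuron can be evaluated numerically as in the following sketch. The parametrisation \(\eta(x,\theta) = v\,\sigma(wx + b)\) with logistic \(\sigma\), the parameter guess `theta` and the two candidate designs are assumptions made for illustration, and homoscedastic errors are assumed so that the information matrix is a weighted sum of outer products of gradients. The sketch only compares given designs and does not search for an optimal one.

```python
import numpy as np

def gradient(x, theta):
    """Gradient of eta(x, theta) = v * sigmoid(w * x + b) w.r.t. (v, w, b)."""
    v, w, b = theta
    s = 1.0 / (1.0 + np.exp(-(w * x + b)))
    ds = s * (1.0 - s)
    return np.array([s, v * ds * x, v * ds])

def information_matrix(points, weights, theta):
    """Information matrix of a design with given support points and weights."""
    info = np.zeros((3, 3))
    for x, wgt in zip(points, weights):
        g = gradient(x, theta)
        info += wgt * np.outer(g, g)
    return info

def log_det_criterion(points, weights, theta):
    """Local D-criterion: log-determinant at the parameter guess theta."""
    sign, logdet = np.linalg.slogdet(information_matrix(points, weights, theta))
    return logdet if sign > 0 else -np.inf

theta = (1.0, 1.0, 0.0)               # hypothetical local parameter guess

three_point = log_det_criterion([-2.0, 0.0, 2.0], [1/3, 1/3, 1/3], theta)
two_point = log_det_criterion([-2.0, 2.0], [0.5, 0.5], theta)
print(three_point, two_point)
```

A design with fewer support points than parameters yields a singular information matrix, so the two-point design cannot be D-optimal for this three-parameter model. Because the criterion depends on `theta`, a design that is good for one parameter guess may be poor for another, which is why such designs are only locally optimal.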