Questions arising from Statistical Decision Theory, Bayes Methods and other probability theoretic fields lead to concepts of orthogonality of a family of probability measures. In this paper we therefore sketch a generalized information theory that helps in formulating and answering those questions. In this adapted information theory, Shannon's classical transition channels, modelled by finite stochastic matrices, are replaced by compact families of probability measures that are uniformly integrable. These channels are characterized by concepts such as information rate and capacity, and by optimal priors and the optimal mixture distribution. For practical studies we introduce an algorithm to calculate the capacity of the whole probability family, which is applicable even for general output spaces. We then explain how the algorithm works and compare its numerical cost with that of the classical Arimoto-Blahut algorithm.
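For orientation, the classical Arimoto-Blahut iteration mentioned above can be sketched as follows for the finite case only. This is a minimal illustration of the classical baseline, not the generalized algorithm of the paper; the function name `blahut_arimoto`, its parameters and the stopping rule are assumptions made for this sketch.

```python
import math

def blahut_arimoto(channel, tol=1e-12, max_iter=10_000):
    """Capacity (in nats) of a finite channel given as a row-stochastic matrix.

    channel[x][y] = P(Y = y | X = x). Returns (capacity_estimate, prior).
    """
    n_in = len(channel)
    n_out = len(channel[0])
    prior = [1.0 / n_in] * n_in  # start from the uniform prior
    lower = 0.0
    for _ in range(max_iter):
        # Mixture (output) distribution induced by the current prior.
        mix = [sum(prior[x] * channel[x][y] for x in range(n_in))
               for y in range(n_out)]
        # Kullback-Leibler divergence of each row from the mixture.
        kl = [sum(p * math.log(p / mix[y])
                  for y, p in enumerate(channel[x]) if p > 0.0)
              for x in range(n_in)]
        lower = sum(prior[x] * kl[x] for x in range(n_in))  # mutual information
        upper = max(kl)                                     # upper bound on capacity
        if upper - lower < tol:
            break
        # Multiplicative update: reweight each input by exp(KL), renormalize.
        weights = [prior[x] * math.exp(kl[x]) for x in range(n_in)]
        total = sum(weights)
        prior = [w / total for w in weights]
    return lower, prior

# Binary symmetric channel with crossover probability 0.1.
bsc = [[0.9, 0.1], [0.1, 0.9]]
capacity, optimal_prior = blahut_arimoto(bsc)
```

For the binary symmetric channel the uniform prior is already optimal, so the loop stops at once and the returned value is ln 2 minus the binary entropy of 0.1, roughly 0.368 nats. Each step costs on the order of (number of inputs) times (number of outputs) operations, which is the kind of cost a comparison with a general-output-space method would start from.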
It is of basic interest to assess the quality of the decisions of a statistician, based on the observed data of a statistical experiment, in the context of a given model class P of probability distributions. The statistician picks a particular distribution P and suffers a loss for not picking the 'true' distribution P'. There are several relevant loss functions, one being based on the relative entropy function or Kullback Leibler information distance. In this paper we prove a general 'minimax risk equals maximin (Bayes) risk' theorem for the Kullback Leibler loss under the hypothesis of a dominated and compact family of distributions over a Polish observation space with suitably integrable densities. We also find that there is always an optimal Bayes strategy (i.e. a suitable prior) achieving the minimax value. Further, we see that every such minimax optimal strategy leads to the same distribution P in the convex closure of the model class. Finally, we give some examples to illustrate the results and to indicate how the minimax result is reflected in the structure of least favorable priors. This paper is mainly based on parts of this author's doctoral thesis.
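In symbols, a statement of this 'minimax equals maximin' type can be written as below. The notation is an assumption of this sketch and is not taken from the paper: $\mathcal{P}$ is the model class, $Q$ ranges over distributions on the observation space, $\mu$ ranges over priors on $\mathcal{P}$, $D(\cdot\,\|\,\cdot)$ is the Kullback Leibler distance, and $M_\mu$ is the mixture induced by $\mu$.

```latex
\[
  \inf_{Q}\,\sup_{P\in\mathcal{P}} D(P\,\|\,Q)
  \;=\;
  \sup_{\mu}\,\inf_{Q}\int D(P\,\|\,Q)\,\mu(dP)
  \;=\;
  \sup_{\mu}\int D(P\,\|\,M_\mu)\,\mu(dP),
  \qquad
  M_\mu=\int P\,\mu(dP).
\]
```

The second equality uses the standard fact that, under Kullback Leibler loss, the Bayes risk for a prior $\mu$ is minimized by the mixture $M_\mu$; the right-hand side is then the supremum of the mutual information over priors.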
Let (E_k) be a sequence of experiments with the same finite parameter set. Suppose only that identification of the parameter is possible asymptotically. For large classes of information functionals we show that their exponential rates of convergence towards complete information coincide. As a special case we obtain the rate of the Shannon capacity of product experiments.
In this note, answering a question of N. Maslova, we give an elementary two-dimensional example of the phenomenon indicated in the title. This simple example may serve as an object of comparison for more refined models, such as those in the theory of kinetic differential equations, where similar questions still seem to be unsettled.
The observation of an ergodic Markov chain asymptotically allows perfect identification of the transition matrix. In this paper we determine the rate of the information contained in the first n observations, provided the unknown transition matrix belongs to a known finite set. As an essential tool we prove new refinements of the large deviation theory of the empirical pair measure of finite Markov chains. Keywords: Markov Chain, Entropy, Bayes risk, Large Deviations.
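As a small illustration of the object named above, the empirical pair measure of an observed path can be computed as follows. This is a minimal sketch under one common convention (relative frequencies of consecutive pairs, without periodic wrap-around); the paper may use a different normalization, and the function name `empirical_pair_measure` is hypothetical.

```python
from collections import Counter

def empirical_pair_measure(path):
    """Relative frequencies of consecutive pairs (x_i, x_{i+1}) in a path."""
    if len(path) < 2:
        raise ValueError("need at least two observations")
    counts = Counter(zip(path, path[1:]))  # count each observed transition
    n_pairs = len(path) - 1
    return {pair: c / n_pairs for pair, c in counts.items()}

# Example: a short path of a two-state chain.
pair_measure = empirical_pair_measure([0, 1, 1, 0, 1])
```

Dividing each entry by its row sum gives the empirical transition matrix, which is the quantity compared against the finitely many candidate transition matrices.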
In 1979, J.M. Bernardo argued heuristically that in the case of regular product experiments his information theoretic reference prior is equal to Jeffreys' prior. In this context, B.S. Clarke and A.R. Barron showed in 1994 that in the same class of experiments Jeffreys' prior is asymptotically optimal in the sense of Shannon or, in Bayesian terms, asymptotically least favorable under Kullback Leibler risk. In the present paper we prove, based on Clarke and Barron's results, that every sequence of Shannon optimal priors on a sequence of regular iid product experiments converges weakly to Jeffreys' prior. This means that for increasing sample size Kullback Leibler least favorable priors tend to Jeffreys' prior.
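For reference, Jeffreys' prior and the asymptotic expansion usually associated with Clarke and Barron can be written as follows. The notation is an assumption of this sketch: $\theta$ ranges over a $d$-dimensional parameter set $\Theta$, $I(\theta)$ is the Fisher information matrix of a single observation, $w$ is a prior density, and $X^n$ denotes $n$ iid observations; the regularity conditions under which the expansion holds are not restated here.

```latex
\[
  w_J(\theta)=\frac{\sqrt{\det I(\theta)}}{\int_\Theta \sqrt{\det I(\vartheta)}\,d\vartheta},
\]
\[
  I_w(\Theta;X^n)
  =\frac{d}{2}\log\frac{n}{2\pi e}
  +\int_\Theta w(\theta)\,\log\frac{\sqrt{\det I(\theta)}}{w(\theta)}\,d\theta
  +o(1)\qquad(n\to\infty).
\]
```

The integral term equals $\log\int_\Theta\sqrt{\det I(\vartheta)}\,d\vartheta$ minus the Kullback Leibler distance from $w$ to $w_J$, so it is maximized exactly at $w=w_J$; this is the sense in which Jeffreys' prior is asymptotically Shannon optimal.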
The paper studies differential and related properties of functions of a real variable with values in the space of signed measures. In particular, the connections between different definitions of differentiability, corresponding to different topologies on the measures, are described. Some conditions are given for the equivalence of the measures in the range of such a function. These conditions are stated in terms of so-called logarithmic derivatives and yield a generalization of the Cameron-Martin-Maruyama-Girsanov formula. Questions of this kind appear both in the theory of differentiable measures on infinite-dimensional spaces and in the theory of statistical experiments.
Some formulae containing logarithmic derivatives of (smooth) measures on infinite-dimensional spaces arise in quite different situations. In particular, logarithmic derivatives of a measure are inserted in the Schrödinger equation in the space of functions that are square integrable with respect to this measure, which allows us to describe very simply a procedure of (canonical) quantization of infinite-dimensional Hamiltonian systems with linear phase space. Further, the problem of reconstructing a measure from its logarithmic derivative (that was posed in  independently of any applications) can be equivalent either to the problem of finding the "ground state" (considered as a measure) for an infinite-dimensional Schrödinger equation, or to the problem of finding an invariant measure for a stochastic differential equation (a central question of so-called stochastic quantization), or to the problem of reconstructing a "Gibbsian measure by its specification" (i.e. by a collection of finite-dimensional conditional distributions). Logarithmic derivatives of a measure appear in the Cameron-Martin-Girsanov-Maruyama formulae and in their generalizations to arbitrary smooth measures; they also make it possible to connect these formulae with the Feynman-Kac formulae. This note discusses all these topics. Of course, due to its brevity the presentation is mainly formal, and precise analytical assumptions are usually absent. In fact, only a list of formulae with brief comments is given. Let us also mention that we do not consider so-called Dirichlet forms at all, to which a great deal of literature is devoted (cf.  and references therein to the works of S. Albeverio and others).
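To fix ideas, the finite-dimensional prototype of a logarithmic derivative can be written as follows. This is only an illustrative special case under assumptions made for this sketch: $\mu$ is a measure on $\mathbb{R}^n$ with smooth positive density $p$, $h\in\mathbb{R}^n$ is a fixed direction, and $f$ is a smooth compactly supported test function; the symbol $\beta_h^\mu$ is notation chosen here.

```latex
\[
  \beta_h^{\mu}(x)=\frac{\partial_h p(x)}{p(x)}=\partial_h \log p(x),
  \qquad
  \int_{\mathbb{R}^n}\partial_h f\,d\mu=-\int_{\mathbb{R}^n} f\,\beta_h^{\mu}\,d\mu .
\]
\[
  \text{For the standard Gaussian measure, } p(x)\propto e^{-|x|^2/2}
  \ \text{ and } \ \beta_h^{\mu}(x)=-\langle h,x\rangle .
\]
```

The integration by parts identity is the form that survives in infinite dimensions, where no Lebesgue density is available.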
We compare different notions of differentiability of a measure along a vector field on a locally convex space. In the L2-space of a differentiable measure we consider the analogues of the classical concepts of gradient, divergence and Laplacian (which coincides with the Ornstein-Uhlenbeck operator in the Gaussian case). We use these operators to extend the basic results of Malliavin and Stroock on the smoothness of finite-dimensional image measures under certain nonsmooth mappings to the case of non-Gaussian measures. The proof of this extension is quite direct and does not use any chaos decomposition. Finally, the role of this Laplacian in the procedure of quantization of anharmonic oscillators is discussed.
Starting from the uniqueness question for mixtures of distributions, this review centers on the question of under which formally weaker assumptions one can prove the existence of SPLIFs, in other words perfect statistics and tests. We mention a number of positive and negative results which complement the basic contribution of David Blackwell in 1980. Typically the answers depend on the choice of the set theoretic axioms and on the particular concepts of measurability.