Questions arising in statistical decision theory, Bayesian methods, and other areas of probability theory lead to concepts of orthogonality of a family of probability measures. In this paper we therefore sketch a generalized information theory that is helpful in formulating and answering such questions. In this adapted information theory, Shannon's classical transition channels, modelled by finite stochastic matrices, are replaced by compact families of uniformly integrable probability measures. These channels are characterized by concepts such as information rate and capacity, and by optimal priors and the optimal mixture distribution. For practical studies we introduce an algorithm that calculates the capacity of the whole family of probability measures and is applicable even to general output spaces. We then explain how the algorithm works and compare its numerical cost with that of the classical Arimoto-Blahut algorithm.
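For orientation, the classical finite case that serves as the baseline here can be sketched in a few lines. The following is a minimal illustration of the standard Arimoto-Blahut iteration for a finite stochastic matrix, not the generalized algorithm of the paper; the function name `blahut_arimoto` and the parameters `W`, `tol` and `max_iter` are hypothetical names chosen for this sketch.

```python
import numpy as np

def blahut_arimoto(W, tol=1e-12, max_iter=10_000):
    """Classical Arimoto-Blahut iteration (illustrative sketch).

    W[x, y] is assumed to hold P(y | x); every row must sum to 1.
    Returns (capacity lower bound in nats, prior over the inputs).
    """
    W = np.asarray(W, dtype=float)
    n_inputs = W.shape[0]
    p = np.full(n_inputs, 1.0 / n_inputs)  # start from the uniform prior

    for _ in range(max_iter):
        q = p @ W  # mixture (output) distribution induced by the prior p

        # d[x] = D(W[x, :] || q), using the convention 0 * log 0 = 0
        with np.errstate(divide="ignore", invalid="ignore"):
            log_ratio = np.where(W > 0, np.log(W / q), 0.0)
        d = np.sum(W * log_ratio, axis=1)

        # Standard bounds: lower <= capacity <= upper
        lower = np.log(np.sum(p * np.exp(d)))
        upper = np.max(d)
        if upper - lower < tol:
            break

        # Multiplicative update of the prior, then renormalize
        p = p * np.exp(d)
        p /= p.sum()

    return lower, p
```

For a binary symmetric channel with crossover probability 0.1, `blahut_arimoto([[0.9, 0.1], [0.1, 0.9]])` should agree with the closed form log 2 − H(0.1) in nats, with the uniform prior as maximizer. Each iteration costs on the order of (number of inputs) × (number of outputs) operations, which is the cost a method for general output spaces would be compared against.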

It is of basic interest to assess the quality of a statistician's decisions, based on the observed data of a statistical experiment, in the context of a given model class P of probability distributions. The statistician picks a particular distribution P, suffering a loss by not picking the 'true' distribution P'. There are several relevant loss functions, one being based on the relative entropy function, or Kullback-Leibler information distance. In this paper we prove a general 'minimax risk equals maximin (Bayes) risk' theorem for the Kullback-Leibler loss under the hypothesis of a dominated and compact family of distributions over a Polish observation space with suitably integrable densities. We also find that there is always an optimal Bayes strategy (i.e. a suitable prior) achieving the minimax value. Further, we see that every such minimax optimal strategy leads to the same distribution P in the convex closure of the model class. Finally, we give some examples to illustrate the results and to indicate how the minimax result is reflected in the structure of least favorable priors. This paper is mainly based on parts of the author's doctoral thesis.
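In symbols, the minimax statement can be sketched as follows. The notation is an assumption made here for illustration and is not taken from the paper: the model class is written $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$, $\mu$ ranges over priors on $\Theta$, $Q$ ranges over the distributions the statistician may pick, $D(\cdot\,\|\,\cdot)$ denotes the Kullback-Leibler distance, and $P_\mu = \int_\Theta P_\theta\,\mu(d\theta)$ is the mixture distribution.

$$
\inf_{Q}\,\sup_{\theta\in\Theta} D(P_\theta \,\|\, Q)
\;=\;
\sup_{\mu}\,\inf_{Q} \int_\Theta D(P_\theta \,\|\, Q)\,\mu(d\theta)
\;=\;
\sup_{\mu} \int_\Theta D(P_\theta \,\|\, P_\mu)\,\mu(d\theta)
$$

The second equality rests on the decomposition

$$
\int_\Theta D(P_\theta \,\|\, Q)\,\mu(d\theta)
\;=\;
\int_\Theta D(P_\theta \,\|\, P_\mu)\,\mu(d\theta) \;+\; D(P_\mu \,\|\, Q),
$$

so that, for a fixed prior $\mu$, the Bayes risk is minimized by $Q = P_\mu$. In this notation, the uniqueness statement says that any two optimal priors $\mu^*$ and $\nu^*$ satisfy $P_{\mu^*} = P_{\nu^*}$.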

In 1979, J.M. Bernardo argued heuristically that, in the case of regular product experiments, his information-theoretic reference prior is equal to Jeffreys' prior. In this context, B.S. Clarke and A.R. Barron showed in 1994 that, in the same class of experiments, Jeffreys' prior is asymptotically optimal in the sense of Shannon or, in Bayesian terms, asymptotically least favorable under Kullback-Leibler risk. In the present paper we prove, based on Clarke and Barron's results, that every sequence of Shannon optimal priors on a sequence of regular iid product experiments converges weakly to Jeffreys' prior. This means that, for increasing sample size, Kullback-Leibler least favorable priors tend to Jeffreys' prior.
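As a reminder of the object involved, Jeffreys' prior and the convergence statement can be written as follows. The notation is assumed here for illustration and is not taken from the paper: $\Theta \subset \mathbb{R}^d$ is the parameter set, $I(\theta)$ is the Fisher information matrix of a single observation, $w_J$ is the Jeffreys density (assuming the normalizing integral is finite), and $w_n$ is any Shannon optimal prior for the experiment with $n$ iid observations.

$$
w_J(\theta) \;=\; \frac{\sqrt{\det I(\theta)}}{\int_\Theta \sqrt{\det I(\vartheta)}\,d\vartheta},
\qquad
w_n \;\longrightarrow\; w_J \quad \text{weakly as } n \to \infty .
$$

For example, in the Bernoulli model with success probability $\theta \in (0,1)$ the Fisher information is $I(\theta) = 1/(\theta(1-\theta))$, and since $\int_0^1 \theta^{-1/2}(1-\theta)^{-1/2}\,d\theta = \pi$, Jeffreys' prior is the $\mathrm{Beta}(1/2, 1/2)$ distribution:

$$
w_J(\theta) \;=\; \frac{1}{\pi\sqrt{\theta(1-\theta)}} .
$$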