Many discrepancy principles are known for choosing the parameter \(\alpha\) in the regularized operator equation \((T^*T+ \alpha I)x_\alpha^\delta = T^*y^\delta\), \(\|y-y^\delta\|\leq \delta\), in order to approximate the minimal-norm least-squares solution of the operator equation \(Tx=y\). In this paper we consider a class of discrepancy principles for choosing the regularization parameter when \(T^*T\) and \(T^*y^\delta\) are approximated by \(A_n\) and \(z_n^\delta\), respectively, with \(A_n\) not necessarily self-adjoint. This procedure generalizes the work of Engl and Neubauer (1985), and particular cases of the results apply to the regularized projection method as well as to a degenerate kernel method considered by Groetsch (1990).
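As a point of reference, here is a minimal sketch of the classical Morozov discrepancy principle for Tikhonov regularization in the unperturbed setting \(A_n = T^*T\), \(z_n^\delta = T^*y^\delta\); it is not the generalized class of principles studied in the paper. It assumes a real matrix discretization \(T\) (so \(T^* = T^\top\)), chooses \(\alpha\) so that \(\|Tx_\alpha^\delta - y^\delta\| = \tau\delta\) with a hypothetical \(\tau = 1.1\), and locates \(\alpha\) by bisection in \(\log\alpha\), using that the Tikhonov residual is nondecreasing in \(\alpha\). The test problem (a discretized integration operator) and all function names are illustrative choices.

```python
import numpy as np

def tikhonov_solution(T, y_delta, alpha):
    """Solve the regularized normal equation (T^T T + alpha I) x = T^T y_delta."""
    n = T.shape[1]
    return np.linalg.solve(T.T @ T + alpha * np.eye(n), T.T @ y_delta)

def morozov_alpha(T, y_delta, delta, tau=1.1, lo=1e-12, hi=1e2, iters=100):
    """Pick alpha with ||T x_alpha - y_delta|| = tau * delta by bisection in log(alpha).

    Assumes the residual at `lo` lies below tau * delta and the residual at `hi`
    lies above it; the Tikhonov residual is nondecreasing in alpha.
    """
    def residual(alpha):
        return np.linalg.norm(T @ tikhonov_solution(T, y_delta, alpha) - y_delta)

    for _ in range(iters):
        mid = np.sqrt(lo * hi)  # geometric midpoint of the bracket
        if residual(mid) > tau * delta:
            hi = mid
        else:
            lo = mid
    return np.sqrt(lo * hi)

# Hypothetical test problem: discretized integration operator on [0, 1].
n = 100
h = 1.0 / n
T = h * np.tril(np.ones((n, n)))
t = (np.arange(n) + 0.5) * h
x_true = np.sin(np.pi * t)
y = T @ x_true

rng = np.random.default_rng(0)
noise = rng.standard_normal(n)
delta = 0.01 * np.linalg.norm(y)
y_delta = y + delta * noise / np.linalg.norm(noise)  # ||y - y_delta|| = delta exactly

alpha = morozov_alpha(T, y_delta, delta)
x_alpha = tikhonov_solution(T, y_delta, alpha)
x_naive = np.linalg.solve(T, y_delta)  # unregularized inverse amplifies the noise
print(alpha, np.linalg.norm(x_alpha - x_true), np.linalg.norm(x_naive - x_true))
```

The bisection is on \(\log\alpha\) because suitable regularization parameters typically span many orders of magnitude.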
On a family F of probability measures on a measure space we consider the Hellinger and Kullback-Leibler distances. We show that, under suitable regularity conditions, Jeffreys' prior is proportional to the \(k\)-dimensional Hausdorff measure with respect to the Hellinger distance, and to the \(k/2\)-dimensional Hausdorff measure with respect to the Kullback-Leibler distance. The proof is based on an area formula for the Hausdorff measure with respect to generalized distances.
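For \(k = 1\) the Hellinger statement can be checked numerically. A minimal sketch, assuming the Bernoulli model and the convention \(H(P,Q)^2 = \tfrac12\sum_x(\sqrt{P(x)}-\sqrt{Q(x)})^2\) (other conventions only change the constant): the one-dimensional Hausdorff measure of a parameter arc is, up to normalization, its Hellinger length, and the code compares that length with the unnormalized Jeffreys mass \(\int_a^b \sqrt{I(\theta)}\,d\theta\), where \(I(\theta) = 1/(\theta(1-\theta))\). The function names are illustrative.

```python
import numpy as np

def hellinger_length(a, b, steps=10_000):
    """Hellinger length of the curve theta -> Ber(theta) for theta in [a, b].

    Sums the Hellinger distances between consecutive points of a fine
    parameter grid, with H(P, Q)^2 = 1/2 * sum_x (sqrt P(x) - sqrt Q(x))^2.
    """
    grid = np.linspace(a, b, steps + 1)
    root_one = np.sqrt(grid)         # sqrt P(X = 1)
    root_zero = np.sqrt(1.0 - grid)  # sqrt P(X = 0)
    step_dist = np.sqrt(0.5 * (np.diff(root_one) ** 2 + np.diff(root_zero) ** 2))
    return float(np.sum(step_dist))

def jeffreys_mass(a, b):
    """Unnormalized Jeffreys mass: integral of 1/sqrt(theta(1-theta)) over [a, b]."""
    return float(2.0 * (np.arcsin(np.sqrt(b)) - np.arcsin(np.sqrt(a))))

for a, b in [(0.1, 0.3), (0.4, 0.9), (0.01, 0.99)]:
    print(a, b, hellinger_length(a, b) / jeffreys_mass(a, b))
```

Every printed ratio should be close to \(1/\sqrt{8} \approx 0.3536\), independent of the interval, which is the proportionality claimed for \(k = 1\): locally \(H \approx \sqrt{I(\theta)/8}\,|d\theta|\) under this convention.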
A compact subset E of the complex plane is called removable if all bounded analytic functions on its complement are constant or, equivalently, if its analytic capacity vanishes. The problem of finding a geometric characterization of the removable sets is more than a hundred years old and still not completely solved.
Questions arising from statistical decision theory, Bayes methods and other areas of probability theory lead to concepts of orthogonality of a family of probability measures. In this paper we therefore sketch a generalized information theory which is very helpful in formulating and answering those questions. In this adapted information theory, Shannon's classical transition channels, modelled by finite stochastic matrices, are replaced by compact families of probability measures that are uniformly integrable. These channels are characterized by concepts such as information rate and capacity, by optimal priors, and by the optimal mixture distribution. For practical studies we introduce an algorithm that calculates the capacity of the whole probability family and is applicable even to general output spaces. We then explain how the algorithm works and compare its numerical costs with those of the classical Arimoto-Blahut algorithm.
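For comparison, the classical Arimoto-Blahut iteration for a finite channel (a finite stochastic matrix) can be sketched as follows; this is the textbook baseline, not the paper's algorithm for general output spaces. The function name and the stopping rule, based on the standard lower and upper capacity bounds \(\log\sum_x p_x e^{D_x} \le C \le \max_x D_x\), are our choices.

```python
import numpy as np

def blahut_arimoto(W, tol=1e-12, max_iter=100_000):
    """Capacity (in bits) and an optimal input distribution of a finite channel.

    W[x, y] = P(Y = y | X = x); every row must sum to 1.
    """
    W = np.asarray(W, dtype=float)
    p = np.full(W.shape[0], 1.0 / W.shape[0])  # start from the uniform prior
    lower = 0.0
    for _ in range(max_iter):
        q = p @ W  # output (mixture) distribution
        with np.errstate(divide="ignore", invalid="ignore"):
            log_ratio = np.where(W > 0, np.log(W / q), 0.0)  # convention 0 log 0 = 0
        d = np.sum(W * log_ratio, axis=1)      # D(W(.|x) || q) in nats, per input x
        lower = np.log(np.sum(p * np.exp(d)))  # lower bound on the capacity
        upper = np.max(d)                      # upper bound on the capacity
        p = p * np.exp(d)                      # multiplicative update of the prior
        p /= p.sum()
        if upper - lower < tol:
            break
    return lower / np.log(2), p

capacity, prior = blahut_arimoto([[0.9, 0.1], [0.1, 0.9]])
print(capacity)  # binary symmetric channel: 1 - h(0.1), about 0.531 bits
```

Each iteration costs one matrix-vector product plus elementwise logarithms over the channel matrix, which is the kind of cost a comparison of numerical effort would count.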
It is of basic interest to assess the quality of a statistician's decisions, based on the data observed in a statistical experiment, in the context of a given model class P of probability distributions. The statistician picks a particular distribution P, suffering a loss by not picking the 'true' distribution P'. There are several relevant loss functions, one being based on the relative entropy function, or Kullback-Leibler information distance. In this paper we prove a general 'minimax risk equals maximin (Bayes) risk' theorem for the Kullback-Leibler loss under the hypothesis of a dominated and compact family of distributions over a Polish observation space with suitably integrable densities. We also find that there is always an optimal Bayes strategy (i.e. a suitable prior) achieving the minimax value. Further, we see that every such minimax optimal strategy leads to the same distribution P in the convex closure of the model class. Finally, we give some examples to illustrate the results and to indicate how the minimax result is reflected in the structure of least favorable priors. This paper is mainly based on parts of the author's doctoral thesis.
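A basic fact behind the Bayes side of this theorem is that, under Kullback-Leibler loss, the Bayes act for a prior \(\pi\) is the mixture \(M = \sum_i \pi_i P_i\), a point of the convex hull of the model class, and its Bayes risk equals the mutual information between parameter and observation. The sketch below checks the exact identity \(\sum_i \pi_i D(P_i\|Q) = \sum_i \pi_i D(P_i\|M) + D(M\|Q)\) on a hypothetical three-member family over a three-point sample space; the family, the prior and the function names are illustrative.

```python
import numpy as np

def kl(p, q):
    """D(p || q) in nats; assumes q > 0 wherever p > 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def bayes_risk(prior, family, q):
    """Expected Kullback-Leibler loss sum_i prior[i] * D(P_i || q) of announcing q."""
    return sum(w * kl(p, q) for w, p in zip(prior, family))

# Hypothetical model class: three distributions on {0, 1, 2}.
family = np.array([[0.7, 0.2, 0.1],
                   [0.2, 0.5, 0.3],
                   [0.1, 0.3, 0.6]])
prior = np.array([0.5, 0.3, 0.2])
mixture = prior @ family  # Bayes act for this prior

rng = np.random.default_rng(0)
for _ in range(3):
    q = rng.dirichlet(np.ones(3))  # an arbitrary competing announcement
    excess = bayes_risk(prior, family, q) - bayes_risk(prior, family, mixture)
    # The excess risk of q over the mixture equals D(mixture || q) >= 0.
    print(excess, kl(mixture, q))
```

Because the excess risk is itself a Kullback-Leibler divergence, the Bayes act for a given prior is unique; the paper's statement that all minimax optimal strategies yield the same distribution is a stronger, global result that this sketch does not prove.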
Let \((\mathcal{E}_k)\) be a sequence of experiments with the same finite parameter set. Suppose only that identification of the parameter is possible asymptotically. For large classes of information functionals we show that their exponential rates of convergence towards complete information coincide. As a special case we obtain the rate of the Shannon capacity of product experiments.
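To make "convergence towards complete information" concrete, here is a small sketch under assumptions of our own: the experiments are \(n\)-fold Bernoulli products with a hypothetical three-point parameter set, the functional is Shannon's mutual information under the uniform prior (not the capacity-achieving prior), and we compute the gap \(\log K - I(\Theta; X_1,\dots,X_n)\), which tends to 0 as the parameter becomes identifiable. The printed quantity \(-\log(\text{gap})/n\) is only a crude finite-\(n\) proxy for an exponential rate; the code makes no claim about its limit.

```python
import math
import numpy as np

def log_binom_pmf(k, n, p):
    """log P(S_n = k) for S_n ~ Bin(n, p), 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def information_gap(thetas, n):
    """log K - I(Theta; X_1..X_n) in nats for the uniform prior on K Bernoulli parameters.

    The success count S_n is sufficient, so I(Theta; X_1..X_n) = I(Theta; S_n).
    """
    K = len(thetas)
    log_p = np.array([[log_binom_pmf(k, n, t) for k in range(n + 1)] for t in thetas])
    p = np.exp(log_p)
    log_mix = np.log(p.mean(axis=0))  # log of the uniform mixture over the parameters
    mutual_info = float(np.sum(p * (log_p - log_mix)) / K)
    return math.log(K) - mutual_info

thetas = [0.3, 0.5, 0.7]  # hypothetical finite parameter set
for n in [0, 10, 50, 100, 200]:
    gap = information_gap(thetas, n)
    print(n, gap, -math.log(gap) / n if n else None)
```

Since \(I(\Theta; X_1,\dots,X_n)\) is nondecreasing in \(n\), the gap shrinks monotonically from \(\log 3\) at \(n = 0\).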
In this note, answering a question of N. Maslova, we give an elementary two-dimensional example of the phenomenon indicated in the title. Perhaps this simple example may serve as an object of comparison for more refined models, such as those in the theory of kinetic differential equations, where similar questions still seem to be unsettled.
The observation of an ergodic Markov chain asymptotically allows perfect identification of the transition matrix. In this paper we determine the rate of the information contained in the first n observations, provided the unknown transition matrix belongs to a known finite set. As an essential tool we prove new refinements of the large deviation theory of the empirical pair measure of finite Markov chains. Keywords: Markov Chain, Entropy, Bayes risk, Large Deviations.
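A minimal sketch of why the empirical pair measure is the central object, assuming a two-state chain and three hypothetical candidate transition matrices with positive entries: apart from the initial state, the log-likelihood of a candidate \(P\) after \(n\) steps is \(n\sum_{i,j} L_n(i,j)\log P_{ij}\), where \(L_n\) is the empirical pair measure, so the data enter only through \(L_n\). The code shows the maximum-likelihood identification that this permits; it does not reproduce the paper's large-deviation refinements. All names are illustrative.

```python
import numpy as np

def simulate_chain(P, n, rng, x0=0):
    """Sample x_0, ..., x_n from a finite Markov chain with transition matrix P."""
    states = np.empty(n + 1, dtype=int)
    states[0] = x0
    for t in range(n):
        states[t + 1] = rng.choice(len(P), p=P[states[t]])
    return states

def empirical_pair_measure(states, m):
    """L_n(i, j) = (1/n) * #{t < n : x_t = i, x_{t+1} = j}."""
    counts = np.zeros((m, m))
    np.add.at(counts, (states[:-1], states[1:]), 1.0)
    return counts / (len(states) - 1)

def identify(candidates, pair_measure):
    """Index of the candidate maximizing sum_ij L_n(i, j) log P_ij."""
    scores = [np.sum(pair_measure * np.log(P)) for P in candidates]
    return int(np.argmax(scores))

# Hypothetical known finite set of transition matrices; the truth is index 1.
candidates = [np.array([[0.9, 0.1], [0.2, 0.8]]),
              np.array([[0.7, 0.3], [0.4, 0.6]]),
              np.array([[0.5, 0.5], [0.5, 0.5]])]
rng = np.random.default_rng(1)
states = simulate_chain(candidates[1], 5000, rng)
pair_measure = empirical_pair_measure(states, 2)
print(pair_measure, identify(candidates, pair_measure))
```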
In 1979, J.M. Bernardo argued heuristically that in the case of regular product experiments his information-theoretic reference prior is equal to Jeffreys' prior. In this context, B.S. Clarke and A.R. Barron showed in 1994 that, in the same class of experiments, Jeffreys' prior is asymptotically optimal in the sense of Shannon or, in Bayesian terms, asymptotically least favorable under Kullback-Leibler risk. In the present paper we prove, based on Clarke and Barron's results, that every sequence of Shannon optimal priors on a sequence of regular i.i.d. product experiments converges weakly to Jeffreys' prior. This means that, as the sample size increases, Kullback-Leibler least favorable priors tend to Jeffreys' prior.
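For the Bernoulli model, Jeffreys' prior is Beta(1/2, 1/2). A rough numerical sketch, under assumptions of our own (a 200-point parameter grid, sample size \(n = 50\), and 5000 iterations of the classical Blahut-Arimoto algorithm as a stand-in for an exact Shannon optimal prior), computes an approximately capacity-achieving prior for the \(n\)-fold product experiment and compares one simple summary, \(E|\theta - 1/2|\), with its Jeffreys value \(1/\pi \approx 0.318\); the uniform prior gives 0.25. This is an illustration of the trend, not a check of weak convergence.

```python
import math
import numpy as np

def binomial_channel(thetas, n):
    """Row i is the Bin(n, thetas[i]) pmf on {0, ..., n}.

    The success count is sufficient for n i.i.d. Bernoulli trials, so this channel
    carries the same information as the n-fold product experiment.
    """
    k = np.arange(n + 1)
    log_choose = np.array([math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                           for j in range(n + 1)])
    log_w = log_choose + np.outer(np.log(thetas), k) + np.outer(np.log1p(-thetas), n - k)
    return np.exp(log_w)

def blahut_arimoto_prior(W, iters=5000):
    """Classical Blahut-Arimoto iteration; returns an approximate capacity-achieving prior."""
    prior = np.full(W.shape[0], 1.0 / W.shape[0])
    for _ in range(iters):
        q = prior @ W                          # mixture distribution of the count
        d = np.sum(W * np.log(W / q), axis=1)  # D(Bin(n, theta_i) || q) in nats
        prior = prior * np.exp(d)
        prior /= prior.sum()
    return prior

m, n = 200, 50
thetas = (np.arange(m) + 0.5) / m  # grid of midpoints, strictly inside (0, 1)
prior = blahut_arimoto_prior(binomial_channel(thetas, n))

# E|theta - 1/2| under the computed prior versus 1/pi under Beta(1/2, 1/2).
spread = float(np.sum(prior * np.abs(thetas - 0.5)))
print(spread, 1 / math.pi)
```

The grid excludes the endpoints 0 and 1 only to keep every logarithm finite; finite-sample Shannon optimal priors are typically discrete, so finer summaries than this one converge more visibly only as \(n\) grows.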