
Month: September 2021

# A nice Limit

# Estimating the Bernoulli Parameter, a non-Bayesian Max Entropy Method

### Introduction and Result

### Comparison to Alternative Methods

### Derivations

### Application: What can we say about a specific doctor or center’s error rate based on *n* observations?

### Acknowledgments

From @infseriesbot, prove the identity: .

We have ,

so

and since

,

then:

,

and since ,

From the series representation of the Stieltjes Gamma function, :

A maximum entropy alternative to Bayesian methods for the estimation of independent Bernoulli sums.

Let $X = (x_1, \ldots, x_n)$, where $x_i \in \{0, 1\}$, be a vector representing an *n*-sample of independent Bernoulli distributed random variables with $\mathbb{P}(x_i = 1) = p$. We are interested in the estimation of the probability *p*.

We propose that the probability that provides the best statistical overview (by reflecting the *maximum ignorance* point) is

$$p^* = 1 - I^{-1}_{1/2}(n - m,\ m + 1), \qquad (1)$$

where $m = \sum_{i=1}^n x_i$ and $I_z(a, b)$ is the beta regularized function, with $I^{-1}$ its inverse in $z$.
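Equation (1) can be checked without special-function libraries: since $F(m; n, p)$ is monotone decreasing in $p$, solving $F(m; n, p) = \frac{1}{2}$ by bisection gives the same answer as inverting the beta regularized function. A minimal sketch (function names are mine, not from the post):

```python
from math import comb

def binom_cdf(m, n, p):
    """P(X <= m) for X ~ Binomial(n, p), computed from the exact pmf."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(m + 1))

def max_entropy_p(m, n, q=0.5, tol=1e-12):
    """Solve binom_cdf(m, n, p) = q for p by bisection.

    The CDF is decreasing in p, so if the CDF at the midpoint is still
    above q the root lies to the right."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_cdf(m, n, mid) > q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For $m = 0$, $n = 60$ this returns $1 - 2^{-1/60} \approx 0.0115$, matching the closed form obtained by solving $(1 - p)^{60} = \frac{1}{2}$ directly.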

**EMPIRICAL**: The sample frequency $\hat{p} = \frac{m}{n}$ corresponding to the “empirical” distribution, which clearly does not provide information for small samples (with $m = 0$ it returns $\hat{p} = 0$ regardless of $n$).

**BAYESIAN**: The standard Bayesian approach is to start with, for prior, the parametrized Beta distribution $\mathrm{Beta}(\alpha, \beta)$, which is not trivial: one is constrained by the fact that matching the mean and variance of the Beta distribution constrains the shape of the prior. Then it becomes convenient that the Beta, being a conjugate prior, updates into the same distribution with new parameters. Then, with *n* samples and *m* realizations:

$$p \sim \mathrm{Beta}(\alpha + m,\ \beta + n - m) \qquad (2)$$

with mean $\frac{\alpha + m}{\alpha + \beta + n}$. We will see below how a low-variance Beta prior has too much impact on the result.
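The conjugate update in (2) is one line of code. A minimal sketch (the function name is mine, and the prior parameters $(\alpha, \beta) = (1, 19)$ are chosen only as an illustration of a prior with mean $.05$):

```python
def beta_posterior(alpha, beta, m, n):
    """Posterior parameters and mean after observing m successes in n
    Bernoulli trials, starting from a Beta(alpha, beta) prior."""
    a, b = alpha + m, beta + n - m
    return a, b, a / (a + b)

# Illustrative prior with mean 0.05 (alpha = 1, beta = 19), then 0 successes
# in 60 trials: the posterior is Beta(1, 79) with mean 1/80 = 0.0125.
a, b, mean = beta_posterior(1, 19, 0, 60)
```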

Let $F(m; n, p) = \mathbb{P}(X \le m)$ be the CDF of the binomial $B(n, p)$. We are interested in the maximum entropy probability. First let us figure out the target value *q* for that CDF.

To get the maximum entropy probability, we need to maximize the entropy $H(q) = -q \log q - (1 - q) \log(1 - q)$. This is a very standard result: taking the first derivative with respect to *q*, $H'(q) = \log \frac{1 - q}{q}$, and since $H$ is concave in *q*, we get $q = \frac{1}{2}$.
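As a sanity check on this step, a tiny grid search (my own, not from the post) confirms that the Bernoulli entropy peaks at $q = 1/2$:

```python
from math import log

def entropy(q):
    """Shannon entropy (in nats) of a Bernoulli(q) outcome."""
    return -q * log(q) - (1 - q) * log(1 - q)

# The maximizer on a fine grid over (0, 1) is q = 1/2, as the
# first-order condition H'(q) = log((1 - q)/q) = 0 predicts.
q_star = max((i / 1000 for i in range(1, 1000)), key=entropy)
```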

Now we must find *p* by inverting the CDF. Then, for the general case,

$$p^* = 1 - I^{-1}_{1/2}(n - m,\ m + 1).$$

And note that, as in the graph below (thanks to comments below by überstatistician Andrew Gelman), we can have a “confidence band” (sort of) with

$$p \in \left[\, 1 - I^{-1}_{q_{hi}}(n - m,\ m + 1),\ \ 1 - I^{-1}_{q_{lo}}(n - m,\ m + 1) \,\right];$$

in the graph below the band is drawn for two such values of the CDF level *q*.
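For the zero-failure case the band is available in closed form: $F(0; n, p) = (1 - p)^n$, so the *p* matching CDF level *q* is $1 - q^{1/n}$. A sketch with the levels $q \in \{0.25, 0.75\}$, which are my illustrative choice, not the post's actual band levels:

```python
def band_zero_failures(n, q_lo=0.25, q_hi=0.75):
    """'Confidence band' for p when m = 0: solve (1 - p)**n = q at two
    CDF levels. Higher q means smaller p, so q_hi gives the lower edge.
    The default levels 0.25/0.75 are illustrative, not the post's choice."""
    return 1 - q_hi ** (1 / n), 1 - q_lo ** (1 / n)

lo, hi = band_zero_failures(60)   # brackets the point estimate 1 - 2**(-1/60)
```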

**Case** (Real World): A thoracic surgeon who does mostly cardiac and lung transplants (in addition to emergency bypass and aortic ruptures) operates in a business with around 5% perioperative mortality. So far in his new position in the U.S. he has done 60 surgeries with 0 mortality.

What can we reasonably say, statistically, about his error probability?

Note that there may be selection bias in his unit, which is no problem for our analysis: the probability we get is conditional on being selected to be operated on by that specific doctor in that specific unit.

Assuming independence, we are concerned with a binomially distributed r.v. $X \sim B(n, p)$, where *n* is the number of trials and *p* is the probability of failure per trial. Clearly, we have no idea what *p* is and need to produce our best estimate conditional on, here, $m = 0$ failures in $n = 60$ trials.

Here, applying (1) with $n = 60$ and $m = 0$, we have $p^* = 1 - I^{-1}_{1/2}(60, 1) = 1 - 2^{-1/60} \approx 0.0115$.
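With $m = 0$ the inversion collapses to elementary algebra, so the case can be checked by hand: the median condition $(1 - p)^{60} = \frac{1}{2}$ inverts directly.

```python
# Zero failures in 60 trials: solve (1 - p)**60 = 1/2 for p.
p_star = 1 - 2 ** (-1 / 60)   # about 0.0115
```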

**Why is this preferable to a Bayesian approach when, say, n is moderately large?**

A Bayesian would start with a prior expectation of, say, .05, and update based on information. But it is highly arbitrary. Since the mean of $\mathrm{Beta}(\alpha, \beta)$ is $\frac{\alpha}{\alpha + \beta}$, we can eliminate one parameter by setting $\beta = 19\alpha$. Let us say we start with that mean and have no idea of the variance. As we can see in the graph below, there are a lot of shapes to the possible distribution: it becomes all in the parametrization.
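The arbitrariness is easy to exhibit numerically: fix the prior mean at $.05$ (so $\beta = 19\alpha$) and vary only the concentration $\alpha$. After the same 60 clean surgeries, the posterior mean still moves by roughly a factor of six. (A sketch; the $\alpha$ values are mine, chosen for illustration.)

```python
def posterior_mean(alpha, beta, m, n):
    """Mean of the Beta(alpha + m, beta + n - m) posterior."""
    return (alpha + m) / (alpha + beta + n)

# Same prior mean 0.05, different concentrations: the data (m = 0, n = 60)
# are identical, yet the estimate ranges from ~0.007 to ~0.043.
for alpha in (0.5, 1, 5, 20):
    print(alpha, posterior_mean(alpha, 19 * alpha, m=0, n=60))
```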

Thanks to Saar Wilf for useful discussions.