If we toss a ‘fair’ coin, one for which heads and tails are equally likely, a large number of times, we expect approximately equal numbers of heads and tails. But what is ‘approximate’ here? How large a deviation from equal values might raise suspicion that the coin is biased? Surely, 12 heads and 8 tails in 20 tosses would not raise any eyebrows; but 18 heads and 2 tails might.
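To put rough numbers on these two cases, the sketch below computes how likely a fair coin is to give at least 12, or at least 18, heads in 20 tosses. The tail probability is only one possible way to quantify "suspicion", and the helper name `tail_probability` is made up for this illustration.

```python
from math import comb

def tail_probability(k_min: int, n: int) -> float:
    """Probability of at least k_min heads in n tosses of a fair coin."""
    # Each of the 2**n sequences is equally likely; count those with >= k_min heads.
    return sum(comb(n, k) for k in range(k_min, n + 1)) / 2**n

p12 = tail_probability(12, 20)  # 12 or more heads in 20 tosses
p18 = tail_probability(18, 20)  # 18 or more heads in 20 tosses
print(f"P(at least 12 heads) = {p12:.4f}")
print(f"P(at least 18 heads) = {p18:.6f}")
```

The first probability is about one in four, while the second is about two in ten thousand, which matches the intuition in the paragraph above.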

We will consider the more general case where we do not know the odds for heads and tails. After all, no coin is perfect, so we cannot be sure that it is fair. Suppose we toss the coin $n$ times and get $h$ heads. We denote the unknown probability of heads by $p$. We pose the following question:

- How many times do we need to toss a coin to get an accurate estimate of the odds of getting heads? How big does $n$ have to be?

**Conditional Probabilities**

The probability of two events $A$ and $B$ both occurring (denoted $P(A \cap B)$) can be expressed in terms of conditional probabilities in two different ways. The events need not be independent:

- The probability of $A$ multiplied by the probability of $B$ given that $A$ holds.
- The probability of $B$ multiplied by the probability of $A$ given that $B$ holds.

In symbolic form, this is

$$P(A \cap B) = P(B \mid A)\,P(A) = P(A \mid B)\,P(B).$$

Now let us be specific and consider the event $A$ to be “the occurrence of $h$ heads in $n$ tosses” and $B$ to be “the probability of heads is $p$”. Then

$$P(p \mid h)\,P(h) = P(h \mid p)\,P(p). \tag{1}$$

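Note that $h$ is discrete ($h \in \{0, 1, \dots, n\}$) whereas $p$ is continuous ($p \in [0, 1]$), so $P(p)$ is a probability density: the probability that $p$ lies in an interval $[p, p + dp]$ is $P(p)\,dp$.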

To answer the question posed above, we wish to estimate the first term in (1), that is, $P(p \mid h)$, the probability of a coin with odds $p$ of heads, given the experimental result of $h$ heads in $n$ tosses. The equation has four factors, so we need to know or to estimate the remaining three. Let us consider these terms in turn, from right to left:

- $P(p)$: The probability of odds $p$ in the absence of any further information or data. The probability $P(p)$ is called the *prior* estimate.
- $P(h \mid p)$: The conditional probability of $h$ heads for a coin with given odds $p$. This conditional probability is called the *likelihood*.
- $P(h)$: The probability of $h$ heads in $n$ tosses (without further qualification). $P(h)$ can be partitioned into a sum or integral of mutually exclusive and exhaustive cases:

  $$P(h) = \int_0^1 P(h \mid p)\,P(p)\,dp.$$

  If we know the two factors on the right-hand side of (1) we can evaluate this integral.

- $P(p \mid h)$: The conditional probability of the odds being $p$ given that $h$ heads have shown up in an experiment. This is called the *posterior* probability, and is what we seek.

We can now write an expression for the posterior probability:

$$P(p \mid h) = \frac{P(h \mid p)\,P(p)}{\int_0^1 P(h \mid p')\,P(p')\,dp'}. \tag{2}$$

This result is known as *Bayes’ Theorem*. It is often expressed in the form

$$\text{Posterior} = \frac{\text{Likelihood} \times \text{Prior}}{\text{Evidence}}.$$

There is a vast literature on Bayes’ Theorem, the many controversies that have surrounded it and its numerous applications. For an elementary account of this history, see McGrayne (2011).

**Estimating the Terms**

The prior probability $P(p)$ depends on the information available before tossing the coin. In the absence of any *a priori* data, we may assume a uniform distribution, $P(p) = 1$ for $0 \le p \le 1$. The conditional probability $P(h \mid p)$ is given by the familiar binomial distribution

$$P(h \mid p) = \binom{n}{h}\,p^h (1-p)^{n-h}.$$

This comes from the chance $p$ of heads [factor $p^h$] and the chance $1-p$ of tails [factor $(1-p)^{n-h}$], in any order [factor $\binom{n}{h}$ or *n*-choose-*h*].
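The binomial likelihood can be sketched in a few lines of Python. The function name `binomial_likelihood` and the sample values $n = 50$, $p = 0.4$ are chosen purely for illustration.

```python
from math import comb

def binomial_likelihood(h: int, n: int, p: float) -> float:
    """P(h | p): probability of exactly h heads in n tosses when P(heads) = p."""
    # comb(n, h) counts the orderings; p**h and (1-p)**(n-h) weight heads and tails.
    return comb(n, h) * p**h * (1 - p) ** (n - h)

n, p = 50, 0.4  # illustrative values
total = sum(binomial_likelihood(h, n, p) for h in range(n + 1))
print(binomial_likelihood(20, n, p))
print(total)  # the likelihoods over all possible h should sum to 1, up to rounding
```

Summing over every possible $h$ is a quick sanity check that the three factors have been combined correctly.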

With the uniform prior, the integral in $P(h)$ is a standard beta function which can be expressed as a ratio of factorials:

$$\int_0^1 p^h (1-p)^{n-h}\,dp = \frac{h!\,(n-h)!}{(n+1)!}.$$

We can now write the desired probability density function (2) in final form:

$$P(p \mid h) = \frac{(n+1)!}{h!\,(n-h)!}\,p^h (1-p)^{n-h}.$$

This looks like a binomial distribution but $h$ is fixed here and $p$ is the random variable. It is a beta distribution, conjugate to the binomial distribution.
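As a check on this result, the sketch below evaluates Bayes' Theorem numerically for a uniform prior and compares it with the closed-form beta density. The midpoint-rule integration, the function names and the values $h = 20$, $n = 50$ are assumptions made for the illustration, not part of the derivation.

```python
from math import comb, factorial

def likelihood(h: int, n: int, p: float) -> float:
    """Binomial likelihood P(h | p)."""
    return comb(n, h) * p**h * (1 - p) ** (n - h)

def posterior_closed_form(p: float, h: int, n: int) -> float:
    """Posterior density for a uniform prior: (n+1)!/(h!(n-h)!) p^h (1-p)^(n-h)."""
    coeff = factorial(n + 1) // (factorial(h) * factorial(n - h))  # always an integer
    return coeff * p**h * (1 - p) ** (n - h)

def evidence(h: int, n: int, steps: int = 10_000) -> float:
    """Midpoint-rule estimate of P(h) = integral of P(h|p) P(p) dp with P(p) = 1."""
    dp = 1.0 / steps
    return sum(likelihood(h, n, (i + 0.5) * dp) for i in range(steps)) * dp

h, n = 20, 50  # illustrative values
p = 0.4
ev = evidence(h, n)
numeric = likelihood(h, n, p) * 1.0 / ev  # likelihood x prior / evidence
exact = posterior_closed_form(p, h, n)
print(ev, 1 / (n + 1))
print(numeric, exact)
```

For the uniform prior the evidence works out to $1/(n+1)$ whatever the value of $h$, which is why the binomial coefficient in the likelihood turns into the factor $(n+1)!/(h!\,(n-h)!)$.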

The figure below shows the posterior probability for $h = 20$ and $n = 50$. It peaks at $p = h/n = 0.4$, the mode; the mean is $(h+1)/(n+2) = 21/52 \approx 0.404$.
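The mode and mean for this example can be checked on a grid of $p$ values. This is a rough numerical sketch: the grid resolution and the helper name `unnormalised_posterior` are arbitrary choices.

```python
def unnormalised_posterior(p: float, h: int, n: int) -> float:
    """Posterior density up to a constant factor, for a uniform prior."""
    return p**h * (1 - p) ** (n - h)

h, n = 20, 50
steps = 10_000
grid = [i / steps for i in range(steps + 1)]
weights = [unnormalised_posterior(p, h, n) for p in grid]

# Mode: grid point with the largest density.
grid_mode = grid[weights.index(max(weights))]
# Mean: density-weighted average of p (the normalising constant cancels).
grid_mean = sum(p * w for p, w in zip(grid, weights)) / sum(weights)

print(grid_mode, h / n)
print(grid_mean, (h + 1) / (n + 2))
```

The mean lies slightly closer to $1/2$ than the mode because the uniform prior acts like one extra head and one extra tail.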

**A Limiting Case**

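Before getting to the odds, we look briefly at a limiting case. Suppose we “know” *a priori* that the coin is fair (this is unrealistic but instructive). Then we must choose $P(p) = \delta(p - \tfrac{1}{2})$. The integral in the denominator of (2) is $P(h \mid \tfrac{1}{2})$, so

$$P(p \mid h) = \frac{P(h \mid p)\,\delta(p - \tfrac{1}{2})}{P(h \mid \tfrac{1}{2})} = \delta(p - \tfrac{1}{2}) = P(p),$$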

that is, the *posterior* probability is identical to the *prior*. Since we are certain from the outset, no amount of additional data can sway our conviction. But this never happens: no coin, however carefully minted, is guaranteed to be completely fair. In reality, we should consider a prior peaking sharply at $p = \tfrac{1}{2}$ rather than a delta-function. New information can then result in a change in the expected value of $p$.
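To illustrate how a sharply peaked prior responds to data, the sketch below assumes a symmetric Beta($a$, $a$) prior, which is not used in the text but is convenient because it is conjugate to the binomial: the posterior is then Beta($a + h$, $a + n - h$) with mean $(a + h)/(2a + n)$. The parameter $a$ and the function name are hypothetical.

```python
def posterior_mean(h: int, n: int, a: float) -> float:
    """Posterior mean of p under an assumed symmetric Beta(a, a) prior."""
    # Conjugacy: Beta(a, a) prior + h heads in n tosses -> Beta(a + h, a + n - h).
    return (a + h) / (2 * a + n)

h, n = 20, 50  # illustrative data
for a in (1, 10, 500, 1_000_000):
    # a = 1 is the uniform prior; larger a means a sharper peak at p = 1/2.
    print(a, posterior_mean(h, n, a))
```

As $a$ grows, the posterior mean moves from $(h+1)/(n+2)$ towards $1/2$, approaching the delta-function case in which the data have no influence.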

**How Many Tosses?**

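The question raised above was how many tosses are needed to estimate the odds. Of course, this depends on the level of precision required. The posterior distribution for $p$ given $h$ is the beta distribution

$$P(p \mid h) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,p^{\alpha - 1}(1-p)^{\beta - 1}, \qquad \alpha = h + 1, \quad \beta = n - h + 1.$$

This is a standard beta distribution. The expected value of $p$ is

$$E[p] = \frac{\alpha}{\alpha + \beta} = \frac{h+1}{n+2}$$

and its variance is

$$\mathrm{Var}[p] = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)} = \frac{(h+1)(n-h+1)}{(n+2)^2(n+3)}.$$

For large $n$ and $h$ we can write

$$E[p] \approx \frac{h}{n} = \hat{p}, \qquad \mathrm{Var}[p] \approx \frac{\hat{p}(1-\hat{p})}{n}.$$

The quantity $\sigma = \sqrt{\hat{p}(1-\hat{p})/n}$ is called the standard error.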
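The sketch below compares the exact mean and variance of the Beta($h+1$, $n-h+1$) posterior with the large-$n$ approximations, to show how quickly the two agree. The function names and the choice of a 40% head rate are illustrative assumptions.

```python
from math import sqrt

def beta_posterior_moments(h: int, n: int) -> tuple[float, float]:
    """Exact mean and variance of the Beta(h+1, n-h+1) posterior."""
    a, b = h + 1, n - h + 1
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var

def large_n_approximation(h: int, n: int) -> tuple[float, float]:
    """Approximate mean p_hat = h/n and variance p_hat (1 - p_hat) / n."""
    p_hat = h / n
    return p_hat, p_hat * (1 - p_hat) / n

for n in (50, 5_000, 500_000):
    h = 2 * n // 5  # 40% heads, an arbitrary illustrative ratio
    mean, var = beta_posterior_moments(h, n)
    p_hat, var_approx = large_n_approximation(h, n)
    print(n, mean, p_hat, sqrt(var), sqrt(var_approx))
```

The square roots printed in the last two columns are the exact posterior standard deviation and the standard error; they shrink like $1/\sqrt{n}$.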

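We expect $p$ to be close to $\hat{p}$, but we must be more specific. It is common to choose a *confidence interval* $[\hat{p} - Z\sigma,\ \hat{p} + Z\sigma]$ with $Z = 2$. For a normal distribution, this corresponds to approximately 95% confidence: $p$ will be within this interval about 95% of the time. We also specify a level of precision: let us require that $p$ differs from $\hat{p}$ by at most $\epsilon$. To ensure this, we need

$$Z\sigma \le \epsilon, \qquad \text{that is,} \qquad n \ge \frac{Z^2\,\hat{p}(1-\hat{p})}{\epsilon^2}.$$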

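Suppose the coin is approximately fair. Then $\hat{p} \approx \tfrac{1}{2}$ so $\hat{p}(1-\hat{p}) \approx \tfrac{1}{4}$. If the confidence interval comprises values within two standard errors ($Z = 2$) and we require an accuracy of three significant figures ($\epsilon = 10^{-3}$) then

$$n \ge \frac{Z^2}{4\epsilon^2} = \frac{4}{4 \times 10^{-6}} = 10^6.$$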

This is amazing: we need on the order of a million tosses to have confidence in the estimated value of $p$ to three significant figures.

If we ask for six-figure accuracy, we need a trillion tosses!
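The sample-size requirement $Z\sigma \le \epsilon$ can be turned into a small calculator. This is a minimal sketch: the function name `required_tosses` is invented here, and exact rational arithmetic (`fractions.Fraction`) is used only to avoid floating-point rounding at the boundary.

```python
from fractions import Fraction
from math import ceil

def required_tosses(eps, z=2, p_hat=Fraction(1, 2)) -> int:
    """Smallest n satisfying z * sqrt(p_hat (1 - p_hat) / n) <= eps."""
    eps, z, p_hat = Fraction(eps), Fraction(z), Fraction(p_hat)
    # Rearranged bound: n >= z^2 * p_hat * (1 - p_hat) / eps^2.
    return ceil(z**2 * p_hat * (1 - p_hat) / eps**2)

three_figures = required_tosses(Fraction(1, 10**3))  # eps = 10^-3
six_figures = required_tosses(Fraction(1, 10**6))    # eps = 10^-6
print(three_figures)
print(six_figures)
```

Because $n$ scales like $1/\epsilon^2$, each extra decimal digit of accuracy multiplies the number of tosses by one hundred.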

**Sources**

Sharon Bertsch McGrayne, 2011: *The Theory that would not Die*. Yale Univ. Press, 336pp.