Several researchers have observed that, in a wide variety of collections of numerical data, the leading (most significant) decimal digits are not uniformly distributed, but conform to a logarithmic distribution. Of the nine possible values, $1$ occurs more than $30\%$ of the time while $9$ is found in less than $5\%$ of cases (see Figure above). Specifically, the probability distribution is
$$P(d) = \log_{10}\left(1 + \frac{1}{d}\right), \qquad d = 1, 2, \dots, 9.$$
A more complete form of the law gives the probabilities for the second and subsequent digits. A full discussion of Benford’s Law is given in Berger and Hill (2015).
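As a quick numerical check of the first-digit law, the sketch below tabulates $\log_{10}(1 + 1/d)$ for $d = 1, \dots, 9$. The function name `benford_probability` and the dictionary `benford_table` are illustrative choices, not part of any library.

```python
import math


def benford_probability(d):
    """Benford probability that the leading decimal digit equals d (1..9)."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / d)


# Table of first-digit probabilities for d = 1, ..., 9.
benford_table = {d: benford_probability(d) for d in range(1, 10)}

for d, p in benford_table.items():
    print(f"{d}: {p:.4f}")
```

The nine probabilities sum to one because the sum of the logarithms telescopes to $\log_{10} 10$.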
We define the Benford sets for $d \in \{1, 2, \dots, 9\}$ as
$$B_d = \{ n \in \mathbb{N} : \text{the leading decimal digit of } n \text{ is } d \}.$$
The relative density of $B_1$ in the range $[1, N]$ may be written
$$\rho_1(N) = \frac{\#\{ n \in B_1 : n \le N \}}{N}.$$
This oscillates between $1/9$ and $5/9$ as $N$ increases, and does not approach a limit. In particular, the set $B_1$ does not have a natural density. However, we can assign a probability of an arbitrary number being in $B_1$ following ideas outlined in Diaconis and Skyrms (2018) and, in greater detail, in Tenenbaum (1995) [Earlier post: How many numbers begin with a 1?]
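The oscillation of the relative density can be seen by direct counting. The sketch below is a brute-force illustration: it counts the integers up to an upper limit whose leading digit is 1. The helper names `leading_digit` and `relative_density` are hypothetical, and the upper limits were chosen to fall just before a power of ten (where the density is lowest) and just before twice a power of ten (where it is highest).

```python
def leading_digit(n):
    """Return the most significant decimal digit of a positive integer."""
    while n >= 10:
        n //= 10
    return n


def relative_density(digit, upper):
    """Fraction of the integers 1..upper whose leading digit equals `digit`."""
    count = sum(1 for n in range(1, upper + 1) if leading_digit(n) == digit)
    return count / upper


# Alternate between troughs (10^k - 1) and peaks (2 * 10^k - 1).
for upper in (999, 1999, 9999, 19999, 99999, 199999):
    print(upper, round(relative_density(1, upper), 4))
```

At the limits of the form $10^k - 1$ the fraction is exactly $1/9$, while at $2 \times 10^k - 1$ it approaches $5/9$ from above, so no single limiting value emerges.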
Averaging Methods
Different sequences behave differently. The Fibonacci numbers conform to Benford’s Law: the relative frequency of the leading digit $d$ converges to
$$\log_{10}\left(1 + \frac{1}{d}\right).$$
The density of the set of Fibonacci numbers that start with $1$ is $\log_{10} 2 \approx 0.301$. The sequence of prime numbers does not follow Benford’s Law. For the sequence of natural numbers, the relative density of numbers with leading digit $1$ oscillates, with $\liminf = 1/9$ and $\limsup = 5/9$.
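The behaviour of the Fibonacci sequence can be checked empirically. The following sketch tallies the leading digits of the first 2000 Fibonacci numbers and prints them beside the Benford values; the function name `fibonacci_leading_digit_frequencies` and the sample size of 2000 are arbitrary choices made for illustration.

```python
import math
from collections import Counter


def fibonacci_leading_digit_frequencies(count):
    """Relative frequency of each leading digit among the first `count` Fibonacci numbers."""
    tally = Counter()
    a, b = 1, 1
    for _ in range(count):
        tally[int(str(a)[0])] += 1  # leading decimal digit of the current term
        a, b = b, a + b
    return {d: tally[d] / count for d in range(1, 10)}


frequencies = fibonacci_leading_digit_frequencies(2000)

# Compare observed frequencies with log10(1 + 1/d).
for d in range(1, 10):
    print(d, round(frequencies[d], 4), round(math.log10(1 + 1 / d), 4))
```

Python integers have arbitrary precision, so the leading digit can be read directly from the decimal string without rounding error.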
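For a set $A \subseteq \mathbb{N}$, the density can be defined as
$$\rho(A) = \lim_{N\to\infty} \frac{1}{N} \sum_{n=1}^{N} \chi_A(n), \qquad (1)$$
where $\chi_A(n)$ equals $1$ if $n \in A$ and $0$ otherwise. This is an instance of the Cesàro mean, assigning the weight $1/N$ to each of the first $N$ terms.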
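There are several alternative ways to specify density. The harmonic density replaces uniform weights by the decreasing sequence $1, 1/2, 1/3, \dots$:
$$\rho_H(A) = \lim_{N\to\infty} \frac{1}{H_N} \sum_{n=1}^{N} \frac{\chi_A(n)}{n}, \qquad H_N = \sum_{n=1}^{N} \frac{1}{n}, \qquad (2)$$
where $\chi_A$ is the indicator function of the set $A$. The numbers $H_N$ are known as the harmonic numbers. As is well known, the harmonic series diverges, so $H_N \to \infty$ as $N \to \infty$. Diaconis and Skyrms (2018) describe a generalisation of (2):
$$\rho_s(A) = \lim_{N\to\infty} \frac{1}{\zeta_N(s)} \sum_{n=1}^{N} \frac{\chi_A(n)}{n^s}, \qquad \zeta_N(s) = \sum_{n=1}^{N} \frac{1}{n^s}.$$
For $s > 1$, the function $\zeta_N(s)$ converges to the Riemann zeta-function $\zeta(s)$ as $N \to \infty$.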
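The contrast between the divergent harmonic series and the convergent sums for exponents greater than one can be illustrated numerically. In this sketch the function names `harmonic_number` and `partial_zeta` are hypothetical, and the exponent 2 is chosen only because the limiting value $\zeta(2) = \pi^2/6$ is known in closed form.

```python
import math


def harmonic_number(n_terms):
    """Partial sum 1 + 1/2 + ... + 1/n_terms of the harmonic series."""
    return sum(1 / n for n in range(1, n_terms + 1))


def partial_zeta(s, n_terms):
    """Partial sum of n^(-s) for n = 1..n_terms."""
    return sum(n ** -s for n in range(1, n_terms + 1))


# The harmonic numbers keep growing like ln N; the s = 2 sums level off.
for n_terms in (10, 1000, 100000):
    print(n_terms, harmonic_number(n_terms), math.log(n_terms), partial_zeta(2.0, n_terms))

print("zeta(2) =", math.pi ** 2 / 6)
```

The gap between the harmonic number and $\ln N$ settles near a constant (the Euler–Mascheroni constant, about 0.577), which is why logarithms appear when harmonic weights are used.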
In the Figure above, we show the relative frequency for the first digit of a number to be $1$ (blue curve) and, for comparison, a second digit value (red curve) as the upper limit $N$ of the range varies. This illustrates that, for $d = 1$, the frequency oscillates between limits of approximately $0.11$ and $0.56$.
In the Figure below, we show the relative frequency for the first digit of a number being $1$, where the logarithmic mean (2) is used. The indication is that the frequency oscillates with reducing amplitude and tends to a limit of approximately $0.3$, consistent with Benford’s Law.
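The logarithmic mean can be computed in a single pass. The sketch below weights each integer $n$ by $1/n$ and records, for every upper limit up to 200000, the weighted share of integers with leading digit 1. The names `leading_digit` and `logarithmic_frequencies`, and the cut-off of 200000, are illustrative assumptions; this is not the code used to draw the figure.

```python
def leading_digit(n):
    """Return the most significant decimal digit of a positive integer."""
    while n >= 10:
        n //= 10
    return n


def logarithmic_frequencies(digit, upper):
    """Running harmonic-weighted frequency of `digit` as leading digit, for N = 1..upper."""
    weighted_hits = 0.0  # sum of 1/n over n <= N with the chosen leading digit
    harmonic = 0.0       # harmonic number H_N
    running = []
    for n in range(1, upper + 1):
        weight = 1 / n
        harmonic += weight
        if leading_digit(n) == digit:
            weighted_hits += weight
        running.append(weighted_hits / harmonic)
    return running


running = logarithmic_frequencies(1, 200000)

# Sample the running frequency at the troughs and peaks of the oscillation.
for upper in (999, 1999, 9999, 19999, 99999, 199999):
    print(upper, round(running[upper - 1], 4))
```

Compared with the unweighted density, the swings are much smaller and shrink as the upper limit grows, although the approach to $\log_{10} 2$ is slow.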
The Logarithmic Distribution
We saw that the frequency of occurrence of $d$ as the leading digit follows a logarithmic law. But where does this come from? If we assume that all numbers $n$ in the range $1 \le n \le N$ may occur with equal probability, then the uniform distribution $P(n) = 1/N$ is appropriate. This leads to the conclusion that all nine nonzero decimal digits should occur as the leading digit with equal probability (zero cannot be a leading digit). However, we could argue that smaller numbers are more probable than larger ones and assign another distribution, such as the logarithmic distribution, in which $n$ has probability proportional to $1/n$. We recall that the harmonic numbers $H_N = \sum_{n=1}^{N} 1/n$ are asymptotic to the logarithmic function: $H_N \sim \ln N$. Thus, to a good approximation, the probability that a randomly chosen number is in the range $[1, M]$, with $M \le N$, is
$$\frac{H_M}{H_N} \approx \frac{\ln M}{\ln N}.$$
Now consider a ‘decade’ of numbers $[10^k, 10^{k+1})$ lying within the range. The probability that a random choice falls within this interval is approximately $\ln 10 / \ln N$, while numbers with leading digit $1$ (in the interval $[10^k, 2 \times 10^k)$) occur with probability approximately $\ln 2 / \ln N$. Thus, the relative frequency of numbers with leading digit $1$ is
$$\frac{\ln 2}{\ln 10} = \log_{10} 2,$$
or about $0.301$. This is the special case of Benford’s Law for $d = 1$. The remaining cases may be demonstrated in a similar manner.
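For a general leading digit $d$, the same argument can be sketched as follows. The notation is an assumption made for this illustration: $N$ is the upper limit of the range, $[10^k, 10^{k+1})$ is a decade inside it, and probabilities are approximated by differences of logarithms divided by $\ln N$.

```latex
\begin{aligned}
P\bigl(10^k \le n < 10^{k+1}\bigr)
  &\approx \frac{\ln 10^{k+1} - \ln 10^{k}}{\ln N} = \frac{\ln 10}{\ln N}, \\[4pt]
P\bigl(d \cdot 10^k \le n < (d+1) \cdot 10^k\bigr)
  &\approx \frac{\ln (d+1) - \ln d}{\ln N} = \frac{\ln (1 + 1/d)}{\ln N}, \\[4pt]
\frac{P\bigl(d \cdot 10^k \le n < (d+1) \cdot 10^k\bigr)}{P\bigl(10^k \le n < 10^{k+1}\bigr)}
  &\approx \frac{\ln (1 + 1/d)}{\ln 10} = \log_{10}\left(1 + \frac{1}{d}\right).
\end{aligned}
```

The factor $\ln N$ and the exponent $k$ both cancel in the ratio, so the result is the same in every decade and agrees with the first-digit law for each $d$ from $1$ to $9$.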
Sources
Berger, Arno and Theodore P. Hill, 2015: An Introduction to Benford’s Law. Princeton Univ. Press, 248pp. ISBN: 978-0-691-16306-2.
Diaconis, Persi and Brian Skyrms, 2018: Ten Great Ideas About Chance. Princeton Univ. Press, 255pp. [See Chapter 5].
Tenenbaum, Gérald, 1995: Introduction to Analytic and Probabilistic Number Theory. Cambridge University Press. ISBN 0-521-41261-7.
Thatsmaths: How many numbers begin with a 1?