How many numbers begin with a 1? More than 30%!

The irregular distribution of the first digits of numbers in databases provides a valuable tool for fraud detection. A remarkable rule that applies to many datasets was accidentally discovered by an American physicist, Frank Benford, who described his discovery in a 1938 paper, “The Law of Anomalous Numbers” [TM181 or search for “thatsmaths” at irishtimes.com].

[Figure: the Benford distribution of first digits]

With nine possible choices, we might expect each digit to occur on average one time in nine. In fact, the digits are heavily skewed toward smaller values: the first digit is more likely to be 1 than 2, more likely to be 2 than 3, and so on. The digit 9 occurs as the first digit less frequently than any other digit. Benford’s Law has unexpected practical benefits: it has led to surprising applications in forensic accounting and elsewhere.

The First-digit Law

Imagine a large list of numbers, such as might be found in census returns or stock market reports. Suppose a number is picked at random from the list. We might expect that the first digit, which may be anything from 1 to 9, is equally likely to be any of these values. However, it is frequently found that the number 1 appears as the first digit about 30% of the time, whereas 9 occurs less than 5% of the time.

Benford tested his idea on numerous datasets: lengths of rivers, population sizes, street addresses, death rates and so on. He found the same curious pattern in all these cases. The law applies most accurately to data that span several orders of magnitude. It breaks down for datasets whose entries are all of similar size. For example, the heights of most adults are in the range from 150 to 190 centimetres, all beginning with 1. Changing units to inches, the range runs from 60 to 75, so only 6 and 7 appear as leading digits. However, many real-world distributions have wider ranges and satisfy Benford’s Law with remarkable accuracy.
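The contrast between wide-ranging and narrow-ranging data can be checked with a few lines of Python. This is only a sketch: the powers of 2 are used here as a convenient stand-in for data spanning many orders of magnitude (they are not one of Benford's datasets), and the helper name leading_digit is chosen purely for illustration.

```python
from collections import Counter


def leading_digit(n: int) -> int:
    """Return the first decimal digit of a positive integer."""
    return int(str(n)[0])


# Assumed illustrative dataset: 2**0 .. 2**999, spanning about 300 orders of magnitude.
values = [2**k for k in range(1000)]
counts = Counter(leading_digit(v) for v in values)
frequencies = {d: counts[d] / len(values) for d in range(1, 10)}

for d in range(1, 10):
    print(d, f"{frequencies[d]:.3f}")

# Narrow-range data from the text: adult heights in centimetres and in inches.
heights_cm_digits = {leading_digit(h) for h in range(150, 191)}
heights_in_digits = {leading_digit(h) for h in range(60, 76)}
print(heights_cm_digits, heights_in_digits)
```

The powers of 2 show the skew toward small leading digits, while the height ranges yield only the digit 1 in centimetres and only 6 and 7 in inches.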

Fraud Detection

Benford’s Law can reveal highly unlikely frequencies of numbers in a dataset. It has been used to detect fraud in elections and tampering with digital images. Swindlers who fabricate figures tend to distribute their digits uniformly. Others, who choose amounts just below checking thresholds, leave tell-tale signals that show up as anomalies when the law is applied.
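One way such a screening might be sketched is a chi-square comparison of observed first-digit counts with the counts Benford's Law predicts. The choice of the chi-square statistic, the function names, and both test datasets are assumptions made for illustration; the text does not say which test forensic accountants actually use. The cut-off 15.507 is the standard 5% critical value for 8 degrees of freedom.

```python
import math
import random
from collections import Counter


def first_digit(n: int) -> int:
    """First decimal digit of a nonzero integer."""
    return int(str(abs(n))[0])


def benford_chi_square(values) -> float:
    """Chi-square statistic of observed first digits against Benford's Law."""
    counts = Counter(first_digit(v) for v in values)
    total = len(values)
    stat = 0.0
    for d in range(1, 10):
        expected = total * math.log10((d + 1) / d)
        stat += (counts[d] - expected) ** 2 / expected
    return stat


# Hypothetical "genuine" data: powers of 2, which span many orders of magnitude.
genuine = [2**k for k in range(1, 501)]

# Hypothetical "fabricated" data: amounts whose first digits are spread uniformly.
rng = random.Random(0)
fabricated = [rng.randint(100, 999) for _ in range(500)]

CRITICAL_5_PERCENT = 15.507  # chi-square, 8 degrees of freedom

print("genuine:   ", round(benford_chi_square(genuine), 2))
print("fabricated:", round(benford_chi_square(fabricated), 2))
```

A statistic well above the critical value flags the dataset for closer inspection; it does not prove fraud, since many honest datasets do not follow the law.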

Benford was not the first person to notice the peculiar distribution of leading digits. In 1881, the American astronomer Simon Newcomb, thumbing through a table of logarithms, noticed that the earlier pages, where the numbers start with 1, were more heavily worn than the later pages. Newcomb proposed that the probability for leading digits followed a logarithmic law. However, his finding was forgotten until Benford re-discovered it some sixty years later.

Logarithmic Law

Numerous websites, papers and textbooks present simple proofs of Benford’s Law, but fallacies in their arguments are common. The eminent logician and mathematician C.S. Peirce once observed that “in no other branch of mathematics is it so easy for experts to blunder as in probability theory”. Although many aspects of Benford’s Law have solid theoretical foundations, there is still no unified approach that accounts for its ubiquity. Most experts agree that some mystery remains about the widespread occurrence of the law in real-life circumstances.

According to Benford’s Law, the probability that the first digit of a randomly chosen number in a dataset obeying the law is d is given by:

$$\mathrm{Prob}(d) = \log_{10}\left(\frac{d+1}{d}\right), \qquad d = 1, 2, 3, \ldots, 9.$$

The figure above shows the Benford distribution, with Prob(1) ≈ 0.301 and Prob(9) ≈ 0.046.
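The full table of probabilities follows directly from the logarithmic formula. The short Python sketch below evaluates it for each digit; the variable name benford is just a label chosen here. The nine values sum to 1 because the sum of the logarithms telescopes to the logarithm of 10.

```python
import math

# Prob(d) = log10((d + 1) / d) for each possible first digit d.
benford = {d: math.log10((d + 1) / d) for d in range(1, 10)}

for d, p in benford.items():
    print(f"{d}: {p:.3f}")

print("total:", round(sum(benford.values()), 6))
```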

