### Why Waffle when One Wordle Do?

*A game of Wordle solved in 3 guesses (a birdie).*

Hula hoops were all the rage in 1958. Yo-yos, popular before World War II, were relaunched in the 1960s. Rubik’s Cube, invented in 1974, quickly became a global craze. Sudoku, which had been around for years, was wildly popular when it started to appear in American and European newspapers in 2004.

The latest fad is Wordle, a clever word puzzle devised by Josh Wardle, a Welsh-born software engineer. The web-based game quickly went viral and is a hot topic of discussion right now. The New York Times bought the rights to Wordle in January.

The challenge of Wordle is to guess a mystery five-letter word: a different word is chosen each day, and you are allowed six attempts to guess it. After each attempt, feedback gives you new information to help the search. Each guess must be a valid five-letter word, from an approved list. When you enter a guess, each of the five letters is coloured as follows:

• Grey: the letter is not in the answer.
• Yellow: the letter is in the answer, but at another position.
• Green: the letter is in the answer and at the correct position.

With three colours and five letters, there are in total ${3^5=243}$ possible colour patterns.
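
The colouring rule can be sketched in a few lines of Python. This is a minimal sketch of the logic as commonly understood (the function name and example words are my own): greens are marked first, and yellows are then awarded from a pool of unmatched answer letters, so duplicate letters are handled sensibly.

```python
from collections import Counter

def colour_pattern(guess: str, answer: str) -> tuple[str, ...]:
    """Return the five-colour feedback for a guess against an answer."""
    pattern = ["Grey"] * 5
    # First pass: mark greens and pool the answer letters still unmatched.
    unmatched = Counter()
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            pattern[i] = "Green"
        else:
            unmatched[a] += 1
    # Second pass: award yellows from the pool of unmatched letters.
    for i, g in enumerate(guess):
        if pattern[i] == "Grey" and unmatched[g] > 0:
            pattern[i] = "Yellow"
            unmatched[g] -= 1
    return tuple(pattern)

print(colour_pattern("CRANE", "SHAVE"))
# ('Grey', 'Grey', 'Green', 'Grey', 'Green')
```

Only the A and the E of CRANE appear in SHAVE, both in the correct positions, so they come back green and the rest grey.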

The objective is to determine the answer in the fewest possible tries. When you get a 2 or 3 it feels good: you have a sense of achievement. Someone described a 4 as par and a 3 as a birdie. On this basis, a 2 would be an eagle and a 1 an albatross. With essentially no information to start, an albatross would be something of a fluke. However, eagles are not so rare!

The current official list of accepted five-letter words, those that may be used as a guess, has ${N = 12{,}972}$ entries, and includes numerous very rare or strange words. However, the answer is chosen from a much shorter list of common words that are well known to most players. This list has just 2,315 words, so, if you were allowed 2,315 guesses, you could always solve the puzzle. There are many familiar five-letter words that are not on this list.

The idea of limiting Wordle to one puzzle per day is inspired. It puts everyone in the same place at the stroke of midnight. Later — hopefully, much later — in the day, they can discuss the puzzle and compare notes. And it avoids excessive use of, or obsession with, the game.

#### Letter Frequency

The frequencies with which letters of the alphabet occur vary widely, from the most common, E, to the rarest, Z. Letter frequencies are important in cryptography and in several word puzzle games, including Wordle. There are many ways to measure letter frequency, using dictionaries, long novels, and enormous web-based data sets. One version of the alphabet in order of frequency is

$\displaystyle \mathbf{E\,T\,A\,O\,I\,N} \quad \mathbf{S\,H\,R\,D\,L\,U} \quad \mathbf{C\,M\,W\,F\,Y\,P} \quad \mathbf{V\,B\,G\,K\,J\,Q} \quad \mathbf{X\,Z} \,.$

However, the frequency distribution for the first letters of words differs significantly from this. Likewise for the final letter (S is quite common there). We could, in principle, analyse all five-letter words and derive a frequency distribution for each of the five positions. But the five top letters would probably not yield an allowable word. Also important for Wordle is relative word frequency. But we do not know how the solution word is chosen, or whether its level of use is a factor in the choice.
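
Such a per-position analysis is easy to tabulate. Here is a sketch over a tiny made-up word list (a real analysis would use the full 12,972-word accepted list):

```python
from collections import Counter

# Toy word list; a real analysis would use the full accepted-word list.
words = ["CRANE", "SHAVE", "SHADE", "SLATE", "TRACE", "STARE", "SHARE"]

# One frequency table per letter position.
position_counts = [Counter(w[i] for w in words) for i in range(5)]

for i, counts in enumerate(position_counts, start=1):
    letter, n = counts.most_common(1)[0]
    print(f"position {i}: {letter} appears in {n} of {len(words)} words")
```

Even this toy list shows the effect mentioned above: S dominates the first position here, quite unlike its mid-table rank in the overall frequency order.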

#### Cracking the Puzzle with Entropy

Upon encountering Wordle, computer wizards immediately think of how an algorithm could be devised to optimise the solution of the puzzle. Grant Sanderson, who writes the brilliant 3Blue1Brown blog, made a recent video on Wordle and presented an analysis based on the concept of entropy. He implemented his algorithm as a wordlebot, a program that plays the game automatically.

The probability of getting a particular colour pattern, let us say { Green, Grey, Yellow, Grey, Grey }, is the number ${n}$ of words matching this pattern divided by the total number ${N}$ of accepted words:

$\displaystyle p(\{\text{Green},\,\text{Grey},\,\text{Yellow},\,\text{Grey},\,\text{Grey}\}) = \frac{n}{N} \,.$

The smaller the value of ${p}$, the more information the outcome contains! The most likely outcomes are the least informative. Our aim is to make guesses whose outcomes have low probability, thereby providing the most information. Each of the ${3^5 = 243}$ possible patterns has an associated probability. The pattern of five grey letters occurs with probability 0.1422, or about 14% of the time. Five greens occur with the tiny probability ${1/12{,}972 = 7.7\times 10^{-5}}$.

Imagine we are performing a binary search, such that each new “bit” of information reduces the number of possible outcomes by a factor of one half. We define the information content of this bit by ${I = -\log_2 p}$. Obviously, the information content of one bit is ${-\log_2 \frac{1}{2} = 1}$. The logarithm means that, if one guess gives us ${k}$ bits of information and a second gives ${\ell}$ bits, then the combination gives us ${k+\ell}$ bits. Multiplication of probabilities corresponds to addition of information.
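In code (a sketch, with a hypothetical function name):

```python
import math

def information_bits(p: float) -> float:
    """Information content I = -log2(p) of an outcome with probability p."""
    return -math.log2(p)

# One halving of the possibilities is exactly one bit.
assert information_bits(1 / 2) == 1.0

# Probabilities multiply; information adds.
k = information_bits(1 / 4)                          # 2 bits
l = information_bits(1 / 8)                          # 3 bits
assert information_bits((1 / 4) * (1 / 8)) == k + l  # 5 bits
```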

Claude Shannon developed Information Theory in the 1940s, and defined the expected value of information by the formula

$\displaystyle E(I) = - \sum_{x}\ p(x) \log_2 p(x) \ \ \ \ \ (1)$

This is closely analogous to the definition of entropy in thermodynamics and statistical mechanics and, following a suggestion of John von Neumann, Shannon used the term entropy for his measure.

Suppose there are 16 possible answers, each being equally likely (so ${p=\frac{1}{16}}$ for each). Then, by (1), ${E(I) = 4}$, so we need 4 bits of information. If there are 1024 equally likely answers, then we need 10 bits. If the choice is over ${n}$ equally likely possibilities, then ${p=1/n}$ and the entropy reduces to ${\log_2 n}$. The smaller p, the more information we have!
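These uniform cases of formula (1) are easy to check numerically; a sketch:

```python
import math

def entropy(probs):
    """E(I) = -sum p(x) log2 p(x), in bits (terms with p = 0 contribute 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([1 / 16] * 16))      # 4.0
print(entropy([1 / 1024] * 1024))  # 10.0
```

For a uniform distribution over ${n}$ outcomes the sum collapses to ${\log_2 n}$, exactly as stated above.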

Sanderson computed the information content for all possible first guesses, that is, for all 13,000 or so allowable words. He then took the best next guess to be the one with maximum expected information, that is, the one expected to reduce the number of remaining possibilities the most, and repeated this process for subsequent guesses. The wordlebot that he developed maximises the entropy at each stage.
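That entropy-maximising step can be sketched as follows, over a toy answer list (the colouring rule and word list here are my own illustration, not Sanderson's code): score each candidate guess by the entropy of the feedback distribution it induces.

```python
import math
from collections import Counter

def pattern(guess: str, answer: str) -> tuple[str, ...]:
    """Wordle-style feedback: greens first, then yellows from unmatched letters."""
    colours = ["Grey"] * 5
    pool = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            colours[i] = "Green"
    for i, g in enumerate(guess):
        if colours[i] == "Grey" and pool[g] > 0:
            colours[i] = "Yellow"
            pool[g] -= 1
    return tuple(colours)

def expected_information(guess: str, answers: list[str]) -> float:
    """Entropy, in bits, of the feedback distribution this guess induces."""
    counts = Counter(pattern(guess, a) for a in answers)
    n = len(answers)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Toy list of remaining possible answers.
answers = ["CRANE", "SHAVE", "SLATE", "TRACE", "STARE", "SHALE"]
for guess in sorted(answers, key=lambda w: -expected_information(w, answers)):
    print(guess, round(expected_information(guess, answers), 3))
```

With six equally likely answers the entropy of any guess is at most ${\log_2 6 \approx 2.58}$ bits, attained only by a guess whose six feedback patterns are all distinct.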

The computer can play millions of games, trying all allowable combinations of letters, in a fraction of a second. Using this, Sanderson obtained some curious and counter-intuitive results. The optimal (or almost optimal) first guess was CRANE. Moreover, the system sometimes produced a second guess that ignored information in the feedback. For example, in one case, the best choice for the second word was SHTIK, even though the letter A in CRANE was green (in the word and in the correct place). For a full account, see his video, referenced below.

I think the idea of using information theory to analyse Wordle is very clever, but the average player is probably better off relying on intuition, and perhaps that is more fun than depending on a wordlebot for help. The current frenzy will abate and, in a few years, we may have forgotten the fun we had with Wordle. In the meantime, let’s enjoy ourselves.

#### And Finally …

Solving a Wordle puzzle is a mixture of skill and luck. Usually, if we pay careful attention to the “colourful feedback”, we can find the solution. However, there are cases where luck is definitely essential. Suppose the answer word is SHAVE. By good fortune, we may start with SHADE, getting four greens: only the fourth letter remains to be found. But there are five trials left, and six possibilities: SHAKE, SHALE, SHAME, SHAPE, SHARE and the correct SHAVE. There seems to be no way of guaranteeing the right choice.
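The situation can be sketched by filtering a toy word list down to the words still consistent with SHADE scoring four greens and a grey D (the list and variable names are my own):

```python
# Toy word list; a real solver would filter the full answer list.
words = ["SHADE", "SHAKE", "SHALE", "SHAME", "SHAPE", "SHARE", "SHAVE",
         "CRANE", "SLATE"]

# Feedback for the guess SHADE: positions 1-3 and 5 green, letter D grey.
consistent = [w for w in words
              if w[:3] == "SHA" and w[4] == "E" and w[3] != "D"]

print(consistent)
# ['SHAKE', 'SHALE', 'SHAME', 'SHAPE', 'SHARE', 'SHAVE']
```

Six equally plausible words remain, but only five guesses, so no strategy can guarantee success from here.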

It is hardly surprising that, in the current climate, the word-lists have been bowdlerized to “avoid offence”. To my profound disappointment, one of the five-letter words that is now considered unacceptable is “lynch” (see Wikipedia article). I really don’t believe that the Lynch Clan are such a thin-skinned lot.

#### Sources

• 3Blue1Brown video: Solving Wordle using information theory.
• Wikipedia article: Wordle: http://www.wikipedia.org/