### Twin Peaks Entropy

Next week there will be a post on tuning pianos using a method based on entropy. In preparation for that, we consider here how the entropy of a probability distribution function with twin peaks changes with the separation between the peaks.

Classical Entropy

Entropy was introduced in classical thermodynamics about 150 years ago and, somewhat later, statistical mechanics provided insight into the nature of this quantity. Further insight came with the use of entropy in communication theory. This allowed entropy to be understood as missing information.

Suppose a variable ${x}$ is described by the probability distribution function ${p(x)}$. For example, it might be ${p(x) = N(x,\mu,\sigma)}$, a normal distribution with mean ${\mu}$ and variance ${\sigma^2}$. We can think of ${\sigma}$ as a measure of our uncertainty, or lack of information. If ${\sigma}$ is very small, the value of ${x}$ is almost certainly close to ${\mu}$. If ${\sigma}$ is large, there is less information about the value of ${x}$: for a given probability level, it occupies a larger interval.

However, ${\sigma}$ is not a good measure of missing information. Suppose the distribution is a combination of two normal distributions: $\displaystyle p(x) = {\textstyle{\frac{1}{2}}}[ N(x,\mu_1,\sigma_0) + N(x,\mu_2,\sigma_0) ]$ Now suppose ${\sigma_0}$ is small, so that the distribution has two sharp peaks. Then ${x}$ is probably close to either ${\mu_1}$ or ${\mu_2}$. However, the variance of the combined distribution, ${\sigma^2 = \sigma_0^2 + {\textstyle{\frac{1}{4}}}\Delta\mu^2}$ where ${\Delta\mu= \mu_1-\mu_2}$, becomes large as the peaks move apart, increasing as ${{\textstyle{\frac{1}{4}}}\Delta\mu^2}$, even though the information content (that ${x}$ is close to ${\mu_1}$ or to ${\mu_2}$) is essentially the same.
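The growth of the variance with the separation can be checked numerically. The following is a minimal sketch, not part of the original post: the helper names `normal_pdf` and `mixture_variance`, the midpoint-rule integration, and the choice ${\sigma_0 = 0.1}$ are all assumptions made for illustration. It compares the integrated variance with the standard mixture identity ${\sigma_0^2 + {\textstyle{\frac{1}{4}}}\Delta\mu^2}$.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_variance(mu1, mu2, sigma0, n=20000):
    """Variance of the equal-weight two-peak mixture, by midpoint-rule integration."""
    # Integrate well beyond both peaks so the truncated tails are negligible.
    lo = min(mu1, mu2) - 10.0 * sigma0
    hi = max(mu1, mu2) + 10.0 * sigma0
    dx = (hi - lo) / n
    m0 = m1 = m2 = 0.0  # zeroth, first and second moments
    for i in range(n):
        x = lo + (i + 0.5) * dx
        p = 0.5 * (normal_pdf(x, mu1, sigma0) + normal_pdf(x, mu2, sigma0))
        m0 += p * dx
        m1 += x * p * dx
        m2 += x * x * p * dx
    mean = m1 / m0
    return m2 / m0 - mean * mean

sigma0 = 0.1  # illustrative peak width (an assumption, not from the post)
for delta_mu in (0.0, 1.0, 2.0, 4.0, 8.0):
    numeric = mixture_variance(0.0, delta_mu, sigma0)
    closed_form = sigma0**2 + 0.25 * delta_mu**2
    print(f"delta_mu = {delta_mu:4.1f}  numeric = {numeric:8.4f}  closed form = {closed_form:8.4f}")
```

The variance keeps growing quadratically with the separation even though each peak stays just as sharp, which is why it is a poor proxy for missing information here.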

Differential Entropy

We need a better measure of the missing information. It is provided by the differential entropy (also called the continuous entropy) $\displaystyle S = - \int p(x) \log p(x)\, dx .$

For the normal distribution, we can evaluate the integral, with the result that $\displaystyle S(\sigma) = \log\sqrt{2\pi e \sigma^2} \approx \log\sigma + 1.42$
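For readers who want the intermediate step, here is a brief sketch of the evaluation, using only the definitions above and assuming natural logarithms. The negative log of the normal density is

$\displaystyle -\log N(x,\mu,\sigma) = {\textstyle{\frac{1}{2}}}\log(2\pi\sigma^2) + \frac{(x-\mu)^2}{2\sigma^2} .$

Averaging this over ${p(x)}$ and using ${\int p(x)\,(x-\mu)^2\, dx = \sigma^2}$ gives

$\displaystyle S(\sigma) = {\textstyle{\frac{1}{2}}}\log(2\pi\sigma^2) + {\textstyle{\frac{1}{2}}} = {\textstyle{\frac{1}{2}}}\log(2\pi e \sigma^2) = \log\sigma + {\textstyle{\frac{1}{2}}}\log(2\pi e) ,$

where ${{\textstyle{\frac{1}{2}}}\log(2\pi e) \approx 1.42}$.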

So we see that ${S(\sigma)\rightarrow -\infty}$ as ${\sigma\rightarrow0}$ and ${S(\sigma)\rightarrow +\infty}$ as ${\sigma\rightarrow\infty}$.

Let us consider the average of twin peaks, one centered at the origin, the other at ${\mu}$, each with unit variance: $\displaystyle p(x) = {\textstyle{\frac{1}{2}}}[ N(x,0,1) + N(x,\mu,1) ]$

When the peaks coincide, ${\mu=0}$ and the entropy is ${S = S_0 \approx 1.42}$. When they are well separated, so that the peaks do not overlap, each peak contributes as an independent Gaussian carrying half the probability, and we find that ${S \approx S_0 + \log 2 \approx 2.11}$. In general, the integral can be evaluated numerically.

[Figure: Entropy as a function of separation between the peaks.]
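The numerical evaluation can be sketched in a few lines. This is one possible approach under stated assumptions, not necessarily the method used to produce the original figure: the function names, the midpoint rule, the grid size `n` and the cutoff `tail` are all illustrative choices.

```python
import math

def normal_pdf(x, mu, sigma=1.0):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def twin_peaks_pdf(x, mu):
    """Equal-weight mixture of N(x, 0, 1) and N(x, mu, 1)."""
    return 0.5 * (normal_pdf(x, 0.0) + normal_pdf(x, mu))

def twin_peaks_entropy(mu, n=20000, tail=12.0):
    """Midpoint-rule estimate of S = -integral p(x) log p(x) dx for the twin peaks."""
    # Extend the grid 'tail' standard deviations beyond the outer edge of each peak.
    lo = min(0.0, mu) - tail
    hi = max(0.0, mu) + tail
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        p = twin_peaks_pdf(x, mu)
        if p > 0.0:  # guard against log(0) if the density underflows
            total -= p * math.log(p) * dx
    return total

# Entropy of a single unit-variance peak, from the closed form for the normal distribution.
S0 = 0.5 * math.log(2.0 * math.pi * math.e)

for mu in (0, 1, 2, 3, 4, 6, 8, 12):
    print(f"mu = {mu:5.1f}  S = {twin_peaks_entropy(mu):.4f}")
print(f"coincident limit S0         = {S0:.4f}")
print(f"separated limit  S0 + log 2 = {S0 + math.log(2.0):.4f}")
```

Tabulating or plotting `twin_peaks_entropy(mu)` against `mu` gives a curve that starts at ${S_0}$ and levels off near ${S_0 + \log 2}$.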

In conclusion, we see that a single Gaussian peak has lower entropy than two separated peaks of the same width. As the separation grows, the entropy tends to the finite limit ${S_0 + \log 2}$, whereas the variance grows without bound.

Next week’s article will apply these entropy ideas to the practical problem of piano tuning!