### Information Theory

That’s Maths in The Irish Times this week (TM059, or search for “thatsmaths” at irishtimes.com) is about data compression and its uses in modern technology.

Left: An equation from Shannon (1948), the paper that launched Information Theory.

The arrival of mobile phones was soon followed by “txtese”, an abbreviated form of language that lets messages be written and transmitted rapidly using SMS (Short Message Service). The simplest strategy is to omit most of the vowels; because English is highly redundant, the meaning usually remains clear.

Of course, abbreviating spelling like this is nothing new. When telegram messages were charged by the word, all sorts of ingenious ways were found to save money by shortening them.

There is no standard SMS language and a wide variety of ploys are used, the context normally serving to remove ambiguities.
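The vowel-dropping strategy mentioned above can be sketched in a few lines of Python. This is only an illustration: the function name `drop_vowels` and the rule of keeping each word's first letter are assumptions made here, not a standard, since txtese has none.

```python
import re

def drop_vowels(message):
    """Remove vowels from every word, but keep each word's first letter.

    Keeping the first letter is an assumed rule for this sketch; it stops
    short words such as "at" or "a" from vanishing entirely.
    """
    def shorten(match):
        word = match.group(0)
        return word[0] + re.sub(r"[aeiouAEIOU]", "", word[1:])

    # Apply the shortening to each run of letters; digits, spaces and
    # punctuation are left untouched.
    return re.sub(r"[A-Za-z]+", shorten, message)

original = "Please send the report before Friday"
shortened = drop_vowels(original)
print(shortened)  # Pls snd th rprt bfr Frdy
print(len(shortened), "characters instead of", len(original))
```

The shortened message is still readable because a reader can restore the missing vowels from context, which is the redundancy at work.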

Information Theory

The study of data compression goes back to a 1948 paper, “A Mathematical Theory of Communication”, by Claude Shannon, a brilliant American mathematician and engineer.

Shannon, known as the “father of information theory”, made many crucial contributions to the development of computing. His seminal paper on communications is 55 pages long, replete with ground-breaking ideas [Ref below].

Shannon gave a mathematical definition of the quantity of information in a message. Called entropy, this allows the information content to be measured precisely. Shannon showed that there is a limit to the extent to which information can be compressed, related to what he called the entropy rate.

If the message is to be recovered exactly, it is mathematically impossible to do better than this. However, if we permit some distortion or loss of information, much higher compression ratios are possible.
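To make the idea of measuring information concrete, here is a minimal sketch that computes Shannon's entropy, H = -Σ p log2(p), from the character frequencies of a message. It rests on a simplifying assumption: each character is treated as independent of its neighbours. Shannon's entropy rate also accounts for the dependence between successive letters, so the true figure for English is lower than this simple estimate. The function name and the sample sentence are chosen here for illustration.

```python
import math
from collections import Counter

def entropy_bits_per_symbol(text):
    """Shannon entropy of the symbol frequencies in text, in bits per symbol.

    Assumes symbols are independent (a simplification; real language has
    context, which lowers the entropy rate further).
    """
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

sample = "the quick brown fox jumps over the lazy dog"
h = entropy_bits_per_symbol(sample)

print(f"{h:.3f} bits per character")
# Under this simple model, no lossless code that treats characters
# independently can average fewer bits per character than h.
print(f"about {h * len(sample) / 8:.1f} bytes for the whole message")
```

A message made of one repeated character has zero entropy, while a message using four characters equally often needs two bits per character.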

The redundancy of the English language was estimated by Shannon to be about 50%. So, we should be able to compress a typical message to half its length. For example, “Hpy Xmas 2 all TM rdrs” is about half the length of its fully expanded version, yet it is quite comprehensible.
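The "about half" claim is easy to check by counting characters. The expanded sentence used below is an assumption, since the text does not spell it out; a different expansion would give a slightly different ratio.

```python
short = "Hpy Xmas 2 all TM rdrs"
# Assumed full version of the message (not given in the text).
full = "Happy Christmas to all That's Maths readers"

ratio = len(short) / len(full)
print(len(short), "characters versus", len(full))
print(f"abbreviated message is {ratio:.0%} of the full length")
```

With this expansion the abbreviated message has 22 characters against 43, which is close to the 50% suggested by Shannon's estimate.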

Hi-tech Uses

Data compression enables music files to be reduced drastically in size. Music on a compact disc is uncompressed. An average song lasting three minutes requires about 32 megabytes of storage.

The MP3 format allows the size, and also the download time, to be reduced by a factor of about ten. It is a “lossy” compression, so the result does not sound identical to the original. Audiophiles decry the reduction in fidelity and stick with CDs or vinyl, but most of us don’t notice any degradation in quality.
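The figures quoted above can be reproduced with a little arithmetic. The sketch assumes the standard CD audio parameters (44,100 samples per second, 16 bits per sample, two channels), which are not stated in the text, and takes the compression factor of ten at face value.

```python
# Assumed standard CD audio parameters (not stated in the article).
SAMPLE_RATE_HZ = 44_100   # samples per second, per channel
BYTES_PER_SAMPLE = 2      # 16-bit samples
CHANNELS = 2              # stereo

SONG_SECONDS = 3 * 60     # a three-minute song
MP3_FACTOR = 10           # "a factor of about ten"

cd_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * SONG_SECONDS
mp3_bytes = cd_bytes / MP3_FACTOR

print(f"CD:  {cd_bytes / 1e6:.1f} MB")   # CD:  31.8 MB
print(f"MP3: {mp3_bytes / 1e6:.1f} MB")  # MP3: 3.2 MB
```

So a three-minute track shrinks from roughly 32 megabytes to a little over 3 megabytes.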

MP3 has had a huge impact on how people acquire and save music. Thanks to data compression, it is possible to download and store a large volume of music on a PC or iPod. The compression is based on “perceptual noise shaping”.

For example, if two sounds are played simultaneously, we tend to hear the louder one, so the softer one may be omitted without much harm. Data compression is also vital for the efficient storage and transmission of images, using formats such as JPEG.

Shannon’s work provides the mathematical underpinning for data storage and compression. Zip files, MP3s and JPEG images are made possible thanks to it. So, whether you are texting your friends, watching videos on the web or enjoying music on the move, you are benefiting from the application of Shannon’s mathematical theory of information.
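Lossless compression of the kind used in Zip files can be tried with Python's standard `zlib` module, which implements the DEFLATE method commonly used inside Zip archives. The sample text below is an assumption chosen for illustration; because it repeats one sentence many times, it is far more redundant than ordinary English, so the saving it shows is not typical.

```python
import zlib

# Deliberately repetitive sample text (an illustrative assumption).
text = ("Happy Christmas to all That's Maths readers. " * 20).encode("utf-8")

packed = zlib.compress(text, level=9)   # highest compression level
restored = zlib.decompress(packed)

# Lossless: the original bytes come back exactly.
assert restored == text
print(len(text), "bytes compressed to", len(packed), "bytes")
```

Unlike MP3 or JPEG, nothing is discarded here: the decompressed data is identical to the input, so the saving comes entirely from redundancy in the message.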

Source:
C. E. Shannon, 1948: A mathematical theory of communication. Bell System Technical Journal, Vol. 27, pp. 379-423 and 623-656 (July and October, 1948).
