2 Entropy

2.1 Social Architecture - Ian Wright

Think of entropy as a number that measures the randomness of a distribution. The higher the entropy the more random the distribution.

The most random distribution of all is the uniform distribution. It’s the most random distribution because every x is equally likely.

There’s a constraint on entropy maximisation. That constraint is the conservation of energy.

We can calculate the distribution that corresponds to maximum entropy subject to a total energy constraint. The answer to this optimisation problem is the exponential distribution.

Ian Wright

2.2 Information Entropy - Shannon

In a single groundbreaking paper, he laid the foundation for the entire communication infrastructure underlying the modern information age.

The heart of his theory is a simple but very general model of communication: A transmitter encodes information into a signal, which is corrupted by noise and then decoded by the receiver. Despite its simplicity, Shannon’s model incorporates two key insights: isolating the information and noise sources from the communication system to be designed, and modeling both of these sources probabilistically. He imagined the information source generating one of many possible messages to communicate, each of which had a certain probability. The probabilistic noise added further randomness for the receiver to disentangle.

Before Shannon, the problem of communication was primarily viewed as a deterministic signal-reconstruction problem: how to transform a received signal, distorted by the physical medium, to reconstruct the original as accurately as possible. Shannon’s genius lay in his observation that the key to communication is uncertainty. This single observation shifted the communication problem from the physical to the abstract, allowing Shannon to model the uncertainty using probability. This came as a total shock to the communication engineers of the day.

First, Shannon came up with a formula for the minimum number of bits per second to represent the information, a number he called its entropy rate, H. This number quantifies the uncertainty involved in determining which message the source will generate. The lower the entropy rate, the less the uncertainty, and thus the easier it is to compress the message into something shorter.

For example, texting at the rate of 100 English letters per minute means sending one out of 26100 possible messages every minute, each represented by a sequence of 100 letters. One could encode all these possibilities into 470 bits, since 2470 ≈ 26100. If the sequences were equally likely, then Shannon’s formula would say that the entropy rate is indeed 470 bits per minute. In reality, some sequences are much more likely than others, and the entropy rate is much lower, allowing for greater compression.

Second, he provided a formula for the maximum number of bits per second that can be reliably communicated in the face of noise, which he called the system’s capacity, C. This is the maximum rate at which the receiver can resolve the message’s uncertainty, effectively making it the speed limit for communication.

Finally, he showed that reliable communication of the information from the source in the face of noise is possible if and only if H < C. Thus, information is like water: If the flow rate is less than the capacity of the pipe, then the stream gets through reliably.

His theorems led to some counterintuitive conclusions. Suppose you are talking in a very noisy place. What’s the best way of making sure your message gets through? Maybe repeating it many times? That’s certainly anyone’s first instinct in a loud restaurant, but it turns out that’s not very efficient. Sure, the more times you repeat yourself, the more reliable the communication is. But you’ve sacrificed speed for reliability. Shannon showed us we can do far better. Repeating a message is an example of using a code to transmit a message, and by using different and more sophisticated codes, one can communicate fast — all the way up to the speed limit, C — while maintaining any given degree of reliability.

Another unexpected conclusion stemming from Shannon’s theory is that whatever the nature of the information — be it a Shakespeare sonnet, a recording of Beethoven’s Fifth Symphony or a Kurosawa movie — it is always most efficient to encode it into bits before transmitting it. So in a radio system, for example, even though both the initial sound and the electromagnetic signal sent over the air are analog wave forms, Shannon’s theorems imply that it is optimal to first digitize the sound wave into bits, and then map those bits into the electromagnetic wave. This surprising result is a cornerstone of the modern digital information age, where the bit reigns supreme as the universal currency of information.

Shannon’s general theory of communication is so natural that it’s as if he discovered the universe’s laws of communication, rather than inventing them. His theory is as fundamental as the physical laws of nature. In that sense, he was a scientist.

Shannon invented new mathematics to describe the laws of communication. He introduced new ideas, like the entropy rate of a probabilistic model, which have been applied in far-ranging branches of mathematics such as ergodic theory, the study of long-term behavior of dynamical systems. In that sense, Shannon was a mathematician.

Shannon’s theory has now become the standard framework underlying all modern-day communication systems: optical, underwater, even interplanetary.

Shannon figured out the foundation for all this more than 70 years ago. How did he do it? By focusing relentlessly on the essential feature of a problem while ignoring all other aspects.

Shannon (Quanta Magazine)