Entropy definition [Under Information theory > Shannon's paper]
In the first article, we said that to improve the transmission rate for English, we can assign a shorter bit representation to a letter that is more probable than to one that is less probable. The idea can be extended further to words: mapping words rather than letters gives a still better transmission rate. For example, "the" and "is" occur very frequently and so can be given shorter bit representations. With letter mapping, we would need more than 3 bits to represent "the"; with word mapping, we may use fewer than 3 bits.
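As a rough illustration, here is a minimal Python sketch comparing the two mappings for the word "the". The specific lengths are assumptions made only for this example (a fixed 5-bit code per letter and a hypothetical 3-bit codeword for the whole word), not figures from Shannon's paper.

# Rough comparison of letter-level vs word-level coding for the word "the".
# Assumptions: a fixed 5-bit code per letter (2^5 = 32 >= 26 letters) and a
# hypothetical 3-bit codeword reserved for the very common word "the".

BITS_PER_LETTER = 5            # fixed-length letter code
WORD_CODE_BITS = {"the": 3}    # hypothetical short codeword for "the"

word = "the"
letter_mapping_bits = len(word) * BITS_PER_LETTER   # 3 letters x 5 bits = 15 bits
word_mapping_bits = WORD_CODE_BITS[word]            # 3 bits

print("letter mapping:", letter_mapping_bits, "bits")   # 15
print("word mapping  :", word_mapping_bits, "bits")     # 3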
Can we improve further? Yes! Applying the grammar rules of English may give us even better rates!
So it seems that to improve the information rate, we need better and better knowledge of the English language.
The idea here is that there are ways to improve the transmission rate, provided we invent better and better representations of the information (symbols). But, of course, there will be a limit to the achievable transmission rate. Can this limit be calculated theoretically? Is there a measure of information rate?
The answer is Entropy.
Say an information source produces symbols at a constant rate, with every symbol having a certain probability (independent of the previous symbol or symbols). We map every symbol to a certain bit representation based on these probabilities and start transmitting. At the receiving end, we decode these representations and update two counters: one for the number of symbols and one for the number of bits. Entropy is calculated as the average number of bits per symbol, using the above two counters taken after a sufficiently long duration.
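A minimal simulation of this measurement, assuming a toy four-symbol source and an assumed prefix code (both are illustrative choices, not taken from Shannon's paper):

import random

# Toy simulation of the counting procedure described above: a memoryless
# source with assumed probabilities, an assumed prefix code, and two
# counters (symbols and bits) maintained at the receiving end.

probs = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}   # assumed probabilities
code  = {"A": "0", "B": "10", "C": "110", "D": "111"}   # assumed prefix code

random.seed(0)
symbols = random.choices(list(probs), weights=list(probs.values()), k=100000)

symbol_count = len(symbols)                       # counter 1: number of symbols
bit_count = sum(len(code[s]) for s in symbols)    # counter 2: number of bits

print("average bits per symbol:", bit_count / symbol_count)   # close to 1.75

For this particular source the average comes out close to 1.75 bits per symbol, which is also the entropy of the source as defined below.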
For example:
A) Say there are 32 symbols and each symbol is equally probable. The number of bits per symbol would be log2(32), so the entropy would be 5 bits per symbol.
B) Now take a different case: out of the 32 symbols, a particular symbol Sx has a probability of 50% and the rest have equal probability. Since Sx has the highest probability, it makes sense to assign only a single bit to Sx. Because Sx occurs more often, the average is not 5 bits per symbol but rather smaller than 5 bits per symbol (a concrete code illustrating this is sketched below).
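To see why case B drops below 5 bits per symbol, here is one valid prefix code, assumed purely for illustration (it is not Shannon's construction): give Sx the single-bit codeword "0", and give each of the remaining 31 symbols a 6-bit codeword starting with "1" (one marker bit plus 5 bits is enough to distinguish them).

# Average code length for case B under an assumed prefix code:
# Sx -> "0" (1 bit); each of the other 31 symbols -> "1" + 5 bits (6 bits).

p_sx = 0.5
p_other = 0.5 / 31      # each remaining symbol is equally probable

len_sx = 1              # bits used for Sx
len_other = 6           # bits used for every other symbol

average_bits = p_sx * len_sx + 31 * p_other * len_other
print("average code length:", average_bits, "bits per symbol")   # 3.5 < 5

This simple code already averages 3.5 bits per symbol, close to the theoretical value computed below.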
Shannon defines Entropy as (Theorem 2, page 11):
H = − K ∑ pi log(pi)
where K is a positive constant, pi is the probability of the i-th symbol, and the summation is taken over all possible symbols.
For the above examples (taking K = 1):
A) 32 symbols with equal probability (1/32 each):
H = − 32 (1/32) log2(1/32) = 5 bits per symbol.
B) 32 symbols, Sx with probability 0.5 and the rest with probability 0.5/31 each:
H = − ( 0.5 log2(0.5) + 31 (0.5/31) log2(0.5/31) ) ≈ 3.48 bits per symbol.
As expected, the entropy for case B is lower than the entropy for case A.
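The two cases can also be checked numerically; here is a short Python sketch of the formula with K = 1 and base-2 logarithms, so that H comes out in bits per symbol:

import math

# H = -K * sum(pi * log2(pi)) with K = 1, so H is in bits per symbol.
def entropy(probabilities):
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

case_a = [1 / 32] * 32                 # 32 equally probable symbols
case_b = [0.5] + [0.5 / 31] * 31       # Sx with probability 0.5, rest equal

print("case A:", round(entropy(case_a), 2), "bits per symbol")   # 5.0
print("case B:", round(entropy(case_b), 2), "bits per symbol")   # 3.48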
In this article, we defined Entropy as a measure of the best representation that can be achieved. But what exactly is the significance of Entropy? We will look at that in the next article.
References: A Mathematical Theory of Communication by Claude E. Shannon, An Introduction to Information Theory by John R. Pierce.
Copyright © Samir Amberkar 2010-11