Entropy definition [Under Information
theory > Shannon's paper]
In 1st article, we said "for improving English language transmission rate, we can assign smaller number of bits representation to alphabet which is more probable than the alphabet which has lower probability". The idea here can further be extended to words; by mapping words - rather than alphabet - will give us still better transmission rate. For example, "The", "is" are most found and so can be given smaller number of bits representation. In alphabet mapping, we would have needed more than 3 bits for representing "The"; in word mapping, we may use lesser than 3 bits.
Can we do further improvment ? Yes ! application of English's grammer rules may give us better rates !!
So it seems for improving information rate, we need better and better knowledge or study of English language.
The idea we are trying to get here is that there are ways to improve transmission rate provided we invent better and better representation of information (symbols). But, of course, there will be limit to achievable transmission rate. Can this be theoretically calculated ? Is there any measure for information rate ?
Answer is Entropy.
Say information source is producing symbols at constant rate with every symbol having a certain probability (independent of previous symbol or symbols). We map every symbol with certain bit representation based on the probabilities and start transmitting the same. At receiving end, we decode these representations and update two counters - one for number of symbols and other one for number of bits. Entropy is calculated as average number of bits per symbol by using above two counters taken after sufficiently long duration.
For example,
A) say there are 32 symbols and each symbol is equally
probable. Number of bits per symbol would be log_{2}32.
So entropy would be 5 bits per symbol.
B) Now let us
take as different case, say out of 32 symbols, a particular
symbol S_{x} has probability of 50% and rest have
equal probability. Since S_{x} has highest
probability, it makes sense to assign only a bit to S_{x}.
Due to higher probability, occurence of S_{x} would
be more, making information average not 5 bits per symbol but
rather smaller than 5 bits per symbol.
Shannon defines Entropy as (theorem 2, page 11):
H = — K ∑ p_{i} log p_{i}
K
is positive constant, p_{i} is probablity of i th
symbol, and summation done for all possible symbols.
For above example (taking K as unit value):
A) 32 symbols with equal probability (1/32 each)
H = — 32 (1/32) log_{2}( 1/32 ) = 5 bits per symbol.
B) 32 symbols, S_{x }with probability of 0.5 and rest with (0.5/31) each
H = — ( 0.5 log_{2}( 0.5) + 31(0.5/31) log_{2}( 0.5/31) ) = 3.49 bits per symbol
As expected, entropy for case B is lower than entropy for case A.
In this article, we defined Entropy as a measure of best representation that can be achieved. But what exactly is the significance of Entropy ? In next article, we will look at the same.
References: A Mathematical Theory of Communication by Claude E. Shannon, An Introduction to Information Theory by John R. Pierce.
Copyright © Samir Amberkar 2010-11 | § |
Capacity theorem « | Theory Index | » Uncertainty and Entropy |