A couple more points [Under Information theory > Shannon's paper > Capacity Theorem]
Original entropy definition for discrete probabilities:
H = − ∑ pᵢ log pᵢ
extended for a continuous probability distribution:
H(x) = − ∫ p(x) log p(x) dx
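As a small illustration (my own sketch, not something from Shannon's paper), the two definitions can be evaluated directly. The uniform density on [0, 2] is used here only because its continuous entropy has the easy closed form log 2:

```python
# Sketch only: evaluate the discrete entropy H = -sum p_i log2 p_i, and
# numerically approximate the continuous form H(x) = -integral p(x) log2 p(x) dx
# for a uniform density on [0, 2], whose exact value is log2(2) = 1 bit.
import math

def discrete_entropy(probs):
    """Entropy in bits of a discrete distribution whose probabilities sum to 1."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def differential_entropy_uniform(width, steps=100000):
    """Approximate -∫ p(x) log2 p(x) dx for the uniform density on [0, width]."""
    dx = width / steps
    p = 1.0 / width                          # constant density inside the interval
    return -sum(p * math.log2(p) * dx for _ in range(steps))

print(discrete_entropy([0.5, 0.25, 0.25]))   # 1.5 bits per symbol
print(differential_entropy_uniform(2.0))     # ≈ 1.0 bit (= log2 of the width)
```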
Entropy is about symbol representations: the number of bits required, on average, to represent a symbol, assuming the probabilities of the symbols are known in advance. If the information source produces symbols at a constant rate, entropy can just as well be expressed in bits per unit time. If all symbols are equally probable, the entropy in bits per unit time is simply the symbol rate multiplied by a constant. This holds true for both cases: discrete and continuous.
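A tiny worked example with assumed numbers (an alphabet of 8 symbols sent at 1000 symbols per second, neither figure taken from the text) shows the equally-probable case:

```python
# Illustrative numbers only: with M equally likely symbols, H = log2(M) bits
# per symbol, and the entropy in bits per unit time is the symbol rate times
# that constant.
import math

M = 8                    # alphabet size, chosen for illustration
symbol_rate = 1000       # symbols per second, chosen for illustration

H = -sum((1.0 / M) * math.log2(1.0 / M) for _ in range(M))   # = log2(M) = 3.0
print(H, math.log2(M))                                       # 3.0 3.0
print(symbol_rate * H)                                       # 3000.0 bits per second
```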
If there is no noise, the entropy at the receiver is the same as that of the information source. But when there is noise, things are not so easy, because the probability distribution of the noise is not known in general. We need to take a particular case. Shannon took the case of "white thermal noise", which is "independent of the source". White thermal noise has a "Gaussian" probability distribution. So, using the two mathematical knowns
∫ p(x) dx = 1   and   σ² = ∫ p(x) x² dx
Shannon was able to show that the entropy of the noise is of the form log(√(2πeσ²)) bits per symbol.
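The closed form can be checked numerically; the sketch below (my own, with an arbitrarily chosen σ = 2) integrates −∫ p(x) log p(x) dx for a Gaussian density and compares the result with log(√(2πeσ²)):

```python
# Quick numerical check: the differential entropy of Gaussian noise with
# variance sigma^2 comes out as log2(sqrt(2*pi*e*sigma^2)) bits.
import math

def gaussian_entropy_numeric(sigma, steps=200000, span=12.0):
    """Approximate -∫ p(x) log2 p(x) dx for the N(0, sigma^2) density."""
    a, b = -span * sigma, span * sigma
    dx = (b - a) / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * dx
        p = math.exp(-x * x / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))
        if p > 0.0:
            total -= p * math.log2(p) * dx
    return total

sigma = 2.0                                                       # assumed value
print(gaussian_entropy_numeric(sigma))                            # ≈ 3.047 bits
print(math.log2(math.sqrt(2.0 * math.pi * math.e * sigma ** 2)))  # ≈ 3.047 bits
```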
Nyquist's theorem says that the sampling frequency should be at least twice the highest frequency in the input signal. Using this theorem, and assuming the noise is limited to a bandwidth W and to a certain average power N, the entropy of the noise per unit time comes out to be of the form W log(2πeN).
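Putting the two results together with assumed numbers (W = 3000 Hz and a normalised noise power N = 1, both chosen only for illustration): the band-limited noise is described by 2W samples per second, each contributing ½ log(2πeN), which gives W log(2πeN) per second:

```python
# Worked arithmetic with assumed numbers: 2W Nyquist samples per second, each
# carrying (1/2) log2(2*pi*e*N) bits, total W log2(2*pi*e*N) bits per second.
import math

W = 3000.0           # bandwidth in hertz (assumed)
N = 1.0              # average noise power, i.e. per-sample variance (assumed)

per_sample = 0.5 * math.log2(2.0 * math.pi * math.e * N)   # bits per sample
per_second = 2.0 * W * per_sample                          # bits per second
print(per_second)                                          # ≈ 12283.6
print(W * math.log2(2.0 * math.pi * math.e * N))           # same value
```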
We know the noise entropy under an assumption about the average noise power, but what exactly is noise entropy? Noise entropy is the uncertainty about which symbol, or which bit representation, the noise will produce. The receiver sees the addition of these noise bits and the bits originally sent by the transmitter (the transmitter converts the symbols generated by the information source). We must note here that even though the noise entropy is known, it still remains a measure of uncertainty. Thus we must expect a limit to how far we can nullify its effect at the receiver side! This is only one part; the other part deals with the type of probability distribution the noise (symbols) has.
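A toy simulation (my illustration, not a construction from the paper) makes this concrete: the receiver sees signal plus Gaussian noise, and even with the noise statistics known exactly, a simple threshold decision still makes occasional errors. The known entropy of the noise is a known amount of uncertainty, not something that can simply be subtracted away:

```python
# Toy simulation: transmit +/-1, add Gaussian noise of known variance, decide
# by threshold. The error rate is small but does not go to zero.
import random

random.seed(1)
sigma = 0.5                        # noise standard deviation (assumed)
errors = 0
trials = 100000
for _ in range(trials):
    bit = random.choice([0, 1])
    sent = 1.0 if bit else -1.0    # transmitter maps the bit to +/-1
    received = sent + random.gauss(0.0, sigma)
    decided = 1 if received > 0.0 else 0
    if decided != bit:
        errors += 1
print(errors / trials)             # roughly 0.02, and stubbornly non-zero
```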
Shannon proves that, for best reception, the received signals should also have a Gaussian probability distribution if the noise has a Gaussian probability distribution. This is not easy to understand if we do not know what a Gaussian distribution looks like. The constant probability distribution we talked about above is a line parallel to the x-axis when plotted. A Gaussian probability distribution, when plotted, is the familiar bell-shaped curve.
In a Gaussian distribution the probabilities are not only uneven (a few values are more probable than others) but also concentrated around the peak. Shannon says that to achieve the least impact of noise (i.e. the smallest possible error rate), we need to control the transmission so that the received signals also form a Gaussian probability distribution. To see what this means, take an example in which all input symbols have a constant probability distribution. If we transmit the symbols as they are, we will have a certain error rate (which, of course, would not be the smallest possible). To improve the error rate, we control the bit representation; say we add duplicate bits. If we do that, the receiving side will see more and more similar patterns (within the same unit time or observation window), giving an uneven distribution rather than the constant distribution of the information source. Though the example of duplicate bits is taken here, the idea is that the probability distribution of the received signals becomes uneven and has to get closer and closer to a Gaussian distribution to achieve better and better rates.
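The drift towards a Gaussian shape can be illustrated with a small sketch (mine, not the coding scheme Shannon actually uses): a single ±1 symbol has a flat two-valued distribution, but a value built by combining many independent ±1 components piles up around the middle, and its histogram starts to look like the bell curve described above:

```python
# Illustration only: the sum of many independent +/-1 components has a
# bell-shaped histogram, unlike the flat distribution of a single component.
import random
from collections import Counter

random.seed(1)
n_components = 12                  # number of +/-1 components combined (assumed)
samples = [sum(random.choice([-1, 1]) for _ in range(n_components))
           for _ in range(20000)]

histogram = Counter(samples)
for value in sorted(histogram):
    print(f"{value:+3d} {'#' * (histogram[value] // 200)}")
```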
References: A Mathematical Theory of Communication by Claude E. Shannon, An Introduction to Information Theory by John R. Pierce.
Copyright © Samir Amberkar 2010-11