Combined entropy [Under Information theory > Shannon's paper]
In the last article, we mentioned combined entropy; let us spend a little more time understanding the concept. Say we have two information sources, I_{a} and I_{b}, each having its own set of symbols and bit representations or entropies, H(a) and H(b) respectively. What would be their combined entropy, i.e. H(a,b)?
Let us take some examples.
Case I: Independent sources
Say I_{a} has two symbols, S_{1a} and S_{2a}, with equal probabilities, and I_{b} has four symbols, S_{1b}, S_{2b}, S_{3b}, and S_{4b}, again with equal probabilities. A simple example of an I_{a} transmission would be S_{1a} S_{2a} S_{1a} S_{2a} S_{1a} ... and that of I_{b} would be S_{1b} S_{2b} S_{3b} S_{4b} S_{1b} S_{2b} S_{3b} S_{4b} S_{1b} ... If we combine the I_{a} and I_{b} transmissions, and assume I_{a} and I_{b} are independent, the symbols of the resultant transmission would be (S_{1a} S_{1b}), (S_{1a} S_{2b}), (S_{1a} S_{3b}), (S_{1a} S_{4b}), (S_{2a} S_{1b}), (S_{2a} S_{2b}), (S_{2a} S_{3b}), (S_{2a} S_{4b}). An example of the combined transmission (let us call it I_{c}) would be (S_{1a} S_{1b}) (S_{1a} S_{2b}) (S_{1a} S_{3b}) (S_{1a} S_{4b}) (S_{2a} S_{1b}) (S_{2a} S_{2b}) (S_{2a} S_{3b}) (S_{2a} S_{4b}). Note that all these example transmissions maintain the probabilities we have assumed, and the probabilities of the I_{a} and I_{b} symbols are also maintained in the example transmission for I_{c}.
I_{a} would require 1 bit per symbol for representation, I_{b} would require 2 bits per symbol, and I_{c} would require 3 bits per symbol. Looking at the above example, I_{c} requiring 3 bits per symbol makes sense, as each I_{c} symbol contains both an I_{a} symbol and an I_{b} symbol. Since the I_{a} and I_{b} symbols are produced independently, the average bits per symbol required for I_{c} has to be the sum of the average bits per symbol required for I_{a} and that required for I_{b}.
In other words, H(a,b) = H(a) + H(b).
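The independent case is easy to check numerically. Below is a minimal Python sketch (not from the original article; the function and variable names are my own) that computes the entropies of the two example sources and of their joint distribution:

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)

p_a = [0.5, 0.5]        # I_a: two equiprobable symbols
p_b = [0.25] * 4        # I_b: four equiprobable symbols

# Independence: the probability of each (a, b) pair is the product p(a) * p(b)
p_joint = [pa * pb for pa in p_a for pb in p_b]

print(entropy(p_a))      # 1.0 bit per symbol
print(entropy(p_b))      # 2.0 bits per symbol
print(entropy(p_joint))  # 3.0 bits per symbol = H(a) + H(b)
```

The joint distribution here has eight equiprobable pairs, so its entropy is exactly the sum of the individual entropies.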
Case II: Dependent sources
What if I_{b} is dependent on I_{a}? Say the I_{b} symbols S_{1b} and S_{2b} are produced only if the symbol produced by I_{a} is S_{1a}; likewise, I_{b} produces S_{3b} and S_{4b} only when I_{a} produces S_{2a}. The possible symbols for the combined transmission would then be (S_{1a} S_{1b}), (S_{1a} S_{2b}), (S_{2a} S_{3b}), (S_{2a} S_{4b}). The entropies of I_{a} and I_{b} remain the same as before, but the number of bits per symbol required for I_{c} would be 2. This is because, for the combined transmission, once the I_{a} symbol is produced, we have only two choices from the I_{b} symbols; two choices mean 1 bit per symbol, which makes the total 1 bit per symbol for I_{a} plus 1 bit per symbol for the I_{b} symbol, i.e. 2 bits per symbol. What we have used here is called conditional entropy (or equivocation).
In other words, H(a,b) = H(a) + H_{a}(b), where H_{a}(b) is the conditional entropy of I_{b} given the I_{a} symbol.
The above situation can also be seen from the other direction: say we know the symbol produced by I_{b}; then, knowing the conditional probabilities, we can correctly deduce the symbol produced by I_{a}. So we need only transmit the I_{b} symbols, making the combined entropy 2 bits per symbol. This matches our earlier value.
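As a numerical check, here is a small Python sketch (again, the variable names are my own) for the dependent example, where only four (a, b) pairs can occur:

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Only four (a, b) pairs occur: (S_1a S_1b), (S_1a S_2b), (S_2a S_3b), (S_2a S_4b),
# each with probability 1/4 (a is equiprobable; given a, two equiprobable b choices).
p_joint = [0.25, 0.25, 0.25, 0.25]

H_ab = entropy(p_joint)                # joint entropy H(a,b)
H_a = entropy([0.5, 0.5])              # H(a) = 1 bit
H_a_b = 0.5 * entropy([0.5, 0.5]) \
      + 0.5 * entropy([0.5, 0.5])      # H_a(b): two b choices for each a
H_b = entropy([0.25] * 4)              # H(b) = 2 bits
H_b_a = 0.0                            # H_b(a): each b symbol fixes a exactly

print(H_ab, H_a + H_a_b, H_b + H_b_a)  # 2.0 2.0 2.0
```

Both decompositions give the same 2 bits per symbol computed in the text.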
In the example, we have taken a many-to-one dependency (a number of I_{b} symbols depend on one I_{a} symbol). There could also be a many-to-many dependency. Taking care of this, we can write:
H(a,b) = H(b) + H_{b}(a)
Thus H(a,b) = H(a) + H_{a}(b) = H(b) + H_{b}(a) = H(b,a).
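This chain rule holds for any joint distribution, including many-to-many dependencies. The following Python sketch (the helper names and the example distribution are mine, chosen so that neither source determines the other) computes both decompositions directly and checks that they agree with the joint entropy:

```python
from collections import defaultdict
from math import log2

def entropy(probs):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)

def cond_entropy(p_xy, given_first=True):
    """H_x(y) if given_first, else H_y(x), for a joint distribution {(x, y): p}."""
    marginal = defaultdict(float)
    groups = defaultdict(list)
    for (x, y), p in p_xy.items():
        key = x if given_first else y
        marginal[key] += p
        groups[key].append(p)
    # Average, over the given symbol, of the entropy of the other symbol
    return sum(m * entropy([p / m for p in groups[k]])
               for k, m in marginal.items())

# An asymmetric many-to-many joint distribution
p_xy = {('x0', 'y0'): 0.5, ('x0', 'y1'): 0.25, ('x1', 'y1'): 0.25}

p_x, p_y = defaultdict(float), defaultdict(float)
for (x, y), p in p_xy.items():
    p_x[x] += p
    p_y[y] += p

H_xy = entropy(p_xy.values())
lhs = entropy(p_x.values()) + cond_entropy(p_xy, given_first=True)   # H(x) + H_x(y)
rhs = entropy(p_y.values()) + cond_entropy(p_xy, given_first=False)  # H(y) + H_y(x)
print(H_xy, lhs, rhs)  # all approximately 1.5 bits
```

Both orderings of the decomposition land on the same joint entropy, as the equation above states.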
Since entropy is about uncertainty, we can recheck the above equations from an uncertainty point of view.
When information sources are independent, it makes logical sense that the uncertainty associated with the combined transmission is the sum of the uncertainties associated with the individual sources.
When sources are dependent, once the symbol of one source becomes known, the uncertainty associated with the connected (or dependent) source(s) decreases (there are fewer options to choose from), and vice versa. And so, for dependent sources, conditional probabilities apply!
The above equations do just what we expect from thinking of entropy as a measure of uncertainty.
References: Claude E. Shannon, "A Mathematical Theory of Communication"; John R. Pierce, "An Introduction to Information Theory".
Copyright © Samir Amberkar 2010-11