SOLUTION TO EXAM NO. 2

Problem No. 1: Entropy Encoding

The alphabet A = {0, 1, 2, 3, 4} has a pdf p(A) = {0.4, 0.3, 0.2, 0.05, 0.05}.

(a) Compute the average codeword length for an optimal code.

We proved in class that the Huffman code is an optimal code, so we construct one:

Table 1: Huffman Code for Alphabet A with pdf p(A)

    A    p(A)    Code
    0    0.4     0
    1    0.3     11
    2    0.2     101
    3    0.05    1000
    4    0.05    1001

The average codeword length is

    L = 0.4(1) + 0.3(2) + 0.2(3) + 0.05(4) + 0.05(4) = 2.0 bits/symbol.

(b) Is there a non-prefix code that has a lower expected codeword length than the optimal prefix code? If so, give an example.

Yes, try this:

Table 2: A Non-uniquely Decodable Code for Alphabet A with pdf p(A)

    A    p(A)    Code
    0    0.4     0
    1    0.3     1
    2    0.2     01
    3    0.05    10
    4    0.05    11

Its expected codeword length is

    L = 0.4(1) + 0.3(1) + 0.2(2) + 0.05(2) + 0.05(2) = 1.3 bits/symbol,

which is lower than the 2.0 bits/symbol of the Huffman code. The price is that this code is not uniquely decodable.

(c) Suppose sending a codeword through this channel costs an amount of money nonlinearly related to the codeword length. For example, the cost of a codeword of length 2 is $4, and the cost of a codeword of length 3 is $8. Design a code that delivers a good compression rate, yet minimizes the overall cost.

To get a better compression ratio, we need a lower bit rate. To minimize cost, we also need the average codeword length to be low, and since the cost grows much faster than linearly with length, long codewords are especially expensive. From (b), we can see that the Huffman code does not give the minimum expected length once the prefix condition is dropped.

If we were to use the Huffman code from (a), weighting each codeword's dollar cost ($2, $4, $8, $16 for lengths 1 through 4) by its probability and its length gives

    0.4(1)($2) + 0.3(2)($4) + 0.2(3)($8) + 0.05(4)($16) + 0.05(4)($16) = $14.4.

If we were to use the non-uniquely decodable code from (b), the same calculation gives

    0.4(1)($2) + 0.3(1)($2) + 0.2(2)($4) + 0.05(2)($4) + 0.05(2)($4) = $3.8.

So do you want to spend $14.4 or $3.8?

Problem No. 2: Binary Symmetric Channels (BSC) and Capacity

For the BSC with transition probabilities p(0/1) = p(1/0) = p:

(a) Find the value of p that minimizes the capacity of the channel. Explain.

For a BSC, the capacity is

    C = 1 - H(p), where H(p) = -p log2(p) - (1 - p) log2(1 - p).

We want C to be minimum, so we have to maximize H(p). H(p) is maximum when the two outcomes are equally probable (the channel is completely random), i.e. p = 0.5, at which point H(p) = 1 bit and C = 0.

(b) Compute the capacity of a cascade of two of these channels.

For a single (non-cascaded) BSC, the capacity is given by C = 1 - H(p). To find the capacity of a cascade of two of these channels, we reduce the cascade to an equivalent single BSC. A bit is received in error only if exactly one of the two stages flips it, so the equivalent crossover probability is

    p2 = p(1 - p) + (1 - p)p = 2p(1 - p),

and the capacity of the cascade is

    C2 = 1 - H(2p(1 - p)).

(c) Explain to what value the capacity of n identical BSC channels converges as n becomes large.

From (b) we can see that as we cascade more and more channels, we have more and more chances of errors. For a single BSC there are two paths leading to an output of 0, whereas for a cascade of two BSCs there are four paths leading to an output of 0; more paths mean more opportunities for error. You can also see this in the equations in (b): the equivalent crossover probability moves toward 0.5 as stages are added, so the binary entropy term grows and the capacity shrinks. In the limit (for 0 < p < 1) the equivalent crossover probability approaches 0.5, H approaches 1 bit, and the capacity goes to 0 as n goes to infinity.

Problem No. 3: Differential Entropy

(a) A continuous random variable has a pdf f(x). Compute the entropy.

Evaluating the differential entropy h(X) = -∫ f(x) log2 f(x) dx using integration by parts gives

    h(X) = 1/ln(2) bits

(this is the value reused in part (c)).

(b) A continuous random variable has a pdf g(x) that is a scaled version of the density in (a). Compute the entropy and compare to (a). Explain any differences.

Let Y be the unscaled variable, so that the variable with density g(x) can be written as X = aY. Using the technique in (a) (integration by parts), the entropy of Y is again 1/ln(2) bits. From the scaling theorem,

    h(aY) = h(Y) + log2(a),

so the entropy of g(x) is 1/ln(2) + log2(a) bits. We notice that g(x) has a larger entropy than f(x): stretching the density (a > 1) spreads the probability mass over a wider range, which increases the differential entropy.

(c) A continuous random variable has a pdf that is a translated version of f(x) in part (a). Compute the entropy. Explain.

We use the translation theorem proved in class: h(X + c) = h(X), so translation does not change the differential entropy. Thus h(X) equals the value from part (a), which is 1/ln(2) bits.

(d) Compute the relative entropy between f(x) (in part a) and g(x) (in part b): D(f || g).
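The pdf expressions for Problem 3 are not written out above, so the following worked derivation is a sketch under explicit assumptions rather than the exam's exact densities: it takes f(x) = e^(-x) for x >= 0 in part (a), which is consistent with the integration-by-parts step and the quoted value 1/ln(2) bits, and g(x) = (1/a) e^(-x/a) with a > 1 as the scaled density in part (b).

% Assumed densities (assumptions for this sketch, not given in the original):
%   f(x) = e^{-x} for x >= 0                         (part a)
%   g(x) = (1/a) e^{-x/a} for x >= 0, with a > 1     (part b, a stretched copy of f)
\begin{align*}
h(f) &= -\int_0^{\infty} e^{-x}\,\log_2 e^{-x}\,dx
      = \log_2 e \int_0^{\infty} x\,e^{-x}\,dx \\
     &= \log_2 e \left[\,-x e^{-x}\,\Big|_0^{\infty} + \int_0^{\infty} e^{-x}\,dx\right]
      = \log_2 e = \frac{1}{\ln 2}\ \text{bits}.
\end{align*}
% Scaling theorem: h(aX) = h(X) + log_2(a), so the stretched density has
\begin{align*}
h(g) &= \frac{1}{\ln 2} + \log_2 a \;>\; h(f) \qquad (a > 1),
\end{align*}
% and by the translation theorem a shifted copy of f keeps the same entropy:
\begin{align*}
h(X + c) &= h(X) = \frac{1}{\ln 2}\ \text{bits}.
\end{align*}

Under these assumptions the numbers match the values stated in parts (a) and (c) above; only the size of the increase in part (b) depends on the unknown scale factor a.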
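Under the same assumed densities, part (d) can be finished as follows; since the scale factor a is not known, only the form of the answer is offered, with a = 2 worked out purely as an illustration.

% Relative entropy under the assumed densities f(x) = e^{-x}, g(x) = (1/a) e^{-x/a}, x >= 0.
\begin{align*}
D(f\,\|\,g) &= \int_0^{\infty} f(x)\,\log_2\frac{f(x)}{g(x)}\,dx
             = \log_2 e \int_0^{\infty} e^{-x}\left(-x + \ln a + \frac{x}{a}\right) dx \\
            &= \log_2 e \left(\ln a + \frac{1}{a} - 1\right)
             \qquad \text{using } \int_0^{\infty} x\,e^{-x}\,dx = 1 .
\end{align*}
% Illustration: a = 2 gives D(f||g) = (ln 2 - 1/2) log_2 e, roughly 0.28 bits.
% Sanity check: the bracketed term is 0 at a = 1 (where f = g) and positive otherwise,
% so D(f||g) >= 0 as relative entropy must be.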
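As a numerical cross-check of Problem 1(a) and (b), here is a small Python sketch that builds a Huffman code for the given pdf with the standard heap-based construction and compares expected lengths against the source entropy. The construction is the usual one; the function names (huffman_lengths, expected_length) are mine for this illustration, not anything from the course.

import heapq
from math import log2

probs = {0: 0.40, 1: 0.30, 2: 0.20, 3: 0.05, 4: 0.05}

def huffman_lengths(p):
    """Return codeword lengths from the standard heap-based Huffman construction."""
    # Each heap entry: (probability, tie-breaker, list of symbols in this subtree).
    heap = [(prob, i, [sym]) for i, (sym, prob) in enumerate(p.items())]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in p}
    counter = len(heap)
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every codeword inside them.
        for sym in syms1 + syms2:
            lengths[sym] += 1
        heapq.heappush(heap, (p1 + p2, counter, syms1 + syms2))
        counter += 1
    return lengths

def expected_length(p, lengths):
    return sum(p[s] * lengths[s] for s in p)

entropy = -sum(prob * log2(prob) for prob in probs.values())
huff_len = expected_length(probs, huffman_lengths(probs))            # 2.0 bits/symbol
non_ud_len = expected_length(probs, {0: 1, 1: 1, 2: 2, 3: 2, 4: 2})  # 1.3 bits/symbol

print(f"source entropy   : {entropy:.3f} bits/symbol")
print(f"Huffman code     : {huff_len:.2f} bits/symbol")
print(f"non-UD code (b)  : {non_ud_len:.2f} bits/symbol")

The Huffman average of 2.0 bits/symbol sits just above the source entropy of roughly 1.95 bits/symbol, as the source coding theorem requires for any uniquely decodable code; the 1.3 bits/symbol of Table 2 undercuts the entropy only because that code is not uniquely decodable.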
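For Problem 1(c), the sketch below reproduces the $14.4 and $3.8 figures. Two readings are assumptions on my part: that a length-l codeword costs $2^l (matching the $4 and $8 examples in the problem statement), and that the quoted totals weight each codeword's cost by both its probability and its length; the plain expected cost (probability weighting only) is printed alongside for comparison.

# Cost comparison for Problem 1(c).
# Assumption: a codeword of length l costs 2**l dollars (length 2 -> $4, length 3 -> $8),
# and the quoted totals weight each codeword's cost by its probability AND its length.
probs   = [0.40, 0.30, 0.20, 0.05, 0.05]
huffman = [1, 2, 3, 4, 4]    # codeword lengths from Table 1
non_ud  = [1, 1, 2, 2, 2]    # codeword lengths from Table 2

def weighted_cost(p, lengths):
    return sum(pi * li * 2**li for pi, li in zip(p, lengths))

def expected_cost(p, lengths):
    return sum(pi * 2**li for pi, li in zip(p, lengths))

print(f"Huffman : length-weighted ${weighted_cost(probs, huffman):.1f}, "
      f"plain expected ${expected_cost(probs, huffman):.1f}")
print(f"Non-UD  : length-weighted ${weighted_cost(probs, non_ud):.1f}, "
      f"plain expected ${expected_cost(probs, non_ud):.1f}")

Under either weighting the shorter code is the cheaper one; the exponential cost model simply magnifies the penalty for long codewords.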
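For Problem 2(b) and (c), here is a short Python sketch of how the capacity of a cascade of n identical BSCs decays, using the reduction to an equivalent single BSC described above; binary_entropy and cascade_capacity are names chosen for this illustration.

from math import log2

def binary_entropy(p):
    """H(p) in bits, with the usual convention 0*log2(0) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def cascade_capacity(p, n):
    """Capacity of n identical BSCs in cascade, via the equivalent single BSC."""
    p_eq = 0.0                       # crossover probability of a 0-stage (perfect) channel
    for _ in range(n):
        # A bit is flipped overall iff exactly one of (cascade so far, new stage) flips it.
        p_eq = p_eq * (1 - p) + (1 - p_eq) * p
    return 1 - binary_entropy(p_eq)

p = 0.1
for n in (1, 2, 4, 8, 16, 32):
    print(f"n = {n:2d}: C = {cascade_capacity(p, n):.4f} bits/use")

For any fixed 0 < p < 1 the equivalent crossover probability converges to 0.5 (in closed form, p_n = (1 - (1 - 2p)^n)/2), so the printed capacities shrink toward 0, which is the limiting value argued in part (c).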