The definition of conditional relative entropy is:
\[
D\bigl(p(y|x)\,\|\,q(y|x)\bigr) = \sum_x p(x) \sum_y p(y|x) \log \frac{p(y|x)}{q(y|x)}
\]
Note that you might consider this the average of the relative entropy between $p(y|x)$ and $q(y|x)$. Since it is an average, we must weight the measure by the probability distribution of $x$ (hence, infrequently occurring events can drive the log negative, but the weighting also scales their contribution towards zero). The measure will be dominated by values of $x$ that occur frequently and for which $p(y|x)$ differs from $q(y|x)$. In practice, when computing any relative entropy, one must be extremely concerned about pdf's that achieve one or more zero values. This commonly occurs when dealing with small data sets. There are a variety of interpolation techniques designed to deal with this situation.

The relative entropy between two joint distributions is defined as:
\[
D\bigl(p(x,y)\,\|\,q(x,y)\bigr) = \sum_x \sum_y p(x,y) \log \frac{p(x,y)}{q(x,y)}
\]
This is just the expectation of the log of the ratio of the two distributions, as we saw before in the one-variable case. Noting that $p(x,y) = p(x)\,p(y|x)$, we can show that the relative entropy between two joint distributions can be expressed as:
\[
D\bigl(p(x,y)\,\|\,q(x,y)\bigr) = D\bigl(p(x)\,\|\,q(x)\bigr) + D\bigl(p(y|x)\,\|\,q(y|x)\bigr)
\]
Hence, the distance between the joint distributions is larger than the distance between the marginals. Only when the conditional relative entropy is zero, that is, when $p(y|x) = q(y|x)$ for every $x$ with $p(x) > 0$, are they equal.
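As a quick numerical sanity check on the chain-rule expression above, here is a minimal Python sketch (the names p_xy, q_xy, and rel_entropy are illustrative, not from the text). It builds two small joint pmfs, computes the conditional relative entropy as the $p(x)$-weighted average of per-$x$ divergences, and confirms that the joint divergence equals the marginal divergence plus the conditional one. The random pmfs here are strictly positive, so the zero-probability issue noted above does not arise; with empirical estimates one would typically apply a smoothing or interpolation step first.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: two joint pmfs p(x, y) and q(x, y) on a small alphabet.
# Rows index x, columns index y; normalize so each sums to 1.
p_xy = rng.random((3, 4)); p_xy /= p_xy.sum()
q_xy = rng.random((3, 4)); q_xy /= q_xy.sum()

def rel_entropy(p, q):
    """D(p || q) = sum p * log(p / q), with the convention 0 * log(0/q) = 0."""
    p, q = np.asarray(p, float).ravel(), np.asarray(q, float).ravel()
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

# Marginals p(x), q(x) and conditionals p(y|x), q(y|x).
p_x, q_x = p_xy.sum(axis=1), q_xy.sum(axis=1)
p_y_given_x = p_xy / p_x[:, None]
q_y_given_x = q_xy / q_x[:, None]

# Conditional relative entropy: p(x)-weighted average of D(p(y|x) || q(y|x)).
d_cond = sum(p_x[i] * rel_entropy(p_y_given_x[i], q_y_given_x[i])
             for i in range(len(p_x)))

# Chain rule: D(p(x,y) || q(x,y)) = D(p(x) || q(x)) + D(p(y|x) || q(y|x)).
print(rel_entropy(p_xy, q_xy))          # joint divergence
print(rel_entropy(p_x, q_x) + d_cond)   # marginal + conditional: same value
```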