The definition of conditional relative entropy is:
\[
D\bigl(p(y|x)\,\|\,q(y|x)\bigr) = \sum_x p(x) \sum_y p(y|x) \log \frac{p(y|x)}{q(y|x)}
\]
Note that you might consider this the average of the relative entropy between $p(y|x)$ and $q(y|x)$. Since it is an average, we must weight the measure by the probability distribution of $x$ (hence, infrequently occurring events can drive the log negative, but the weighting also scales their contribution towards zero). The measure will be dominated by values of $x$ that occur frequently and for which $p(y|x)$ differs from $q(y|x)$. In practice, when computing any relative entropy, one must be extremely concerned about pdf's that achieve one or more zero values. This commonly occurs when dealing with small data sets. There are a variety of interpolation techniques designed to deal with this situation.

The relative entropy between two joint distributions is defined as:
\[
D\bigl(p(x,y)\,\|\,q(x,y)\bigr) = \sum_x \sum_y p(x,y) \log \frac{p(x,y)}{q(x,y)}
\]
This is just the expectation of the log of the ratio of the two distributions, as we saw before in the one-variable case. Noting that $p(x,y) = p(x)\,p(y|x)$, we can show that the relative entropy between two joint distributions can be expressed as:
\[
D\bigl(p(x,y)\,\|\,q(x,y)\bigr) = D\bigl(p(x)\,\|\,q(x)\bigr) + D\bigl(p(y|x)\,\|\,q(y|x)\bigr)
\]
Hence, the distance between the joint distributions is larger than the distance between the marginals. Only when the conditional relative entropy is zero, that is, when $p(y|x) = q(y|x)$ for every $x$ with $p(x) > 0$, are they equal.
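As a quick numerical sanity check on the chain-rule expression above, here is a minimal Python sketch (the names p_xy, q_xy, and rel_entropy are illustrative, not from the text). It builds two small joint pmfs, computes the conditional relative entropy as the $p(x)$-weighted average of per-$x$ divergences, and confirms that the joint divergence equals the marginal divergence plus the conditional one. The random pmfs here are strictly positive, so the zero-probability issue noted above does not arise; with empirical estimates one would typically apply a smoothing or interpolation step first.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: two joint pmfs p(x, y) and q(x, y) on a small alphabet.
# Rows index x, columns index y; normalize so each sums to 1.
p_xy = rng.random((3, 4)); p_xy /= p_xy.sum()
q_xy = rng.random((3, 4)); q_xy /= q_xy.sum()

def rel_entropy(p, q):
    """D(p || q) = sum p * log(p / q), with the convention 0 * log(0/q) = 0."""
    p, q = np.asarray(p, float).ravel(), np.asarray(q, float).ravel()
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

# Marginals p(x), q(x) and conditionals p(y|x), q(y|x).
p_x, q_x = p_xy.sum(axis=1), q_xy.sum(axis=1)
p_y_given_x = p_xy / p_x[:, None]
q_y_given_x = q_xy / q_x[:, None]

# Conditional relative entropy: p(x)-weighted average of D(p(y|x) || q(y|x)).
d_cond = sum(p_x[i] * rel_entropy(p_y_given_x[i], q_y_given_x[i])
             for i in range(len(p_x)))

# Chain rule: D(p(x,y) || q(x,y)) = D(p(x) || q(x)) + D(p(y|x) || q(y|x)).
print(rel_entropy(p_xy, q_xy))          # joint divergence
print(rel_entropy(p_x, q_x) + d_cond)   # marginal + conditional: same value
```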