Information Subtraction: Learning Representations for Conditional Entropy

Keng Hou Leong, Yuxuan Xiu, Wai Kin Chan

arXiv.org Artificial Intelligence 

We may consider the observations as samples from stochastic distributions and use information-theoretic measures, as shown in Figure 1, to quantify the uncertainty and shared information among variables. These measures reveal the strength of relationships between variables, including correlation and Granger causality (Pearl 2009). Beyond merely recognizing the magnitude of such relationships, many representation learning works aim to further explain and describe them to enhance our understanding and control over the system (Yao et al. 2021; Xu et al. 2023). These approaches generate representations that maximize information about the targets, as they must be capable of accurately reconstructing the targets (Kingma and Welling 2013; Clark et al. 2019). Therefore, most methods can effectively represent the entropy H(Y) or the mutual information I(X;Y), which describe the total information of Y and the information shared between X and Y, respectively, as shown in Figure 1. However, fewer methods have addressed the representation of other information terms, such as the conditional entropy H(Y|X) and the conditional mutual information I(X;Y|W), which describe the information in Y not provided by X, and the information that X provides about Y but W does not, respectively. Representing conditional mutual information is significant because it reveals the distinct impact of a specific factor on the target, beyond what other factors provide. For example, identifying the distinct effect of funding on a scholar's publications, separate from other factors, can guide policy decisions such as terminating funding that yields an insignificant boost. Furthermore, representing conditional entropy helps in creating fair and unbiased representations by removing the impact of sensitive factors.
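To make the quantities above concrete, the following is a minimal illustrative sketch (not from the paper): it estimates H(Y), H(Y|X), and I(X;Y) for small discrete samples via plug-in frequency estimates, using the identities H(Y|X) = H(X,Y) - H(X) and I(X;Y) = H(Y) - H(Y|X). The toy data are invented for illustration.

```python
import math
from collections import Counter

def entropy(samples):
    """Plug-in Shannon entropy H(Y) in bits from a list of outcomes."""
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())

def conditional_entropy(xs, ys):
    """H(Y|X) = H(X,Y) - H(X), the information in Y not provided by X."""
    return entropy(list(zip(xs, ys))) - entropy(xs)

def mutual_information(xs, ys):
    """I(X;Y) = H(Y) - H(Y|X), the information shared between X and Y."""
    return entropy(ys) - conditional_entropy(xs, ys)

# Toy data: Y mostly copies X, with two disagreements.
xs = [0, 0, 1, 1, 0, 1, 0, 1]
ys = [0, 0, 1, 1, 1, 0, 0, 1]

print(round(entropy(ys), 3))                  # H(Y)   → 1.0
print(round(conditional_entropy(xs, ys), 3))  # H(Y|X) → 0.811
print(round(mutual_information(xs, ys), 3))   # I(X;Y) → 0.189
```

The same plug-in recipe extends to the conditional mutual information via I(X;Y|W) = H(Y|W) - H(Y|X,W), although the paper's point is precisely that such quantities are easy to estimate for small discrete variables but hard to *represent* for high-dimensional data.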