An Algorithm to Learn Polytree Networks with Hidden Nodes

Sepehr, Firoozeh, Materassi, Donatello

Neural Information Processing Systems 

Ancestral graphs are a prevalent mathematical tool to take into account latent (hidden) variables in a probabilistic graphical model. In ancestral graph representations, the nodes are only the observed (manifest) variables and the notion of m-separation fully characterizes the conditional independence relations among such variables, bypassing the need to explicitly consider latent variables. However, ancestral graph models do not necessarily represent the actual causal structure of the model, and do not contain information about, for example, the precise number and location of the hidden variables. Being able to detect the presence of latent variables while also inferring their precise location within the actual causal structure model is a more challenging task that provides more information about the actual causal relationships among all the model variables, including the latent ones. In this article, we develop an algorithm to exactly recover graphical models of random variables with underlying polytree structures when the latent nodes satisfy specific degree conditions.