Globally optimal score-based learning of directed acyclic graphs in high-dimensions

Neural Information Processing Systems 

We prove that $\Omega(s\log p)$ samples suffice to learn a sparse Gaussian directed acyclic graph (DAG) from data, where $s$ is the maximum Markov blanket size. This improves upon recent results that require $\Omega(s^{4}\log p)$ samples in the equal variance case. To prove this, we analyze a popular score-based estimator that has been the subject of extensive empirical inquiry in recent years and is known to achieve state-of-the-art results. Furthermore, the approach we study does not require strong assumptions such as faithfulness, on which existing theory for score-based learning crucially relies. The resulting estimator is based on a difficult nonconvex optimization problem, and its analysis may be of independent interest given the recent attention paid to nonconvex optimization in machine learning.
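
The abstract does not write out the score-based estimator it refers to; as a hedged sketch only, a standard $\ell_0$-penalized least-squares score minimized over DAG-structured coefficient matrices (an assumed form for illustration, not stated above) would read
$$
\widehat{B} \;\in\; \arg\min_{B \in \mathbb{D}} \; \frac{1}{2n}\,\lVert X - XB \rVert_F^{2} \;+\; \lambda\,\lVert B \rVert_{0},
\qquad
\mathbb{D} := \bigl\{\, B \in \mathbb{R}^{p\times p} : \operatorname{supp}(B) \text{ is a DAG} \,\bigr\},
$$
where $X \in \mathbb{R}^{n\times p}$ is the data matrix, $\lambda > 0$ a regularization parameter, and $\lVert B\rVert_{0}$ counts nonzero entries (edges). The combinatorial acyclicity constraint defining $\mathbb{D}$ is what makes this optimization problem nonconvex.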