covariance graph
Sparse Quadratic Discriminant Analysis and Community Bayes
We develop a class of rules spanning the range between quadratic discriminant analysis and naive Bayes, through a path of sparse graphical models. A group lasso penalty is used to introduce shrinkage and encourage a similar pattern of sparsity across precision matrices. It gives sparse estimates of interactions and produces interpretable models. Inspired by the connected-components structure of the estimated precision matrices, we propose the community Bayes model, which partitions features into several conditional independent communities and splits the classification problem into separate smaller ones. The community Bayes idea is quite general and can be applied to non-Gaussian data and likelihood-based classifiers.
Reading Dependencies from Covariance Graphs
The covariance graph (aka bi-directed graph) of a probability distribution $p$ is the undirected graph $G$ where two nodes are adjacent iff their corresponding random variables are marginally dependent in $p$. In this paper, we present a graphical criterion for reading dependencies from $G$, under the assumption that $p$ satisfies the graphoid properties as well as weak transitivity and composition. We prove that the graphical criterion is sound and complete in certain sense. We argue that our assumptions are not too restrictive. For instance, all the regular Gaussian probability distributions satisfy them.
Exact covariance thresholding into connected components for large-scale Graphical Lasso
Mazumder, Rahul, Hastie, Trevor
We consider the sparse inverse covariance regularization problem or graphical lasso with regularization parameter $\rho$. Suppose the co- variance graph formed by thresholding the entries of the sample covariance matrix at $\rho$ is decomposed into connected components. We show that the vertex-partition induced by the thresholded covariance graph is exactly equal to that induced by the estimated concentration graph. This simple rule, when used as a wrapper around existing algorithms, leads to enormous performance gains. For large values of $\rho$, our proposal splits a large graphical lasso problem into smaller tractable problems, making it possible to solve an otherwise infeasible large scale graphical lasso problem.