Lafferty, John D.
Exponential Concentration for Mutual Information Estimation with Application to Forests
Liu, Han, Wasserman, Larry, Lafferty, John D.
We prove a new exponential concentration inequality for a plug-in estimator of the Shannon mutual information. Previous results on mutual information estimation only bounded expected error. The advantage of having the exponential inequality is that, combined with the union bound, we can guarantee accurate estimators of the mutual information for many pairs of random variables simultaneously. As an application, we show how to use such a result to optimally estimate the density function and graph of a distribution which is Markov to a forest graph.
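A minimal sketch of the kind of plug-in mutual information estimator the result concerns, using a simple histogram plug-in (the binning rule and function names here are illustrative assumptions, not the paper's exact estimator):

```python
import numpy as np

def plugin_mi(x, y, bins=10):
    """Histogram plug-in estimate of I(X;Y) in nats.

    A simple plug-in estimator; the paper's analysis concerns estimators
    of this general type, not this particular binning rule.
    """
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()                      # joint empirical distribution
    px = pxy.sum(axis=1, keepdims=True)        # marginal of X
    py = pxy.sum(axis=0, keepdims=True)        # marginal of Y
    mask = pxy > 0
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))

# With an exponential concentration bound, a union bound over all
# O(d^2) variable pairs keeps every pairwise MI estimate accurate
# simultaneously, which is what a forest (Chow-Liu-style) estimator needs.
rng = np.random.default_rng(0)
x = rng.normal(size=5000)
y = x + rng.normal(size=5000)
print(plugin_mi(x, y))
```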
Nonparametric Reduced Rank Regression
Foygel, Rina, Horrell, Michael, Drton, Mathias, Lafferty, John D.
We propose an approach to multivariate nonparametric regression that generalizes reduced rank regression for linear models. An additive model is estimated for each dimension of a $q$-dimensional response, with a shared $p$-dimensional predictor variable. To control the complexity of the model, we employ a functional form of the Ky-Fan or nuclear norm, resulting in a set of function estimates that have low rank. Backfitting algorithms are derived and justified using a nonparametric form of the nuclear norm subdifferential. Oracle inequalities on excess risk are derived that exhibit the scaling behavior of the procedure in the high dimensional setting. The methods are illustrated on gene expression data.
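As a simplified sketch of the underlying penalization, here is the linear analogue: nuclear-norm-penalized multivariate regression solved by proximal gradient with singular value soft-thresholding. This is an illustrative stand-in, not the paper's functional backfitting algorithm:

```python
import numpy as np

def svt(B, tau):
    """Singular value soft-thresholding: the prox of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def reduced_rank_regression(X, Y, lam, n_iter=200):
    """Proximal gradient for min_B ||Y - X B||_F^2 / (2n) + lam * ||B||_*.

    The soft-thresholding of singular values is what produces the low-rank
    set of estimates; the paper lifts this idea to additive functions.
    """
    n, p = X.shape
    q = Y.shape[1]
    B = np.zeros((p, q))
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 / n)   # 1 / Lipschitz constant
    for _ in range(n_iter):
        grad = X.T @ (X @ B - Y) / n
        B = svt(B - step * grad, step * lam)
    return B
```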
Graph-Valued Regression
Liu, Han, Chen, Xi, Wasserman, Larry, Lafferty, John D.
Undirected graphical models encode in a graph $G$ the dependency structure of a random vector $Y$. In many applications, it is of interest to model $Y$ given another random vector $X$ as input. We refer to the problem of estimating the graph $G(x)$ of $Y$ conditioned on $X=x$ as ``graph-valued regression''. In this paper, we propose a semiparametric method for estimating $G(x)$ that builds a tree on the $X$ space just as in CART (classification and regression trees), but at each leaf of the tree estimates a graph. We call the method ``Graph-optimized CART'', or Go-CART. We study the theoretical properties of Go-CART using dyadic partitioning trees, establishing oracle inequalities on risk minimization and tree partition consistency. We also demonstrate the application of Go-CART to a meteorological dataset, showing how graph-valued regression can provide a useful tool for analyzing complex data.
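A toy sketch of the graph-valued regression idea, assuming scikit-learn's GraphicalLasso as the per-leaf graph estimator: one dyadic split on a coordinate of $X$, with a sparse Gaussian graph of $Y$ fit in each cell. The full method grows and selects an entire dyadic partitioning tree; this shows only a single split:

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

def go_cart_split(X, Y, coord=0, alpha=0.1):
    """One dyadic split on X[:, coord], with a sparse graph of Y per cell."""
    mid = 0.5 * (X[:, coord].min() + X[:, coord].max())
    graphs = {}
    for name, idx in [("left", X[:, coord] <= mid), ("right", X[:, coord] > mid)]:
        model = GraphicalLasso(alpha=alpha).fit(Y[idx])
        # Nonzero off-diagonal precision entries define the leaf's graph G(x).
        graphs[name] = np.abs(model.precision_) > 1e-8
    return mid, graphs
```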
Nonparametric regression and classification with joint sparsity constraints
Liu, Han, Wasserman, Larry, Lafferty, John D.
We propose new families of models and algorithms for high-dimensional nonparametric learning with joint sparsity constraints. Our approach is based on a regularization method that enforces common sparsity patterns across different function components in a nonparametric additive model. The algorithms employ a coordinate descent approach that is based on a functional soft-thresholding operator. The framework yields several new models, including multi-task sparse additive models, multi-response sparse additive models, and sparse additive multi-category logistic regression. The methods are illustrated with experiments on synthetic data and gene microarray data.
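A sketch of the functional soft-thresholding operator at the core of the coordinate descent, in its finite-sample form applied to one coordinate's fitted values stacked across tasks (the shapes and names are illustrative assumptions):

```python
import numpy as np

def group_soft_threshold(P, lam):
    """Shrink a stacked block of fitted component values toward zero.

    P stacks one covariate's fitted function values across tasks
    (shape: n_samples x n_tasks). The joint penalty zeroes the entire
    block when its empirical norm falls below lam, which is what
    enforces a common sparsity pattern across the function components.
    """
    norm = np.sqrt(np.mean(P ** 2))
    if norm <= lam:
        return np.zeros_like(P)
    return (1.0 - lam / norm) * P
```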
Visualizing Topics with Multi-Word Expressions
Blei, David M., Lafferty, John D.
We describe a new method for visualizing topics, the distributions over terms that are automatically extracted from large text corpora using latent variable models. Our method finds significant $n$-grams related to a topic, which are then used to help understand and interpret the underlying distribution. Compared with the usual visualization, which simply lists the most probable topical terms, the multi-word expressions provide a better intuitive impression for what a topic is "about." Our approach is based on a language model of arbitrary length expressions, for which we develop a new methodology based on nested permutation tests to find significant phrases. We show that this method outperforms the more standard use of $\chi^2$ and likelihood ratio tests. We illustrate the topic presentations on corpora of scientific abstracts and news articles.
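A minimal permutation test in the spirit of the method, checking whether a given bigram occurs more often than expected if word order were exchangeable; this is a simplified stand-in for the nested permutation tests described above:

```python
import random

def bigram_pvalue(tokens, w1, w2, n_perm=1000, seed=0):
    """Permutation p-value for the bigram (w1, w2) in a token sequence."""
    def count(seq):
        return sum(1 for a, b in zip(seq, seq[1:]) if (a, b) == (w1, w2))
    observed = count(tokens)
    rng = random.Random(seed)
    shuffled = list(tokens)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)                 # break word order, keep unigrams
        if count(shuffled) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)        # add-one smoothed p-value
```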
SpAM: Sparse Additive Models
Liu, Han, Wasserman, Larry, Lafferty, John D., Ravikumar, Pradeep K.
We present a new class of models for high-dimensional nonparametric regression and classification called sparse additive models (SpAM). Our methods combine ideas from sparse linear modeling and additive nonparametric regression. We derive a method for fitting the models that is effective even when the number of covariates is larger than the sample size. A statistical analysis of the properties of SpAM is given together with empirical results on synthetic and real data, showing that SpAM can be effective in fitting sparse nonparametric models in high dimensional data.
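A sketch of the SpAM backfitting idea: smooth each coordinate's partial residual with a univariate smoother (a Nadaraya-Watson smoother here, as an illustrative assumption), then soft-threshold the fitted component's empirical norm:

```python
import numpy as np

def kernel_smooth(x, r, bandwidth=0.3):
    """Nadaraya-Watson smoother of residual r on covariate x (a stand-in
    for whichever univariate smoother is plugged in)."""
    W = np.exp(-0.5 * ((x[:, None] - x[None, :]) / bandwidth) ** 2)
    return (W @ r) / W.sum(axis=1)

def spam_backfit(X, y, lam, n_iter=20, bandwidth=0.3):
    """Backfitting sketch: smooth each partial residual, then shrink the
    component's norm; components whose norm falls below lam are zeroed,
    giving sparsity even when p exceeds the sample size."""
    n, p = X.shape
    F = np.zeros((n, p))                                 # additive components
    for _ in range(n_iter):
        for j in range(p):
            r = y - y.mean() - F.sum(axis=1) + F[:, j]   # partial residual
            pj = kernel_smooth(X[:, j], r, bandwidth)
            norm = np.sqrt(np.mean(pj ** 2))
            F[:, j] = max(0.0, 1.0 - lam / norm) * pj if norm > 0 else 0.0
        F -= F.mean(axis=0)                              # center each component
    return F
```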
Compressed Regression
Zhou, Shuheng, Wasserman, Larry, Lafferty, John D.
Recent research has studied the role of sparsity in high dimensional regression and signal reconstruction, establishing theoretical limits for recovering sparse models from sparse data. In this paper we study a variant of this problem where the original $n$ input variables are compressed by a random linear transformation to $m \ll n$ examples in $p$ dimensions, and establish conditions under which a sparse linear model can be successfully recovered from the compressed data. A primary motivation for this compression procedure is to anonymize the data and preserve privacy by revealing little information about the original data. We characterize the number of random projections that are required for $\ell_1$-regularized compressed regression to identify the nonzero coefficients in the true model with probability approaching one, a property called ``sparsistence.'' In addition, we show that $\ell_1$-regularized compressed regression asymptotically predicts as well as an oracle linear model, a property called ``persistence.'' Finally, we characterize the privacy properties of the compression procedure in information-theoretic terms, establishing upper bounds on the rate of information communicated between the compressed and uncompressed data that decay to zero.
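A sketch of the compression-then-regression pipeline, using a Gaussian random projection and scikit-learn's Lasso as the $\ell_1$ solver (an illustrative choice, not the paper's analysis):

```python
import numpy as np
from sklearn.linear_model import Lasso

def compressed_lasso(X, y, m, alpha=0.1, seed=0):
    """Compress n examples to m << n random linear combinations, then fit
    an l1-regularized linear model to the compressed data only."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    Phi = rng.normal(scale=1.0 / np.sqrt(m), size=(m, n))  # random projection
    X_c, y_c = Phi @ X, Phi @ y      # the data holder releases only these
    return Lasso(alpha=alpha).fit(X_c, y_c)

# Toy check: a sparse model recovered from compressed data alone.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 50))
beta = np.zeros(50); beta[:3] = [2.0, -1.5, 1.0]
y = X @ beta + 0.1 * rng.normal(size=500)
print(np.nonzero(np.abs(compressed_lasso(X, y, m=100).coef_) > 0.1)[0])
```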
Statistical Analysis of Semi-Supervised Regression
Wasserman, Larry, Lafferty, John D.
Semi-supervised methods use unlabeled data in addition to labeled data to construct predictors. While existing semi-supervised methods have shown some promising empirical performance, their development has been largely based on heuristics. In this paper we study semi-supervised learning from the viewpoint of minimax theory. Our first result shows that some common methods based on regularization using graph Laplacians do not lead to faster minimax rates of convergence. Thus, the estimators that use the unlabeled data do not have smaller risk than the estimators that use only labeled data. We then develop several new approaches that provably lead to improved performance. The statistical tools of minimax analysis are thus used to offer some new perspective on the problem of semi-supervised learning.
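For concreteness, a sketch of the graph-Laplacian-regularized estimator that the negative result concerns: minimize squared loss on the labeled points plus a Laplacian smoothness penalty over all labeled and unlabeled points. The affinity construction and names here are illustrative assumptions:

```python
import numpy as np

def laplacian_regularized_fit(K_graph, y_labeled, labeled_idx, gamma=1.0):
    """Minimize sum_{i labeled} (f_i - y_i)^2 + gamma * f^T L f over all
    n points (labeled and unlabeled); K_graph is a symmetric affinity
    matrix over the full sample. Returns the fitted values f_hat."""
    n = K_graph.shape[0]
    L = np.diag(K_graph.sum(axis=1)) - K_graph     # combinatorial Laplacian
    J = np.zeros((n, n))
    J[labeled_idx, labeled_idx] = 1.0              # selects the labeled points
    b = np.zeros(n)
    b[labeled_idx] = y_labeled
    return np.linalg.solve(J + gamma * L, b)
```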