Goto

Collaborating Authors

 Regression


Leveraging Decomposed Trust in Probabilistic Matrix Factorization for Effective Recommendation

AAAI Conferences

Trust has been used to replace or complement rating-based similarity in recommender systems, to improve the accuracy of rating prediction. However, people trusting each other may not always share similar preferences. In this paper, we try to fill in this gap by decomposing the original single-aspect trust information into four general trust aspects, i.e. benevolence, integrity, competence, and predictability, and further employing the support vector regression technique to incorporate them into the probabilistic matrix factorization model for rating prediction in recommender systems. Experimental results on four datasets demonstrate the superiority of our method over the state-of-the-art approaches.


Monte Carlo Simulation for Lasso-Type Problems by Estimator Augmentation

arXiv.org Machine Learning

Regularized linear regression under the $\ell_1$ penalty, such as the Lasso, has been shown to be effective in variable selection and sparse modeling. The sampling distribution of an $\ell_1$-penalized estimator $\hat{\beta}$ is hard to determine as the estimator is defined by an optimization problem that in general can only be solved numerically and many of its components may be exactly zero. Let $S$ be the subgradient of the $\ell_1$ norm of the coefficient vector $\beta$ evaluated at $\hat{\beta}$. We find that the joint sampling distribution of $\hat{\beta}$ and $S$, together called an augmented estimator, is much more tractable and has a closed-form density under a normal error distribution in both low-dimensional ($p\leq n$) and high-dimensional ($p>n$) settings. Given $\beta$ and the error variance $\sigma^2$, one may employ standard Monte Carlo methods, such as Markov chain Monte Carlo and importance sampling, to draw samples from the distribution of the augmented estimator and calculate expectations with respect to the sampling distribution of $\hat{\beta}$. We develop a few concrete Monte Carlo algorithms and demonstrate with numerical examples that our approach may offer huge advantages and great flexibility in studying sampling distributions in $\ell_1$-penalized linear regression. We also establish nonasymptotic bounds on the difference between the true sampling distribution of $\hat{\beta}$ and its estimator obtained by plugging in estimated parameters, which justifies the validity of Monte Carlo simulation from an estimated sampling distribution even when $p\gg n\to \infty$.


Structured Learning via Logistic Regression

arXiv.org Machine Learning

A successful approach to structured learning is to write the learning objective as a joint function of linear parameters and inference messages, and iterate between updates to each. This paper observes that if the inference problem is "smoothed" through the addition of entropy terms, for fixed messages, the learning objective reduces to a traditional (non-structured) logistic regression problem with respect to parameters. In these logistic regression problems, each training example has a bias term determined by the current set of messages. Based on this insight, the structured energy function can be extended from linear factors to any function class where an "oracle" exists to minimize a logistic loss.


Relational Logistic Regression

AAAI Conferences

Logistic regression is a commonly used representation for aggregators in Bayesian belief networks when a child has multiple parents. In this paper we consider extending logistic regression to relational models, where we want to model varying populations and interactions among parents. In this paper, we first examine the representational problems caused by population variation. We show how these problems arise even in simple cases with a single parametrized parent, and propose a linear relational logistic regression which we show can represent arbitrary linear (in population size) decision thresholds, whereas the traditional logistic regression cannot. Then we examine representing interactions among the parents of a child node, and representing non-linear dependency on population size. We propose a multi-parent relational logistic regression which can represent interactions among parents and arbitrary polynomial decision thresholds. Finally, we show how other well-known aggregators can be represented using this relational logistic regression.


Fast and Robust Least Squares Estimation in Corrupted Linear Models

arXiv.org Machine Learning

Subsampling methods have been recently proposed to speed up least squares estimation in large scale settings. However, these algorithms are typically not robust to outliers or corruptions in the observed covariates. The concept of influence that was developed for regression diagnostics can be used to detect such corrupted observations as shown in this paper. This property of influence -- for which we also develop a randomized approximation -- motivates our proposed subsampling algorithm for large scale corrupted linear regression which limits the influence of data points since highly influential points contribute most to the residual error. Under a general model of corrupted observations, we show theoretically and empirically on a variety of simulated and real datasets that our algorithm improves over the current state-of-the-art approximation schemes for ordinary least squares.


RAPID: Rapidly Accelerated Proximal Gradient Algorithms for Convex Minimization

arXiv.org Machine Learning

In this paper, we propose a new algorithm to speed-up the convergence of accelerated proximal gradient (APG) methods. In order to minimize a convex function $f(\mathbf{x})$, our algorithm introduces a simple line search step after each proximal gradient step in APG so that a biconvex function $f(\theta\mathbf{x})$ is minimized over scalar variable $\theta>0$ while fixing variable $\mathbf{x}$. We propose two new ways of constructing the auxiliary variables in APG based on the intermediate solutions of the proximal gradient and the line search steps. We prove that at arbitrary iteration step $t (t\geq1)$, our algorithm can achieve a smaller upper-bound for the gap between the current and optimal objective values than those in the traditional APG methods such as FISTA, making it converge faster in practice. In fact, our algorithm can be potentially applied to many important convex optimization problems, such as sparse linear regression and kernel SVMs. Our experimental results clearly demonstrate that our algorithm converges faster than APG in all of the applications above, even comparable to some sophisticated solvers.


Optimality of Graphlet Screening in High Dimensional Variable Selection

arXiv.org Machine Learning

Consider a linear regression model where the design matrix X has n rows and p columns. We assume (a) p is much large than n, (b) the coefficient vector beta is sparse in the sense that only a small fraction of its coordinates is nonzero, and (c) the Gram matrix G = X'X is sparse in the sense that each row has relatively few large coordinates (diagonals of G are normalized to 1). The sparsity in G naturally induces the sparsity of the so-called graph of strong dependence (GOSD). We find an interesting interplay between the signal sparsity and the graph sparsity, which ensures that in a broad context, the set of true signals decompose into many different small-size components of GOSD, where different components are disconnected. We propose Graphlet Screening (GS) as a new approach to variable selection, which is a two-stage Screen and Clean method. The key methodological innovation of GS is to use GOSD to guide both the screening and cleaning. Compared to m-variate brute-forth screening that has a computational cost of p^m, the GS only has a computational cost of p (up to some multi-log(p) factors) in screening. We measure the performance of any variable selection procedure by the minimax Hamming distance. We show that in a very broad class of situations, GS achieves the optimal rate of convergence in terms of the Hamming distance. Somewhat surprisingly, the well-known procedures subset selection and the lasso are rate non-optimal, even in very simple settings and even when their tuning parameters are ideally set.


Compressed Gaussian Process

arXiv.org Machine Learning

Nonparametric regression for massive numbers of samples (n) and features (p) is an increasingly important problem. In big n settings, a common strategy is to partition the feature space, and then separately apply simple models to each partition set. We propose an alternative approach, which avoids such partitioning and the associated sensitivity to neighborhood choice and distance metrics, by using random compression combined with Gaussian process regression. The proposed approach is particularly motivated by the setting in which the response is conditionally independent of the features given the projection to a low dimensional manifold. Conditionally on the random compression matrix and a smoothness parameter, the posterior distribution for the regression surface and posterior predictive distributions are available analytically. Running the analysis in parallel for many random compression matrices and smoothness parameters, model averaging is used to combine the results. The algorithm can be implemented rapidly even in very big n and p problems, has strong theoretical justification, and is found to yield state of the art predictive performance.


Functional Gaussian processes for regression with linear PDE models

arXiv.org Machine Learning

In this paper, we present a new statistical approach to the problem of incorporating experimental observations into a mathematical model described by linear partial differential equations (PDEs) to improve the prediction of the state of a physical system. We augment the linear PDE with a functional that accounts for the uncertainty in the mathematical model and is modeled as a {\em Gaussian process}. This gives rise to a stochastic PDE which is characterized by the Gaussian functional. We develop a {\em functional Gaussian process regression} method to determine the posterior mean and covariance of the Gaussian functional, thereby solving the stochastic PDE to obtain the posterior distribution for our prediction of the physical state. Our method has the following features which distinguish itself from other regression methods. First, it incorporates both the mathematical model and the observations into the regression procedure. Second, it can handle the observations given in the form of linear functionals of the field variable. Third, the method is non-parametric in the sense that it provides a systematic way to optimally determine the prior covariance operator of the Gaussian functional based on the observations. Fourth, it provides the posterior distribution quantifying the magnitude of uncertainty in our prediction of the physical state. We present numerical results to illustrate these features of the method and compare its performance to that of the standard Gaussian process regression.


Futility Analysis in the Cross-Validation of Machine Learning Models

arXiv.org Machine Learning

Many machine learning models have important structural tuning parameters that cannot be directly estimated from the data. The common tactic for setting these parameters is to use resampling methods, such as cross--validation or the bootstrap, to evaluate a candidate set of values and choose the best based on some pre--defined criterion. Unfortunately, this process can be time consuming. However, the model tuning process can be streamlined by adaptively resampling candidate values so that settings that are clearly sub-optimal can be discarded. The notion of futility analysis is introduced in this context. An example is shown that illustrates how adaptive resampling can be used to reduce training time. Simulation studies are used to understand how the potential speed--up is affected by parallel processing techniques.