AITopics | Regression

Regression

Leveraging Decomposed Trust in Probabilistic Matrix Factorization for Effective Recommendation

Fang, Hui (Nanyang Technological University, Singapore) | Bao, Yang (Nanyang Technological University) | Zhang, Jie (Nanyang Technological University)

AAAI Conferences | Jul-14-2014

Trust has been used to replace or complement rating-based similarity in recommender systems, to improve the accuracy of rating prediction. However, people who trust each other may not always share similar preferences. In this paper, we try to fill this gap by decomposing the original single-aspect trust information into four general trust aspects, i.e., benevolence, integrity, competence, and predictability, and further employing the support vector regression technique to incorporate them into the probabilistic matrix factorization model for rating prediction in recommender systems. Experimental results on four datasets demonstrate the superiority of our method over the state-of-the-art approaches.
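As a rough illustration of the base model only, the sketch below fits a plain matrix factorization by stochastic gradient descent with L2 penalties, which corresponds to a MAP estimate of probabilistic matrix factorization under Gaussian priors. It does not include the trust decomposition or the support vector regression step described in the abstract. The function names, hyperparameters, and toy ratings are assumptions chosen for illustration.

```python
import random

def train_pmf(ratings, n_users, n_items, dim=2, lr=0.02, reg=0.02,
              epochs=200, seed=0):
    """SGD on squared error + L2 penalty (hypothetical helper, not the paper's code)."""
    rng = random.Random(seed)
    # Small random initial latent factors for users (U) and items (V).
    U = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_users)]
    V = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(a * b for a, b in zip(U[u], V[i]))
            err = r - pred
            for d in range(dim):
                uu, vv = U[u][d], V[i][d]
                # Gradient step on the regularized squared error.
                U[u][d] += lr * (err * vv - reg * uu)
                V[i][d] += lr * (err * uu - reg * vv)
    return U, V

def rmse(ratings, U, V):
    """Root mean squared error of the factorization on the given ratings."""
    total = 0.0
    for u, i, r in ratings:
        pred = sum(a * b for a, b in zip(U[u], V[i]))
        total += (r - pred) ** 2
    return (total / len(ratings)) ** 0.5

# Toy (user, item, rating) triples, invented for this example.
toy_ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
U, V = train_pmf(toy_ratings, n_users=3, n_items=3)
print("training RMSE:", rmse(toy_ratings, U, V))
```

In a trust-aware variant, additional terms would tie a user's latent vector to those of trusted users; how the four trust aspects enter that term is specific to the paper and is not reproduced here.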

artificial intelligence, information, machine learning, (19 more...)

AAAI Conferences

Twenty-Eighth AAAI Conference on Artificial Intelligence

Genre: Research Report (0.67)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.54)

Monte Carlo Simulation for Lasso-Type Problems by Estimator Augmentation

Zhou, Qing

arXiv.org Machine Learning | Jul-13-2014

Regularized linear regression under the $\ell_1$ penalty, such as the Lasso, has been shown to be effective in variable selection and sparse modeling. The sampling distribution of an $\ell_1$-penalized estimator $\hat{\beta}$ is hard to determine as the estimator is defined by an optimization problem that in general can only be solved numerically and many of its components may be exactly zero. Let $S$ be the subgradient of the $\ell_1$ norm of the coefficient vector $\beta$ evaluated at $\hat{\beta}$. We find that the joint sampling distribution of $\hat{\beta}$ and $S$, together called an augmented estimator, is much more tractable and has a closed-form density under a normal error distribution in both low-dimensional ($p\leq n$) and high-dimensional ($p>n$) settings. Given $\beta$ and the error variance $\sigma^2$, one may employ standard Monte Carlo methods, such as Markov chain Monte Carlo and importance sampling, to draw samples from the distribution of the augmented estimator and calculate expectations with respect to the sampling distribution of $\hat{\beta}$. We develop a few concrete Monte Carlo algorithms and demonstrate with numerical examples that our approach may offer huge advantages and great flexibility in studying sampling distributions in $\ell_1$-penalized linear regression. We also establish nonasymptotic bounds on the difference between the true sampling distribution of $\hat{\beta}$ and its estimator obtained by plugging in estimated parameters, which justifies the validity of Monte Carlo simulation from an estimated sampling distribution even when $p\gg n\to \infty$.
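To make the notion of an augmented estimator concrete, here is a minimal sketch for the simplest possible case: a single predictor, where the Lasso solution is a soft-thresholded inner product and the subgradient $S$ follows from the KKT condition $x^\top(y - x\hat{\beta}) = \lambda S$. The code draws the pair $(\hat{\beta}, S)$ by brute-force simulation of the response; it is not the estimator-augmentation sampler developed in the paper. The function names, the penalty scaling $\tfrac12\|y - x\beta\|^2 + \lambda|\beta|$, and all numeric settings are assumptions.

```python
import random

def soft_threshold(z, lam):
    """Scalar soft-thresholding operator."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def augmented_lasso_1d(x, y, lam):
    """Return (beta_hat, s) for min_b 0.5*||y - x*b||^2 + lam*|b| with one predictor."""
    xx = sum(v * v for v in x)
    xy = sum(a * b for a, b in zip(x, y))
    beta_hat = soft_threshold(xy, lam) / xx
    # KKT condition: x'(y - x*beta_hat) = lam * s, with s in [-1, 1].
    s = (xy - xx * beta_hat) / lam
    return beta_hat, s

def simulate(x, beta, sigma, lam, n_draws=2000, seed=0):
    """Direct Monte Carlo draws of the augmented estimator under normal errors."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        y = [beta * v + rng.gauss(0, sigma) for v in x]
        draws.append(augmented_lasso_1d(x, y, lam))
    return draws

draws = simulate(x=[1.0] * 10, beta=0.2, sigma=1.0, lam=3.0)
zero_fraction = sum(1 for b, _ in draws if b == 0.0) / len(draws)
print("fraction of exact zeros:", zero_fraction)
```

The draws show the two features the abstract points to: a positive probability mass at exactly zero for $\hat{\beta}$, and a subgradient that equals the sign of $\hat{\beta}$ when it is nonzero and lies strictly inside $[-1, 1]$ otherwise.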

artificial intelligence, estimator, machine learning, (19 more...)

arXiv.org Machine Learning

doi: 10.1080/01621459.2014.946035

1401.4425

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.54)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.45)

Structured Learning via Logistic Regression

Domke, Justin

arXiv.org Machine Learning | Jul-2-2014

A successful approach to structured learning is to write the learning objective as a joint function of linear parameters and inference messages, and iterate between updates to each. This paper observes that if the inference problem is "smoothed" through the addition of entropy terms, for fixed messages, the learning objective reduces to a traditional (non-structured) logistic regression problem with respect to parameters. In these logistic regression problems, each training example has a bias term determined by the current set of messages. Based on this insight, the structured energy function can be extended from linear factors to any function class where an "oracle" exists to minimize a logistic loss.

artificial intelligence, example prediction, machine learning, (18 more...)

arXiv.org Machine Learning

1407.0754

Genre: Research Report > New Finding (0.56)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.83)

Relational Logistic Regression

Kazemi, Seyed Mehran (University of British Columbia) | Buchman, David (University of British Columbia) | Kersting, Kristian (Technical University of Dortmund) | Natarajan, Sriraam (Indiana University) | Poole, David (University of British Columbia)

AAAI Conferences | Jul-1-2014

Logistic regression is a commonly used representation for aggregators in Bayesian belief networks when a child has multiple parents. In this paper, we consider extending logistic regression to relational models, where we want to model varying populations and interactions among parents. We first examine the representational problems caused by population variation. We show how these problems arise even in simple cases with a single parametrized parent, and propose a linear relational logistic regression which we show can represent arbitrary linear (in population size) decision thresholds, whereas the traditional logistic regression cannot. Then we examine representing interactions among the parents of a child node, and representing non-linear dependency on population size. We propose a multi-parent relational logistic regression which can represent interactions among parents and arbitrary polynomial decision thresholds. Finally, we show how other well-known aggregators can be represented using this relational logistic regression.
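The population-variation problem can be illustrated with a small sketch. It assumes, as a simplification that may not match the paper's exact parametrization, that the child's probability is a sigmoid of a weighted sum of the number of parent individuals that are true and the number that are false. The names `count_only_lr` and `linear_rlr` and all weights are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def count_only_lr(n_true, w0, w_true):
    """Logistic regression on the count of true parents only (assumed baseline)."""
    return sigmoid(w0 + w_true * n_true)

def linear_rlr(n_true, n_false, w0, w_true, w_false):
    """Sigmoid of a linear function of both counts (assumed form of a linear relational aggregator)."""
    return sigmoid(w0 + w_true * n_true + w_false * n_false)

# Baseline tuned for a population of 10: "more than 5 true" gives p > 0.5.
print(count_only_lr(6, w0=-5.5, w_true=1.0))    # majority of 10
print(count_only_lr(40, w0=-5.5, w_true=1.0))   # minority of 100, yet p is close to 1

# Weights +1 / -1 encode a majority threshold that scales with population size.
print(linear_rlr(6, 4, 0.0, 1.0, -1.0))
print(linear_rlr(40, 60, 0.0, 1.0, -1.0))
```

With the count-only model the threshold is a fixed absolute count, so it stops meaning "majority" once the population grows; using both counts lets the threshold move linearly with population size.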

relational logistic regression

AAAI Conferences

Fourteenth International Conference on the Principles of Knowledge Representation and Reasoning

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.87)

Fast and Robust Least Squares Estimation in Corrupted Linear Models

McWilliams, Brian | Krummenacher, Gabriel | Lucic, Mario | Buhmann, Joachim M.

arXiv.org Machine Learning | Jun-19-2014

Subsampling methods have recently been proposed to speed up least squares estimation in large-scale settings. However, these algorithms are typically not robust to outliers or corruptions in the observed covariates. The concept of influence that was developed for regression diagnostics can be used to detect such corrupted observations, as shown in this paper. This property of influence -- for which we also develop a randomized approximation -- motivates our proposed subsampling algorithm for large-scale corrupted linear regression, which limits the influence of data points since highly influential points contribute most to the residual error. Under a general model of corrupted observations, we show theoretically and empirically on a variety of simulated and real datasets that our algorithm improves over the current state-of-the-art approximation schemes for ordinary least squares.
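For readers unfamiliar with regression diagnostics, the sketch below computes two classical quantities for a one-predictor least squares fit: leverage, $h_i = 1/n + (x_i - \bar{x})^2 / S_{xx}$, and Cook's distance, $D_i = \frac{e_i^2}{p\,s^2}\cdot\frac{h_i}{(1-h_i)^2}$ with $p = 2$. It shows how a single corrupted covariate stands out; it is not the influence-based subsampling algorithm or the randomized approximation proposed in the paper. The function name and the toy data are assumptions.

```python
def simple_regression_diagnostics(x, y):
    """Leverage and Cook's distance for y ~ intercept + slope * x (illustrative helper)."""
    n, p = len(x), 2
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)          # residual variance estimate
    leverage = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]
    cooks = [(e * e / (p * s2)) * h / (1.0 - h) ** 2
             for e, h in zip(resid, leverage)]
    return leverage, cooks

# Clean data follow y = 2x + 1; the last covariate is corrupted from 9 to 30.
x = [float(i) for i in range(9)] + [30.0]
y = [2.0 * i + 1.0 for i in range(9)] + [19.0]
leverage, cooks = simple_regression_diagnostics(x, y)
worst = max(range(len(x)), key=lambda i: cooks[i])
print("most influential observation:", worst)
```

A subsampling scheme that is wary of influence would down-weight or avoid such an observation, whereas sampling by leverage alone would tend to select it.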

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Machine Learning

1406.3175

Country: North America > United States (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

RAPID: Rapidly Accelerated Proximal Gradient Algorithms for Convex Minimization

Zhang, Ziming | Saligrama, Venkatesh

arXiv.org Machine Learning | Jun-18-2014

In this paper, we propose a new algorithm to speed up the convergence of accelerated proximal gradient (APG) methods. In order to minimize a convex function $f(\mathbf{x})$, our algorithm introduces a simple line search step after each proximal gradient step in APG so that a biconvex function $f(\theta\mathbf{x})$ is minimized over scalar variable $\theta>0$ while fixing variable $\mathbf{x}$. We propose two new ways of constructing the auxiliary variables in APG based on the intermediate solutions of the proximal gradient and the line search steps. We prove that at an arbitrary iteration step $t$ ($t\geq1$), our algorithm can achieve a smaller upper bound for the gap between the current and optimal objective values than those in the traditional APG methods such as FISTA, making it converge faster in practice. In fact, our algorithm can potentially be applied to many important convex optimization problems, such as sparse linear regression and kernel SVMs. Our experimental results clearly demonstrate that our algorithm converges faster than APG in all of the applications above, and is even comparable to some sophisticated solvers.
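The scalar line search can be sketched for sparse linear regression, $f(\mathbf{x}) = \tfrac12\|A\mathbf{x} - \mathbf{y}\|^2 + \lambda\|\mathbf{x}\|_1$. For fixed $\mathbf{x}$, $f(\theta\mathbf{x})$ is a convex quadratic in $\theta \geq 0$ with minimizer $\theta^\star = \max\{0, (\langle A\mathbf{x}, \mathbf{y}\rangle - \lambda\|\mathbf{x}\|_1) / \|A\mathbf{x}\|^2\}$. The code below adds that rescaling to plain (non-accelerated) proximal gradient; it omits the momentum and the auxiliary-variable constructions of the paper, so it should be read as an illustration of the idea rather than as RAPID itself. All names, the step-size choice, and the toy problem are assumptions.

```python
def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def rmatvec(A, r):
    """Compute A^T r for a list-of-rows matrix A."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]

def soft_threshold(v, t):
    return [max(abs(x) - t, 0.0) * (1.0 if x > 0 else -1.0) for x in v]

def objective(A, y, x, lam):
    resid = [p - t for p, t in zip(matvec(A, x), y)]
    return 0.5 * sum(r * r for r in resid) + lam * sum(abs(v) for v in x)

def prox_grad(A, y, lam, iters=50, rescale=True):
    # The squared Frobenius norm upper-bounds the Lipschitz constant of the gradient.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * len(A[0])
    for _ in range(iters):
        resid = [p - t for p, t in zip(matvec(A, x), y)]
        grad = rmatvec(A, resid)
        x = soft_threshold([xi - step * g for xi, g in zip(x, grad)], step * lam)
        if rescale:
            Ax = matvec(A, x)
            denom = sum(v * v for v in Ax)
            if denom > 0.0:
                # Closed-form minimizer of f(theta * x) over theta >= 0.
                num = sum(a * b for a, b in zip(Ax, y)) - lam * sum(abs(v) for v in x)
                theta = max(0.0, num / denom)
                x = [theta * v for v in x]
    return x

A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
for flag in (False, True):
    x_hat = prox_grad(A, y, lam=0.1, rescale=flag)
    print("rescale =", flag, "objective =", objective(A, y, x_hat, 0.1))
```

Because $\theta = 1$ is always feasible, the rescaling step can never increase the objective, which is the property that makes it a safe addition to each iteration.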

algorithm, artificial intelligence, machine learning, (15 more...)

arXiv.org Machine Learning

1406.4445

Country: North America > United States (0.28)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.35)

Optimality of Graphlet Screening in High Dimensional Variable Selection

Jin, Jiashun | Zhang, Cun-Hui | Zhang, Qi

arXiv.org Machine Learning | Jun-13-2014

Consider a linear regression model where the design matrix X has n rows and p columns. We assume (a) p is much larger than n, (b) the coefficient vector beta is sparse in the sense that only a small fraction of its coordinates is nonzero, and (c) the Gram matrix G = X'X is sparse in the sense that each row has relatively few large coordinates (diagonals of G are normalized to 1). The sparsity in G naturally induces the sparsity of the so-called graph of strong dependence (GOSD). We find an interesting interplay between the signal sparsity and the graph sparsity, which ensures that in a broad context, the set of true signals decomposes into many different small-size components of GOSD, where different components are disconnected. We propose Graphlet Screening (GS) as a new approach to variable selection, which is a two-stage Screen and Clean method. The key methodological innovation of GS is to use GOSD to guide both the screening and cleaning. Compared to m-variate brute-force screening that has a computational cost of p^m, GS only has a computational cost of p (up to some multi-log(p) factors) in screening. We measure the performance of any variable selection procedure by the minimax Hamming distance. We show that in a very broad class of situations, GS achieves the optimal rate of convergence in terms of the Hamming distance. Somewhat surprisingly, the well-known procedures subset selection and the lasso are rate non-optimal, even in very simple settings and even when their tuning parameters are ideally set.

artificial intelligence, machine learning, selection, (14 more...)

arXiv.org Machine Learning

1204.6452

Country: North America > United States > Wisconsin (0.27)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)

Compressed Gaussian Process

Guhaniyogi, Rajarshi | Dunson, David B.

arXiv.org Machine Learning | Jun-7-2014

Nonparametric regression for massive numbers of samples (n) and features (p) is an increasingly important problem. In big n settings, a common strategy is to partition the feature space, and then separately apply simple models to each partition set. We propose an alternative approach, which avoids such partitioning and the associated sensitivity to neighborhood choice and distance metrics, by using random compression combined with Gaussian process regression. The proposed approach is particularly motivated by the setting in which the response is conditionally independent of the features given the projection to a low-dimensional manifold. Conditionally on the random compression matrix and a smoothness parameter, the posterior distribution for the regression surface and posterior predictive distributions are available analytically. The analysis is run in parallel for many random compression matrices and smoothness parameters, and model averaging is used to combine the results. The algorithm can be implemented rapidly even in very big n and p problems, has strong theoretical justification, and is found to yield state-of-the-art predictive performance.

data mining, machine learning, predictive interval, (22 more...)

arXiv.org Machine Learning

1406.1916

Genre: Research Report (0.82)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
(3 more...)

Functional Gaussian processes for regression with linear PDE models

Nguyen, Ngoc-Cuong | Peraire, Jaime

arXiv.org Machine Learning | May-29-2014

In this paper, we present a new statistical approach to the problem of incorporating experimental observations into a mathematical model described by linear partial differential equations (PDEs) to improve the prediction of the state of a physical system. We augment the linear PDE with a functional that accounts for the uncertainty in the mathematical model and is modeled as a Gaussian process. This gives rise to a stochastic PDE which is characterized by the Gaussian functional. We develop a functional Gaussian process regression method to determine the posterior mean and covariance of the Gaussian functional, thereby solving the stochastic PDE to obtain the posterior distribution for our prediction of the physical state. Our method has the following features which distinguish it from other regression methods. First, it incorporates both the mathematical model and the observations into the regression procedure. Second, it can handle the observations given in the form of linear functionals of the field variable. Third, the method is non-parametric in the sense that it provides a systematic way to optimally determine the prior covariance operator of the Gaussian functional based on the observations. Fourth, it provides the posterior distribution quantifying the magnitude of uncertainty in our prediction of the physical state. We present numerical results to illustrate these features of the method and compare its performance to that of the standard Gaussian process regression.

artificial intelligence, machine learning, regression, (17 more...)

arXiv.org Machine Learning

1405.7569

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.55)

Futility Analysis in the Cross-Validation of Machine Learning Models

Kuhn, Max

arXiv.org Machine Learning | May-27-2014

Many machine learning models have important structural tuning parameters that cannot be directly estimated from the data. The common tactic for setting these parameters is to use resampling methods, such as cross-validation or the bootstrap, to evaluate a candidate set of values and choose the best based on some pre-defined criterion. Unfortunately, this process can be time-consuming. However, the model tuning process can be streamlined by adaptively resampling candidate values so that settings that are clearly sub-optimal can be discarded. The notion of futility analysis is introduced in this context. An example is shown that illustrates how adaptive resampling can be used to reduce training time. Simulation studies are used to understand how the potential speed-up is affected by parallel processing techniques.
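The general shape of adaptive resampling can be sketched as follows. Per-resample scores are simulated rather than produced by fitting models, and the discard rule (drop a candidate once its mean gap to the current leader exceeds k standard errors of the paired differences) is an assumed stand-in for whatever futility test the paper actually uses. The function name, thresholds, and candidate accuracies are hypothetical.

```python
import math
import random
import statistics

def adaptive_resampling(true_means, n_resamples=25, min_resamples=5,
                        k=3.0, noise_sd=0.01, seed=0):
    """Evaluate candidates resample by resample, discarding clearly worse ones early."""
    rng = random.Random(seed)
    scores = {c: [] for c in range(len(true_means))}
    active = set(scores)
    evaluations = 0
    for r in range(n_resamples):
        for c in sorted(active):
            # Stand-in for "fit candidate c on resample r and score it".
            scores[c].append(rng.gauss(true_means[c], noise_sd))
            evaluations += 1
        if r + 1 < min_resamples:
            continue  # gather a minimum number of resamples before testing
        leader = max(active, key=lambda c: statistics.mean(scores[c]))
        for c in sorted(active - {leader}):
            diffs = [a - b for a, b in zip(scores[leader], scores[c])]
            se = statistics.stdev(diffs) / math.sqrt(len(diffs))
            if statistics.mean(diffs) > k * se:
                active.discard(c)  # continuing to resample this one is futile
    return active, evaluations

survivors, evaluations = adaptive_resampling([0.90, 0.89, 0.70, 0.60])
print("survivors:", sorted(survivors))
print("evaluations:", evaluations, "of", 4 * 25)
```

The saving comes from the inner loop skipping discarded candidates on all later resamples; the trade-off is the risk of dropping a good setting too early, which is why a minimum number of resamples is collected first.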

artificial intelligence, machine learning, resample, (17 more...)

arXiv.org Machine Learning

1405.6974

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.62)
