Goto

Collaborating Authors

 Regression


Sublinear-Time Approximate MCMC Transitions for Probabilistic Programs

arXiv.org Machine Learning

Probabilistic programming languages can simplify the development of machine learning techniques, but only if inference is sufficiently scalable. Unfortunately, Bayesian parameter estimation for highly coupled models such as regressions and state-space models still scales poorly; each MCMC transition takes linear time in the number of observations. This paper describes a sublinear-time algorithm for making Metropolis-Hastings (MH) updates to latent variables in probabilistic programs. The approach generalizes recently introduced approximate MH techniques: instead of subsampling data items assumed to be independent, it subsamples edges in a dynamically constructed graphical model. It thus applies to a broader class of problems and interoperates with other general-purpose inference techniques. Empirical results, including confirmation of sublinear per-transition scaling, are presented for Bayesian logistic regression, nonlinear classification via joint Dirichlet process mixtures, and parameter estimation for stochastic volatility models (with state estimation via particle MCMC). All three applications use the same implementation, and each requires under 20 lines of probabilistic code.


Latent Gaussian Processes for Distribution Estimation of Multivariate Categorical Data

arXiv.org Machine Learning

Multivariate categorical data occur in many applications of machine learning. One of the main difficulties with these vectors of categorical variables is sparsity. The number of possible observations grows exponentially with vector length, but dataset diversity might be poor in comparison. Recent models have gained significant improvement in supervised tasks with this data. These models embed observations in a continuous space to capture similarities between them. Building on these ideas we propose a Bayesian model for the unsupervised task of distribution estimation of multivariate categorical data. We model vectors of categorical variables as generated from a non-linear transformation of a continuous latent space. Non-linearity captures multi-modality in the distribution. The continuous representation addresses sparsity. Our model ties together many existing models, linking the linear categorical latent Gaussian model, the Gaussian process latent variable model, and Gaussian process classification. We derive inference for our model based on recent developments in sampling based variational inference. We show empirically that the model outperforms its linear and discrete counterparts in imputation tasks of sparse data.


Using Matched Samples to Estimate the Effects of Exercise on Mental Health via Twitter

AAAI Conferences

Recent work has demonstrated the value of social media monitoring for health surveillance (e.g., tracking influenza or depression rates). It is an open question whether such data can be used to make causal inferences (e.g., determining which activities lead to increased depression rates). Even in traditional, restricted domains, estimating causal effects from observational data is highly susceptible to confounding bias. In this work, we estimate the effect of exercise on mental health from Twitter, relying on statistical matching methods to reduce confounding bias. We train a text classifier to estimate the volume of a user's tweets expressing anxiety, depression, or anger, then compare two groups: those who exercise regularly (identified by their use of physical activity trackers like Nike+), and a matched control group. We find that those who exercise regularly have significantly fewer tweets expressing depression or anxiety; there is no significant difference in rates of tweets expressing anger. We additionally perform a sensitivity analysis to investigate how the many experimental design choices in such a study impact the final conclusions, including the quality of the classifier and the construction of the control group.


Predicting the Demographics of Twitter Users from Website Traffic Data

AAAI Conferences

Understanding the demographics of users of online social networks has important applications for health, marketing, and public messaging. In this paper, we predict the demographics of Twitter users based on whom they follow. Whereas most prior approaches rely on a supervised learning approach, in which individual users are labeled with demographics, we instead create a distantly labeled dataset by collecting audience measurement data for 1,500 websites (e.g., 50% of visitors to gizmodo.com are estimated to have a bachelor's degree). We then fit a regression model to predict these demographics using information about the followers of each website on Twitter. The resulting average held-out correlation is .77 across six different variables (gender, age, ethnicity, education, income, and child status). We additionally validate the model on a smaller set of Twitter users labeled individually for ethnicity and gender, finding performance that is surprisingly competitive with a fully supervised approach.


Non-Linear Regression for Bag-of-Words Data via Gaussian Process Latent Variable Set Model

AAAI Conferences

Gaussian process (GP) regression is a widely used method for non-linear prediction.The performance of the GP regression depends on whether it can properly capture the covariance structure of target variables, which is represented by kernels between input data.However, when the input is represented as a set of features, e.g. bag-of-words, it is difficult to calculate desirable kernel values because the co-occurrence of different but relevant words cannot be reflected in the kernel calculation.To overcome this problem, we propose a Gaussian process latent variable set model (GP-LVSM), which is a non-linear regression model effective for bag-of-words data.With the GP-LVSM, a latent vector is associated with each word, and each document is represented as a distribution of the latent vectors for words appearing in the document. We efficiently represent the distributions by using the framework of kernel embeddings of distributions that can hold high-order moment information of distributions without need for explicit density estimation.By learning latent vectors so as to maximize the posterior probability, kernels that reflect relations between words are obtained, and also words are visualized in a low-dimensional space.In experiments using 25 item review datasets, we demonstrate the effectiveness of the GP-LVSM in prediction and visualization.


Using Machine Teaching to Identify Optimal Training-Set Attacks on Machine Learners

AAAI Conferences

We investigate a problem at the intersection of machine learning and security: training-set attacks on machine learners. In such attacks an attacker contaminates the training data so that a specific learning algorithm would produce a model profitable to the attacker. Understanding training-set attacks is important as more intelligent agents (e.g. spam filters and robots) are equipped with learning capability and can potentially be hacked via data they receive from the environment. This paper identifies the optimal training-set attack on a broad family of machine learners. First we show that optimal training-set attack can be formulated as a bilevel optimization problem. Then we show that for machine learners with certain Karush-Kuhn-Tucker conditions we can solve the bilevel problem efficiently using gradient methods on an implicit function. As examples, we demonstrate optimal training-set attacks on Support VectorMachines, logistic regression, and linear regression with extensive experiments. Finally, we discuss potential defenses against such attacks.


Predicting Peer-to-Peer Loan Rates Using Bayesian Non-Linear Regression

AAAI Conferences

Peer-to-peer lending is a new highly liquid market for debt, which is rapidly growing in popularity. Here we consider modelling market rates, developing a non-linear Gaussian Process regression method which incorporates both structured data and unstructured text from the loan application. We show that the peer-to-peer market is predictable, and identify a small set of key factors with high predictive power. Our approach outperforms baseline methods for predicting market rates, and generates substantial profit in a trading simulation.


A Closed Form Solution to Multi-View Low-Rank Regression

AAAI Conferences

Real life data often includes information from different channels. For example, in computer vision, we can describe an image using different image features, such as pixel intensity, color, HOG, GIST feature, SIFT features, etc.. These different aspects of the same objects are often called multi-view (or multi-modal) data. Low-rank regression model has been proved to be an effective learning mechanism by exploring the low-rank structure of real life data. But previous low-rank regression model only works on single view data. In this paper, we propose a multi-view low-rank regression model by imposing low-rank constraints on multi-view regression model. Most importantly, we provide a closed-form solution to the multi-view low-rank regression model. Extensive experiments on 4 multi-view datasets show that the multi-view low-rank regression model outperforms single-view regression model and reveals that multi-view low-rank structure is very helpful.


Identifying At-Risk Students in Massive Open Online Courses

AAAI Conferences

Massive Open Online Courses (MOOCs) have received widespread attention for their potential to scale higher education, with multiple platforms such as Coursera, edX and Udacity recently appearing. Despite their successes, a major problem faced by MOOCs is low completion rates. In this paper, we explore the accurate early identification of students who are at risk of not completing courses. We build predictive models weekly, over multiple offerings of a course. Furthermore, we envision student interventions that present meaningful probabilities of failure, enacted only for marginal students.To be effective, predicted probabilities must be both well-calibrated and smoothed across weeks.Based on logistic regression, we propose two transfer learning algorithms to trade-off smoothness and accuracy by adding a regularization term to minimize the difference of failure probabilities between consecutive weeks. Experimental results on two offerings of a Coursera MOOC establish the effectiveness of our algorithms.


Constructing Models of User and Task Characteristics from Eye Gaze Data for User-Adaptive Information Highlighting

AAAI Conferences

A user-adaptive information visualization system capable of learning models of users and the visualization tasks they perform could provide interventions optimized for helping specific users in specific task contexts. In this paper, we investigate the accuracy of predicting visualization tasks, user performance on tasks, and user traits from gaze data. We show that predictions made with a logistic regression model are significantly better than a baseline classifier, with particularly strong results for predicting task type and user performance. Furthermore, we compare classifiers built with interface-independent and interface-dependent features, and show that the interface-independent features are comparable or superior to interface-dependent ones. Finally, we discuss how the accuracy of predictive models is affected if they are trained with data from trials that had highlighting interventions added to the visualization.