Regression
Extending the Modelling Capacity of Gaussian Conditional Random Fields while Learning Faster
Glass, Jesse (Temple University) | Ghalwash, Mohamed (Temple University) | Vukicevic, Milan (University of Belgrade) | Obradovic, Zoran (Temple University)
Gaussian Conditional Random Fields (GCRF) are a type of structured regression model that incorporates multiple predictors and multiple graphs. This is achieved by defining quadratic term feature functions in Gaussian canonical form which makes the conditional log-likelihood function convex and hence allows finding the optimal parameters by learning from data. In this work, the parameter space for the GCRF model is extended to facilitate joint modelling of positive and negative influences. This is achieved by restricting the model to a single graph and formulating linear bounds on convexity with respect to the model's parameters. In addition, our formulation for the model using one network allows calculating gradients much faster than alternative implementations. Lastly, we extend the model one step further and incorporate a bias term into our link weight. This bias is solved as part of the convex optimization. Benefits of the proposed model in terms of improved accuracy and speed are characterized on several synthetic graphs with 2 million links as well as on a hospital admissions prediction task represented as a human disease-symptom similarity network corresponding to more than 35 million hospitalization records in California over 9 years.
Fast Lasso Algorithm via Selective Coordinate Descent
Fujiwara, Yasuhiro (NTT) | Ida, Yasutoshi (NTT) | Shiokawa, Hiroaki (University of Tsukuba) | Iwamura, Sotetsu (NTT)
For the AI community, the lasso proposed by Tibshirani is an important regression approach in finding explanatory predictors in high dimensional data. The coordinate descent algorithm is a standard approach to solve the lasso which iteratively updates weights of predictors in a round-robin style until convergence. However, it has high computation cost. This paper proposes Sling, a fast approach to the lasso. It achieves high efficiency by skipping unnecessary updates for the predictors whose weight is zero in the iterations. Sling can obtain high prediction accuracy with fewer predictors than the standard approach. Experiments show that Sling can enhance the efficiency and the effectiveness of the lasso.
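The round-robin baseline mentioned in the abstract can be sketched as below. This is a minimal illustration of standard cyclic coordinate descent with soft-thresholding, not the Sling algorithm itself; the function names, the `(1/(2n))` scaling of the squared loss, and the stopping rule are assumptions made for the example.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, max_sweeps=1000, tol=1e-10):
    """Minimise (1/(2n)) * ||y - Xw||^2 + lam * ||w||_1 by cyclic updates.

    X is a list of n rows, each a list of d floats; y is a list of n floats.
    (Hypothetical helper for illustration; not taken from the paper.)
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    residual = list(y)  # equals y - Xw because w starts at zero
    # Mean squared value of each column: the curvature along coordinate j.
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(d)]

    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(d):  # round-robin over all predictors
            if col_sq[j] == 0.0:
                continue  # an all-zero column never receives weight
            # Correlation of column j with the residual that excludes j.
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n
            rho += col_sq[j] * w[j]
            w_new = soft_threshold(rho, lam) / col_sq[j]
            delta = w_new - w[j]
            if delta != 0.0:
                # Keep the residual in sync with the updated weight.
                for i in range(n):
                    residual[i] -= delta * X[i][j]
                w[j] = w_new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break  # no weight moved appreciably in a full sweep
    return w
```

Every sweep here visits every predictor, including those whose weight stays at zero; those are the updates the abstract describes as unnecessary.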
On the Effectiveness of Linear Models for One-Class Collaborative Filtering
Sedhain, Suvash (Australian National University) | Menon, Aditya Krishna (Australian National University and NICTA) | Sanner, Scott (Oregon State University and Australian National University) | Braziunas, Darius (Rakuten Kobo Inc)
In many personalised recommendation problems, there are examples of items users prefer or like, but no examples of items they dislike. A state-of-the-art method for such implicit feedback, or one-class collaborative filtering (OC-CF), problems is SLIM, which makes recommendations based on a learned item-item similarity matrix. While SLIM has been shown to perform well on implicit feedback tasks, we argue that it is hindered by two limitations: first, it does not produce user-personalised predictions, which hampers recommendation performance; second, it involves solving a constrained optimisation problem, which impedes fast training. In this paper, we propose LRec, a variant of SLIM that overcomes these limitations without sacrificing any of SLIM's strengths. At its core, LRec employs linear logistic regression; despite this simplicity, LRec consistently and significantly outperforms all existing methods on a range of datasets. Our results thus illustrate that the OC-CF problem can be effectively tackled via linear classification models.
Robust Text Classification in the Presence of Confounding Bias
Landeiro, Virgile (Illinois Institute of Technology) | Culotta, Aron (Illinois Institute of Technology)
As text classifiers become increasingly used in real-time applications, it is critical to consider not only their accuracy but also their robustness to changes in the data distribution. In this paper, we consider the case where there is a confounding variable Z that influences both the text features X and the class variable Y. For example, a classifier trained to predict the health status of a user based on their online communications may be confounded by socioeconomic variables. When the influence of Z changes from training to testing data, we find that classifier accuracy can degrade rapidly. Our approach, based on Pearl's back-door adjustment, estimates the underlying effect of a text variable on the class variable while controlling for the confounding variable. Although our goal is prediction, not causal inference, we find that such adjustments are essential to building text classifiers that are robust to confounding variables. On three diverse text classification tasks, we find that covariate adjustment results in higher accuracy than competing baselines over a range of confounding relationships (e.g., in one setting, accuracy improves from 60% to 81%).
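For reference, the standard form of Pearl's back-door adjustment, written with the abstract's X, Y and Z, is given below. It assumes that Z is observed and satisfies the back-door criterion for the pair (X, Y); how the paper estimates each term from text data is not stated in the abstract.

$$p(y \mid do(x)) = \sum_{z} p(y \mid x, z)\, p(z)$$

By contrast, the ordinary conditional is $p(y \mid x) = \sum_{z} p(y \mid x, z)\, p(z \mid x)$. The adjustment replaces $p(z \mid x)$ with the marginal $p(z)$, which is why the adjusted estimate does not shift when the association between Z and X differs between training and test data.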
College Towns, Vacation Spots, and Tech Hubs: Using Geo-Social Media to Model and Compare Locations
Ge, Hancheng (Texas A&M University) | Caverlee, James (Texas A&M University)
In this paper, we explore the potential of geo-social media to construct location-based interest profiles to uncover the hidden relationships among disparate locations. Through an investigation of millions of geo-tagged Tweets, we construct a per-city interest model based on fourteen high-level categories (e.g., technology, art, sports). These interest models support the discovery of related locations that are connected based on these categorical perspectives (e.g., college towns or vacation spots) but perhaps not on the individual tweet level. We then connect these city-based interest models to underlying demographic data. By building multivariate multiple linear regression (MMLR) and neural network (NN) models we show how a location's interest profile may be estimated based purely on its demographic features.
Loss minimization and parameter estimation with heavy tails
This work studies applications and generalizations of a simple estimation technique that provides exponential concentration under heavy-tailed distributions, assuming only bounded low-order moments. We show that the technique can be used for approximate minimization of smooth and strongly convex losses, and specifically for least squares linear regression. For instance, our $d$-dimensional estimator requires just $\tilde{O}(d\log(1/\delta))$ random samples to obtain a constant factor approximation to the optimal least squares loss with probability $1-\delta$, without requiring the covariates or noise to be bounded or subgaussian. We provide further applications to sparse linear regression and low-rank covariance matrix estimation with similar allowances on the noise and covariate distributions. The core technique is a generalization of the median-of-means estimator to arbitrary metric spaces.
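The scalar median-of-means estimator that the abstract generalises can be sketched as follows. This is only the classical one-dimensional version, not the paper's metric-space generalisation; the function name, the shuffling step, and the `n_blocks` and `seed` parameters are assumptions made for the example.

```python
import random
import statistics


def median_of_means(samples, n_blocks, seed=None):
    """Estimate a mean robustly: split into blocks, average each, take the median.

    (Hypothetical helper for illustration; not taken from the paper.)
    """
    values = list(samples)
    if not 1 <= n_blocks <= len(values):
        raise ValueError("n_blocks must be between 1 and the number of samples")
    # Shuffle so that block membership does not depend on the input order.
    rng = random.Random(seed)
    rng.shuffle(values)
    # Block b takes every n_blocks-th value starting at index b, so block
    # sizes differ by at most one.
    block_means = [statistics.fmean(values[b::n_blocks]) for b in range(n_blocks)]
    # A few extreme samples can corrupt only a few block means, and the
    # median ignores those as long as most blocks are unaffected.
    return statistics.median(block_means)
```

With `n_blocks=1` this reduces to the ordinary sample mean, and with `n_blocks=len(samples)` it reduces to the sample median; intermediate values trade off between the two.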
Overview of predictive modelling, machine learning, etc.
"In these situations, it is not always necessary to think about samples and populations, or to think about a model that expresses a scientific idea." It doesn't make sense to me, because if I were to build a regression model I would still need to think about my samples and population. I don't understand why I should just plug my sample data into R and hope for the best without any idea what my sample is about. The sentence doesn't add anything; it's confusing and technically incorrect. "Instead the goal is to simply find an equation or algorithm that makes reasonably correct predictions" sounds dodgy to me.