AITopics

1906.05819

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Sondhi, Arjun, Arbour, David, Dimmery, Drew

Balanced Off-Policy Evaluation in General Action Spaces

arXiv.org Machine LearningJun-13-2019

In many practical applications of contextual bandits, online learning is infeasible and practitioners must rely on off-policy evaluation (OPE) of logged data collected from prior policies. OPE generally consists of a combination of two components: (i) directly estimating a model of the reward given state and action and (ii) importance sampling. While recent work has made significant advances adaptively combining these two components, less attention has been paid to improving the quality of the importance weights themselves. In this work we present balancing off-policy evaluation (BOP-e), an importance sampling procedure that directly optimizes for balance and can be plugged into any OPE estimator that uses importance sampling. BOP-e directly estimates the importance sampling ratio via a classifier which attempts to distinguish state-action pairs from an observed versus a proposed policy. BOP-e can be applied to continuous, mixed, and multi-valued action spaces without modification and is easily scalable to many observations. Further, we show that minimization of regret in the constructed binary classification problem translates directly into minimizing regret in the off-policy evaluation task. Finally, we provide experimental evidence that BOP-e outperforms inverse propensity weighting-based approaches for offline evaluation of policies in the contextual bandit setting under both discrete and continuous action spaces.

artificial intelligence, estimator, machine learning, (16 more...)

1906.03694

Country:

North America > United States (0.46)
Europe (0.46)

Genre: Research Report (0.82)

Industry: Education (0.48)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Narasimhan, Harikrishna, Cotter, Andrew, Gupta, Maya, Wang, Serena

Pairwise Fairness for Ranking and Regression

We present pairwise metrics of fairness for ranking and regression models that form analogues of statistical fairness notions such as equal opportunity or equal accuracy, as well as statistical parity. Our pairwise formulation supports both discrete protected groups, and continuous protected attributes. We show that the resulting training problems can be efficiently and effectively solved using constrained optimization and robust optimization techniques based on two player game algorithms developed for fair classification. Experiments illustrate the broad applicability and trade-offs of these methods.

artificial intelligence, constraint, machine learning, (14 more...)

1906.0533

Country: North America > United States (0.14)

Genre: Research Report (0.82)

Industry: Education (0.71)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.49)

Salehi, Fariborz, Abbasi, Ehsan, Hassibi, Babak

The Impact of Regularization on High-dimensional Logistic Regression

Logistic regression is commonly used for modeling dichotomous outcomes. In the classical setting, where the number of observations is much larger than the number of parameters, properties of the maximum likelihood estimator in logistic regression are well understood. Recently, Sur and Candes have studied logistic regression in the high-dimensional regime, where the number of observations and parameters are comparable, and show, among other things, that the maximum likelihood estimator is biased. In the high-dimensional regime the underlying parameter vector is often structured (sparse, block-sparse, finite-alphabet, etc.) and so in this paper we study regularized logistic regression (RLR), where a convex regularizer that encourages the desired structure is added to the negative of the log-likelihood function. An advantage of RLR is that it allows parameter recovery even for instances where the (unconstrained) maximum likelihood estimate does not exist. We provide a precise analysis of the performance of RLR via the solution of a system of six nonlinear equations, through which any performance metric of interest (mean, mean-squared error, probability of support recovery, etc.) can be explicitly computed. Our results generalize those of Sur and Candes and we provide a detailed study for the cases of $\ell_2^2$-RLR and sparse ($\ell_1$-regularized) logistic regression. In both cases, we obtain explicit expressions for various performance metrics and can find the values of the regularizer parameter that optimizes the desired performance. The theory is validated by extensive numerical simulations across a range of parameter values and problem instances.

artificial intelligence, machine learning, optimization, (16 more...)

1906.03761

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Texas > Schleicher County (0.04)
North America > United States > California > Los Angeles County > Pasadena (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.75)

Schneuwly, Arno, Grubenmann, Ralf, Cieliebak, Mark, Jaggi, Martin

Correlating Twitter Language with Community-Level Health Outcomes

We study how language on social media is linked to diseases such as atherosclerotic heart disease (AHD), diabetes and various types of cancer. Our proposed model leverages state-of-the-art sentence embeddings, followed by a regression model and clustering, without the need of additional labelled data. It allows to predict community-level medical outcomes from language, and thereby potentially translate these to the individual level. The method is applicable to a wide range of target variables and allows us to discover known and potentially novel correlations of medical outcomes with life-style aspects and other socioeconomic risk factors.

artificial intelligence, machine learning, natural language, (16 more...)

1906.06465

Country:

North America > United States (0.47)
North America > Canada (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.39)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

Robberechts, Pieter, Van Haaren, Jan, Davis, Jesse

Who Will Win It? An In-game Win Probability Model for Football

In-game win probability is a statistical metric that provides a sports team's likelihood of winning at any given point in a game, based on the performance of historical teams in the same situation. In-game win-probability models have been extensively studied in baseball, basketball and American football. These models serve as a tool to enhance the fan experience, evaluate in game-decision making and measure the risk-reward balance for coaching decisions. In contrast, they have received less attention in association football, because its low-scoring nature makes it far more challenging to analyze. In this paper, we build an in-game win probability model for football. Specifically, we first show that porting existing approaches, both in terms of the predictive models employed and the features considered, does not yield good in-game win-probability estimates for football. Second, we introduce our own Bayesian statistical model that utilizes a set of eight variables to predict the running win, tie and loss probabilities for the home team. We train our model using event data from the last four seasons of the major European football competitions. Our results indicate that our model provides well-calibrated probabilities. Finally, we elaborate on two use cases for our win probability metric: enhancing the fan experience and evaluating performance in crucial situations.

football, probability, win probability, (13 more...)

1906.05029

Country:

Asia > Japan (0.05)
South America > Argentina (0.04)
North America > United States > Nevada > Clark County > Las Vegas (0.04)
(4 more...)

Genre: Research Report > New Finding (0.35)

Industry:

Leisure & Entertainment > Sports > Soccer (1.00)
Leisure & Entertainment > Sports > Hockey (1.00)
Leisure & Entertainment > Sports > Football (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Model-Based Reasoning (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.69)

Karnin, Zohar, Liberty, Edo

Discrepancy, Coresets, and Sketches in Machine Learning

arXiv.org Machine LearningJun-11-2019

This paper defines the notion of class discrepancy for families of functions. It shows that low discrepancy classes admit small offline and streaming coresets. We provide general techniques for bounding the class discrepancy of machine learning problems. As corollaries of the general technique we bound the discrepancy (and therefore coreset complexity) of logistic regression, sigmoid activation loss, matrix covariance, kernel density and any analytic function of the dot product or the squared distance. Our results prove the existence of epsilon-approximation O(sqrt{d}/epsilon) sized coresets for the above problems. This resolves the long-standing open problem regarding the coreset complexity of Gaussian kernel density estimation. We provide two more related but independent results. First, an exponential improvement of the widely used merge-and-reduce trick which gives improved streaming sketches for any low discrepancy problem. Second, an extremely simple deterministic algorithm for finding low discrepancy sequences (and therefore coresets) for any positive semi-definite kernel. This paper establishes some explicit connections between class discrepancy, coreset complexity, learnability, and streaming algorithms.

artificial intelligence, coreset, machine learning, (14 more...)

1906.04845

Country: North America > United States (0.93)

Genre: Research Report > New Finding (1.00)

Industry: Education (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Maalouf, Alaa, Jubran, Ibrahim, Feldman, Dan

Fast and Accurate Least-Mean-Squares Solvers

arXiv.org Machine LearningJun-11-2019

Least-mean squares (LMS) solvers such as Linear / Ridge / Lasso-Regression, SVD and Elastic-Net not only solve fundamental machine learning problems, but are also the building blocks in a variety of other methods, such as decision trees and matrix factorizations. We suggest an algorithm that gets a finite set of $n$ $d$-dimensional real vectors and returns a weighted subset of $d+1$ vectors whose sum is \emph{exactly} the same. The proof in Caratheodory's Theorem (1907) computes such a subset in $O(n^2d^2)$ time and thus not used in practice. Our algorithm computes this subset in $O(nd)$ time, using $O(\log n)$ calls to Caratheodory's construction on small but "smart" subsets. This is based on a novel paradigm of fusion between different data summarization techniques, known as sketches and coresets. As an example application, we show how it can be used to boost the performance of existing LMS solvers, such as those in scikit-learn library, up to x100. Generalization for streaming and distributed (big) data is trivial. Extensive experimental results and complete open source code are also provided.

artificial intelligence, machine learning, matrix, (18 more...)

1906.04705

Country: North America (0.46)

Genre: Research Report (1.00)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.49)

Reeve, Henry W. J., Kaban, Ata

Fast Rates for a kNN Classifier Robust to Unknown Asymmetric Label Noise

arXiv.org Machine LearningJun-11-2019

We consider classification in the presence of class-dependent asymmetric label noise with unknown noise probabilities. In this setting, identifiability conditions are known, but additional assumptions were shown to be required for finite sample rates, and so far only the parametric rate has been obtained. Assuming these identifiability conditions, together with a measure-smoothness condition on the regression function and Tsybakov's margin condition, we show that the Robust kNN classifier of Gao et al. attains, the minimax optimal rates of the noise-free setting, up to a log factor, even when trained on data with unknown asymmetric label noise. Hence, our results provide a solid theoretical backing for this empirically successful algorithm. By contrast the standard kNN is not even consistent in the setting of asymmetric label noise. A key idea in our analysis is a simple kNN based method for estimating the maximum of a function that requires far less assumptions than existing mode estimators do, and which may be of independent interest for noise proportion estimation and randomised optimisation problems.

artificial intelligence, data mining, machine learning, (15 more...)

1906.04542

Country:

North America > United States (0.28)
Europe > United Kingdom > England (0.28)

Genre: Research Report (0.70)

Technology:

Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.36)

Liu, Shikang, Vahedian, Fatemeh, Hachen, David, Lizardo, Omar, Poellabauer, Christian, Striegel, Aaron, Milenkovic, Tijana

Heterogeneous network approach to predict individuals' mental health

arXiv.org Machine LearningJun-10-2019

Depression and anxiety are critical public health issues affecting millions of people around the world. To identify individuals who are vulnerable to depression and anxiety, predictive models have been built that typically utilize data from one source. Unlike these traditional models, in this study, we leverage a rich heterogeneous data set from the University of Notre Dame's NetHealth study that collected individuals' (student participants') social interaction data via smartphones, health-related behavioral data via wearables (Fitbit), and trait data from surveys. To integrate the different types of information, we model the NetHealth data as a heterogeneous information network (HIN). Then, we redefine the problem of predicting individuals' mental health conditions (depression or anxiety) in a novel manner, as applying to our HIN a popular paradigm of a recommender system (RS), which is typically used to predict the preference that a person would give to an item (e.g., a movie or book). In our case, the items are the individuals' different mental health states. We evaluate three state-of-the-art RS approaches. Also, we model the prediction of individuals' mental health as another problem type -- that of node classification (NC) in our HIN, evaluating in the process four node features under logistic regression as a proof-of-concept classifier. We find that our RS and NC network methods produce more accurate predictions than a logistic regression model using the same NetHealth data in the traditional non-network fashion as well as a random-approach. Also, we find that RS outperforms NC. This is the first study to integrate smartphone, wearable sensor, and survey data in an HIN manner and use RS or NC on the HIN to predict individuals' mental health conditions.

data mining, edge type, machine learning, (17 more...)

1906.04346

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Africa > Senegal > Kolda Region > Kolda (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Consumer Health (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)