AITopics

1905.05679

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(9 more...)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.51)

Foster, Dylan J., Risteski, Andrej

Sum-of-squares meets square loss: Fast rates for agnostic tensor completion

arXiv.org Machine LearningMay-30-2019

We study tensor completion in the agnostic setting. In the classical tensor completion problem, we receive $n$ entries of an unknown rank-$r$ tensor and wish to exactly complete the remaining entries. In agnostic tensor completion, we make no assumption on the rank of the unknown tensor, but attempt to predict unknown entries as well as the best rank-$r$ tensor. For agnostic learning of third-order tensors with the square loss, we give the first polynomial time algorithm that obtains a "fast" (i.e., $O(1/n)$-type) rate improving over the rate obtained by reduction to matrix completion. Our prediction error rate to compete with the best $d\times{}d\times{}d$ tensor of rank-$r$ is $\tilde{O}(r^{2}d^{3/2}/n)$. We also obtain an exact oracle inequality that trades off estimation and approximation error. Our algorithm is based on the degree-six sum-of-squares relaxation of the tensor nuclear norm. The key feature of our analysis is to show that a certain characterization for the subgradient of the tensor nuclear norm can be encoded in the sum-of-squares proof system. This unlocks the standard toolbox for localization of empirical processes under the square loss, and allows us to establish restricted eigenvalue-type guarantees for various tensor regression models, with tensor completion as a special case. The new analysis of the relaxation complements Barak and Moitra (2016), who gave slow rates for agnostic tensor completion, and Potechin and Steurer (2017), who gave exact recovery guarantees for the noiseless setting. Our techniques are user-friendly, and we anticipate that they will find use elsewhere.

artificial intelligence, completion, machine learning, (16 more...)

1905.13283

Country: North America > United States (0.46)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Agarwal, Alekh, Dudík, Miroslav, Wu, Zhiwei Steven

Fair Regression: Quantitative Definitions and Reduction-based Algorithms

arXiv.org Machine LearningMay-29-2019

In this paper, we study the prediction of a real-valued target, such as a risk score or recidivism rate, while guaranteeing a quantitative notion of fairness with respect to a protected attribute such as gender or race. We call this class of problems \emph{fair regression}. We propose general schemes for fair regression under two notions of fairness: (1) statistical parity, which asks that the prediction be statistically independent of the protected attribute, and (2) bounded group loss, which asks that the prediction error restricted to any protected group remain below some pre-determined level. While we only study these two notions of fairness, our schemes are applicable to arbitrary Lipschitz-continuous losses, and so they encompass least-squares regression, logistic regression, quantile regression, and many other tasks. Our schemes only require access to standard risk minimization algorithms (such as standard classification or least-squares regression) while providing theoretical guarantees on the optimality and fairness of the obtained solutions. In addition to analyzing theoretical properties of our schemes, we empirically demonstrate their ability to uncover fairness--accuracy frontiers on several standard datasets.

data mining, machine learning, regression, (16 more...)

1905.12843

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)

Genre:

Research Report > New Finding (0.48)
Research Report > Experimental Study (0.48)

Industry:

Law (1.00)
Education (0.95)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

Mihaita, Adriana-Simona, Liu, Zheyuan, Cai, Chen, Rizoiu, Marian-Andrei

Arterial incident duration prediction using a bi-level framework of extreme gradient-tree boosting

arXiv.org Machine LearningMay-29-2019

Abstract: Predicting traffic incident duration is a major challenge for many traffic centres around the world. Most research studies focus on predicting the incident duration on motorways rather than arterial roads, due to a high network complexity and lack of data. In this paper we propose a bi-level framework for predicting the accident duration on arterial road networks in Sydney, based on operational requirements of incident clearance target which is less than 45 minutes. Using incident baseline information, we first deploy a classification method using various ensemble tree models in order to predict whether a new incident will be cleared in less than 45min or not. If the incident was classified as short-term, then various regression models are developed for predicting the actual incident duration in minutes by incorporating various traffic flow features. After outlier removal and intensive model hyper-parameter tuning through randomized search and cross-validation, we show that the extreme gradient boost approach outperformed all models, including the gradient-boosted decision-trees by almost 53%. Finally, we perform a feature importance evaluation for incident duration prediction and show that the best prediction results are obtained when leveraging the real-time traffic flow in vicinity road sections to the reported accident location. Initial methods used to predict the incident duration were 1. Introduction Bayesian classifiers [5], discrete choice models (DCM) [6], probabilistic distribution analyses [7], and the hazard-based Traffic congestion is a major concern for many cities duration models (HBDM) [8].

artificial intelligence, machine learning, prediction, (18 more...)

1905.12254

Country:

Oceania > Australia > New South Wales (0.28)
North America > United States > California (0.28)

Genre: Research Report > New Finding (0.68)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Kwon, Jeongyeol, Caramanis, Constantine

EM Converges for a Mixture of Many Linear Regressions

We study the convergence of the Expectation-Maximization (EM) algorithm for mixtures of linear regressions with an arbitrary number $k$ of components. We show that as long as signal-to-noise ratio (SNR) is more than $\tilde{O}(k^2)$, well-initialized EM converges to the true regression parameters. Previous results for $k \geq 3$ have only established local convergence for the noiseless setting, i.e., where SNR is infinitely large. Our results establish a near optimal statistical error rate of $\tilde{O}(\sigma \sqrt{k^2 d/n})$ for (sample-splitting) finite-sample EM with $k$ components, where $d$ is dimension, $n$ is the number of samples, and $\sigma$ is the variance of noise. In particular, our results imply exact recovery as $\sigma \rightarrow 0$, in contrast to most previous local convergence results for EM, where the statistical error scaled with the norm of parameters. Standard moment-method approaches suffice to guarantee we are in the region where our local convergence guarantees apply.

artificial intelligence, exp, machine learning, (17 more...)

1905.12106

Country:

North America > United States > Texas > Travis County > Austin (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.90)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.70)

Karimi, Amir-Hossein, Barthe, Gilles, Balle, Borja, Valera, Isabel

Model-Agnostic Counterfactual Explanations for Consequential Decisions

arXiv.org Artificial IntelligenceMay-28-2019

Predictive models are being increasingly used to support consequential decision making at the individual level in contexts such as pretrial bail and loan approval. As a result, there is increasing social and legal pressure to provide explanations that help the affected individuals not only to understand why a prediction was output, but also how to act to obtain a desired outcome. To this end, several works have proposed methods to generate counterfactual explanations. However, they are often restricted to a particular subset of models (e.g., decision trees or linear models), and cannot directly handle the mixed (numerical and nominal) nature of the features describing each individual. In this paper, we propose a model-agnostic algorithm to generate counterfactual explanations that builds on the standard theory and tools from formal verification. Specifically, our algorithm solves a sequence of satisfiability problems, where a wide variety of predictive models and distances in mixed feature spaces, as well as natural notions of plausibility and diversity, are represented as logic formulas. Our experiments on real-world data demonstrate that our approach can flexibly handle widely deployed predictive models, while providing meaningfully closer counterfactuals than existing approaches.

data mining, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

1905.1119

Country:

North America > United States (0.14)
Europe > Netherlands > South Holland > Dordrecht (0.04)

Genre: Research Report > New Finding (0.93)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(3 more...)

Schürch, Manuel, Azzimonti, Dario, Benavoli, Alessio, Zaffalon, Marco

Recursive Estimation for Sparse Gaussian Process Regression

Gaussian Processes (GPs) are powerful kernelized methods for non-parameteric regression used in many applications. However, their plain usage is limited to a few thousand of training samples due to their cubic time complexity. In order to scale GPs to larger datasets, several sparse approximations based on so-called inducing points have been proposed in the literature. The majority of previous work has focused on the batch setting, whereas in this work we focusing on the training with mini-batches. In particular, we investigate the connection between a general class of sparse inducing point GP regression methods and Bayesian recursive estimation which enables Kalman Filter and Information Filter like updating for online learning. Moreover, exploiting ideas from distributed estimation, we show how our approach can be distributed. For unknown parameters, we propose a novel approach that relies on recursively propagating the analytical gradients of the posterior over mini-batches of the data. Compared to state of the art methods, we have analytic updates for the mean and covariance of the posterior, thus reducing drastically the size of the optimization problem. We show that our method achieves faster convergence and superior performance compared to state of the art sequential Gaussian Process regression on synthetic GP as well as real-world data with up to a million of data samples.

artificial intelligence, likelihood, machine learning, (20 more...)

1905.11711

Country:

Europe > Switzerland (0.04)
North America > United States (0.04)
North America > Puerto Rico > San Juan > San Juan (0.04)
Europe > Ireland > Munster > County Limerick > Limerick (0.04)

Genre: Research Report > Promising Solution (0.68)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)

Ghojogh, Benyamin, Crowley, Mark

The Theory Behind Overfitting, Cross Validation, Regularization, Bagging, and Boosting: Tutorial

In this tutorial paper, we first define mean squared error, variance, covariance, and bias of both random variables and classification/predictor models. Then, we formulate the true and generalization errors of the model for both training and validation/test instances where we make use of the Stein's Unbiased Risk Estimator (SURE). We define overfitting, underfitting, and generalization using the obtained true and generalization errors. We introduce cross validation and two well-known examples which are $K$-fold and leave-one-out cross validations. We briefly introduce generalized cross validation and then move on to regularization where we use the SURE again. We work on both $\ell_2$ and $\ell_1$ norm regularizations. Then, we show that bootstrap aggregating (bagging) reduces the variance of estimation. Boosting, specifically AdaBoost, is introduced and it is explained as both an additive model and a maximum margin model, i.e., Support Vector Machine (SVM). The upper bound on the generalization error of boosting is also provided to show why boosting prevents from overfitting. As examples of regularization, the theory of ridge and lasso regressions, weight decay, noise injection to input/weights, and early stopping are explained. Random forest, dropout, histogram of oriented gradients, and single shot multi-box detector are explained as examples of bagging in machine learning and computer vision. Finally, boosting tree and SVM models are mentioned as examples of boosting.

artificial intelligence, estimation, machine learning, (15 more...)

1905.12787

Country:

North America > Canada > Ontario > Waterloo Region > Waterloo (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre:

Research Report (1.00)
Instructional Material > Course Syllabus & Notes (0.93)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.88)
(2 more...)

Wang, Yizhen, Chaudhuri, Kamalika

An Investigation of Data Poisoning Defenses for Online Learning

Machine learning is increasingly used in safety-critical applications, and hence designing machine learning algorithms in the presence of an adversary has been a topic of active research [2, 3, 4, 5, 11, 12, 13]. A style of adversary that is commonly studied is data poisoning attacks [4, 12, 15, 21] where the adversary can modify or corrupt a small fraction of training examples with the goal of forcing the trained classifier to have low classification accuracy. Such attacks have threatened many real-world applications including spam filters [23], malware detection [25], sentiment analysis [24] and collaborative filtering [15]. There has been a body of prior work on data poisoning with increasingly sophisticated attacks and defenses [4, 12, 15, 21, 22, 27, 29, 30]. However, the literature largely suffers from two main limitations. First, most work is on the batch setting - all data is provided in advance and the adversary assumes that the learner's goal is to produce an empirical minimizer of a loss. This excludes many modern machine learning algorithms, such as, stochastic gradient descent, or learning from a data stream.

accuracy, artificial intelligence, machine learning, (15 more...)

1905.12121

Country: North America > United States (0.68)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Education > Educational Setting > Online (0.65)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Zhu, Guangyu, Zhao, Tingting

Deep-gKnock: nonlinear group-feature selection with deep neural network

arXiv.org Machine LearningMay-27-2019

Feature selection is central to contemporary high-dimensional data analysis. Grouping structure among features arises naturally in various scientific problems. Many methods have been proposed to incorporate the grouping structure information into feature selection. However, these methods are normally restricted to a linear regression setting. To relax the linear constraint, we combine the deep neural networks (DNNs) with the recent Knockoffs technique, which has been successful in an individual feature selection context. We propose Deep-gKnock (Deep group-feature selection using Knockoffs) as a methodology for model interpretation and dimension reduction. Deep-gKnock performs model-free group-feature selection by controlling group-wise False Discovery Rate (gFDR). Our method improves the interpretability and reproducibility of DNNs. Experimental results on both synthetic and real data demonstrate that our method achieves superior power and accurate gFDR control compared with state-of-the-art methods.

artificial intelligence, machine learning, selection, (19 more...)

1905.10013

Country:

North America > United States > Rhode Island (0.04)
North America > United States > New York (0.04)
North America > United States > California > Santa Clara County > Stanford (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.61)