AITopics

Country: North America (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.49)

Neural Information Processing Systems, Dec-31-2013

Learning with Noisy Labels

Natarajan, Nagarajan, Dhillon, Inderjit S., Ravikumar, Pradeep K., Tewari, Ambuj

In this paper, we theoretically study the problem of binary classification in the presence of random classification noise: the learner, instead of seeing the true labels, sees labels that have independently been flipped with some small probability. Moreover, random label noise is class-conditional: the flip probability depends on the class. We provide two approaches to suitably modify any given surrogate loss function. First, we provide a simple unbiased estimator of any loss, and obtain performance bounds for empirical risk minimization in the presence of iid data with noisy labels. If the loss function satisfies a simple symmetry condition, we show that the method leads to an efficient algorithm for empirical minimization. Second, by leveraging a reduction of risk minimization under noisy labels to classification with weighted 0-1 loss, we suggest the use of a simple weighted surrogate loss, for which we are able to obtain strong empirical risk bounds. This approach has a very remarkable consequence: methods used in practice such as biased SVM and weighted logistic regression are provably noise-tolerant. On a synthetic non-separable dataset, our methods achieve over 88% accuracy even when 40% of the labels are corrupted, and are competitive with respect to recently proposed methods for dealing with label noise in several benchmark datasets.
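The "simple unbiased estimator of any loss" mentioned in the abstract can be illustrated with a minimal sketch. It assumes labels in {-1, +1}, known flip rates, and the usual correction that reweights the loss on the observed label against the loss on the opposite label. The function names, the parameter names `rho_pos` and `rho_neg`, and the choice of logistic loss are illustrative assumptions, not taken from the listing.

```python
import numpy as np

def logistic_loss(t, y):
    """Logistic loss for a real-valued score t and a label y in {-1, +1}."""
    return np.log1p(np.exp(-y * t))

def unbiased_loss(loss, t, y_noisy, rho_pos, rho_neg):
    """Noise-corrected loss evaluated on an observed (possibly flipped) label.

    rho_pos: probability that a true +1 label is flipped to -1 (assumed known).
    rho_neg: probability that a true -1 label is flipped to +1 (assumed known).
    Requires rho_pos + rho_neg < 1.
    """
    rho_y = rho_pos if y_noisy == 1 else rho_neg      # flip rate of the observed class
    rho_other = rho_neg if y_noisy == 1 else rho_pos  # flip rate of the opposite class
    numer = (1.0 - rho_other) * loss(t, y_noisy) - rho_y * loss(t, -y_noisy)
    return numer / (1.0 - rho_pos - rho_neg)

def expected_corrected_loss(loss, t, y_true, rho_pos, rho_neg):
    """Expectation of the corrected loss over the random flip of y_true."""
    flip = rho_pos if y_true == 1 else rho_neg
    return ((1.0 - flip) * unbiased_loss(loss, t, y_true, rho_pos, rho_neg)
            + flip * unbiased_loss(loss, t, -y_true, rho_pos, rho_neg))
```

Averaged over the label noise, the corrected loss matches the loss on the clean label, which is the property that lets empirical risk minimization run on noisy data. In practice the flip rates would have to be estimated or tuned.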

artificial intelligence, noise, survey article, (17 more...)

Country:

North America > United States > Texas (0.14)
North America > United States > Michigan (0.14)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)

Neural Information Processing Systems, Dec-31-2012

Feature Clustering for Accelerating Parallel Coordinate Descent

Scherrer, Chad, Tewari, Ambuj, Halappanavar, Mahantesh, Haglin, David

Large-scale L1-regularized loss minimization problems arise in high-dimensional applications such as compressed sensing and high-dimensional supervised learning, including classification and regression problems. High-performance algorithms and implementations are critical to efficiently solving these problems. Building upon previous work on coordinate descent algorithms for L1-regularized problems, we introduce a novel family of algorithms called block-greedy coordinate descent that includes, as special cases, several existing algorithms such as SCD, Greedy CD, Shotgun, and Thread-Greedy. We give a unified convergence analysis for the family of block-greedy algorithms. The analysis suggests that block-greedy coordinate descent can better exploit parallelism if features are clustered so that the maximum inner product between features in different blocks is small. Our theoretical convergence analysis is supported with experimental results using data from diverse real-world applications. We hope that algorithmic approaches and convergence analysis we provide will not only advance the field, but will also encourage researchers to systematically explore the design space of algorithms for solving large-scale L1-regularization problems.

algorithm, artificial intelligence, machine learning, (14 more...)

Country: North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)

Genre: Research Report (0.47)

Industry: Energy (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.35)

arXiv.org Machine Learning, Dec-17-2012

Feature Clustering for Accelerating Parallel Coordinate Descent

Scherrer, Chad, Tewari, Ambuj, Halappanavar, Mahantesh, Haglin, David

Large-scale L1-regularized loss minimization problems arise in high-dimensional applications such as compressed sensing and high-dimensional supervised learning, including classification and regression problems. High-performance algorithms and implementations are critical to efficiently solving these problems. Building upon previous work on coordinate descent algorithms for L1-regularized problems, we introduce a novel family of algorithms called block-greedy coordinate descent that includes, as special cases, several existing algorithms such as SCD, Greedy CD, Shotgun, and Thread-Greedy. We give a unified convergence analysis for the family of block-greedy algorithms. The analysis suggests that block-greedy coordinate descent can better exploit parallelism if features are clustered so that the maximum inner product between features in different blocks is small. Our theoretical convergence analysis is supported with experimental results using data from diverse real-world applications. We hope that algorithmic approaches and convergence analysis we provide will not only advance the field, but will also encourage researchers to systematically explore the design space of algorithms for solving large-scale L1-regularization problems.
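The clustering criterion in the abstract, a small maximum inner product between features in different blocks, can be computed directly for a given assignment of features to blocks. The sketch below is only an illustration of that quantity: the function name, the column normalization, and the use of absolute values are assumptions, and it does not reproduce the paper's clustering procedure.

```python
import numpy as np

def max_cross_block_inner_product(X, blocks):
    """Largest |<x_i, x_j>| over feature pairs (i, j) assigned to different blocks.

    X: array of shape (n_samples, n_features); columns are features.
    blocks: length-n_features sequence of integer block ids.
    Columns are scaled to unit norm first (an assumption of this sketch).
    """
    blocks = np.asarray(blocks)
    norms = np.linalg.norm(X, axis=0)
    Xn = X / np.where(norms > 0, norms, 1.0)  # leave all-zero columns untouched
    gram = np.abs(Xn.T @ Xn)                  # pairwise |inner products|
    different = blocks[:, None] != blocks[None, :]
    if not different.any():                   # everything in one block
        return 0.0
    return float(gram[different].max())
```

A lower value for one block assignment than for another suggests, following the abstract, that the first assignment is better suited to parallel updates across blocks.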

algorithm, artificial intelligence, machine learning, (16 more...)

1212.4174

Country: North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)

Genre: Research Report (1.00)

Industry: Energy (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.35)

arXiv.org Machine Learning, Nov-26-2012

The Interplay Between Stability and Regret in Online Learning

Saha, Ankan, Jain, Prateek, Tewari, Ambuj

This paper considers the stability of online learning algorithms and its implications for learnability (bounded regret). We introduce a novel quantity called forward regret that intuitively measures how good an online learning algorithm is if it is allowed a one-step look-ahead into the future. We show that given stability, bounded forward regret is equivalent to bounded regret. We also show that the existence of an algorithm with bounded regret implies the existence of a stable algorithm with bounded regret and bounded forward regret. The equivalence results apply to general, possibly non-convex problems. To the best of our knowledge, our analysis provides the first general connection between stability and regret in the online setting that is not restricted to a particular class of algorithms. Our stability-regret connection provides a simple recipe for analyzing regret incurred by any online learning algorithm. Using our framework, we analyze several existing online learning algorithms as well as the "approximate" versions of algorithms like RDA that solve an optimization problem at each iteration. Our proofs are simpler than existing analyses for the respective algorithms, show a clear trade-off between stability and forward regret, and provide tighter regret bounds in some cases. Furthermore, using our recipe, we analyze "approximate" versions of several algorithms such as follow-the-regularized-leader (FTRL) that require solving an optimization problem at each step.

algorithm, computer based training, educational technology, (20 more...)

1211.6158

Country:

North America > United States > New York (0.14)
Europe > United Kingdom > England (0.14)

Genre:

Research Report (0.50)
Workflow (0.49)

Industry: Education > Educational Setting > Online (1.00)

Technology:

Information Technology > Enterprise Applications > Human Resources > Learning Management (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Machine Learning, Jun-27-2012

Scaling Up Coordinate Descent Algorithms for Large $\ell_1$ Regularization Problems

Scherrer, Chad, Halappanavar, Mahantesh, Tewari, Ambuj, Haglin, David

We present a generic framework for parallel coordinate descent (CD) algorithms that includes, as special cases, the original sequential algorithms Cyclic CD and Stochastic CD, as well as the recent parallel Shotgun algorithm. We introduce two novel parallel algorithms that are also special cases, Thread-Greedy CD and Coloring-Based CD, and give performance measurements for an OpenMP implementation of these.
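For orientation, the two sequential baselines named in the abstract, Cyclic CD and Stochastic CD, can be sketched for one concrete L1-regularized problem. The sketch assumes a squared-error loss (the lasso objective) and uses hypothetical names. It does not cover the parallel variants (Shotgun, Thread-Greedy CD, Coloring-Based CD) or the OpenMP implementation.

```python
import numpy as np

def soft_threshold(z, lam):
    """Scalar soft-thresholding operator."""
    return np.sign(z) * max(abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, n_epochs=100, order="cyclic", seed=0):
    """Minimize 0.5 * ||y - X w||^2 + lam * ||w||_1 by coordinate descent.

    order="cyclic" sweeps the coordinates in a fixed order; any other value
    samples coordinates uniformly at random (stochastic CD).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    residual = np.asarray(y, dtype=float).copy()  # equals y - X w throughout
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_epochs):
        coords = range(d) if order == "cyclic" else rng.integers(0, d, size=d)
        for j in coords:
            if col_sq[j] == 0.0:
                continue  # an all-zero feature cannot change the objective
            # Correlation of feature j with the residual that excludes feature j.
            rho = X[:, j] @ residual + col_sq[j] * w[j]
            w_new = soft_threshold(rho, lam) / col_sq[j]
            residual -= X[:, j] * (w_new - w[j])
            w[j] = w_new
    return w
```

Each inner step solves the one-dimensional subproblem exactly, and only the rule for choosing the next coordinate differs between the two orderings.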

algorithm, artificial intelligence, optimization problem, (17 more...)

1206.6409

Country:

North America > United States > Texas (0.14)
Europe > United Kingdom > Scotland (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Machine Learning, Jun-27-2012

Online Bandit Learning against an Adaptive Adversary: from Regret to Policy Regret

Arora, Raman, Dekel, Ofer, Tewari, Ambuj

Online learning algorithms are designed to learn even when their input is generated by an adversary. The widely-accepted formal definition of an online algorithm's ability to learn is the game-theoretic notion of regret. We argue that the standard definition of regret becomes inadequate if the adversary is allowed to adapt to the online algorithm's actions. We define the alternative notion of policy regret, which attempts to provide a more meaningful way to measure an online algorithm's performance against adaptive adversaries. Focusing on the online bandit setting, we show that no bandit algorithm can guarantee a sublinear policy regret against an adaptive adversary with unbounded memory. On the other hand, if the adversary's memory is bounded, we present a general technique that converts any bandit algorithm with a sublinear regret bound into an algorithm with a sublinear policy regret bound. We extend this result to other variants of regret, such as switching regret, internal regret, and swap regret.

adversary, artificial intelligence, big data, (18 more...)

1206.64

Country: North America > United States > Texas > Travis County > Austin (0.14)

Genre:

Research Report (0.64)
Instructional Material > Online (0.40)

Industry: Education (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.56)

Neural Information Processing Systems, Dec-31-2011

Online Learning: Stochastic, Constrained, and Smoothed Adversaries

Rakhlin, Alexander, Sridharan, Karthik, Tewari, Ambuj

Learning theory has largely focused on two main learning scenarios: the classical statistical setting where instances are drawn i.i.d. from a fixed distribution, and the adversarial scenario whereby at every time step the worst instance is revealed to the player. It can be argued that in the real world neither of these assumptions is reasonable. We define the minimax value of a game where the adversary is restricted in his moves, capturing stochastic and non-stochastic assumptions on data. Building on the sequential symmetrization approach, we define a notion of distribution-dependent Rademacher complexity for the spectrum of problems ranging from i.i.d. to worst-case. The bounds let us immediately deduce variation-type bounds. We study a smoothed online learning scenario and show that an exponentially small amount of noise can make function classes with infinite Littlestone dimension learnable.

adversary, computer based training, educational technology, (21 more...)

Country: North America > United States > Texas (0.14)

Industry: Education > Educational Setting > Online (0.63)

Technology:

Information Technology > Enterprise Applications > Human Resources > Learning Management (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.48)

Neural Information Processing Systems, Dec-31-2011

On the Universality of Online Mirror Descent

Srebro, Nati, Sridharan, Karthik, Tewari, Ambuj

We show that for a general class of convex online learning problems, Mirror Descent can always achieve a (nearly) optimal regret guarantee.
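As a concrete instance of the method named in the abstract, the sketch below runs online mirror descent on the probability simplex with the negative-entropy mirror map, which gives the familiar multiplicative-weights update. This is one standard special case chosen for illustration. The linear losses, the function name, and the step size `eta` are assumptions, and the sketch says nothing about the general class of problems the paper treats.

```python
import numpy as np

def omd_entropic(loss_vectors, eta):
    """Online mirror descent on the simplex with the negative-entropy mirror map.

    loss_vectors: array of shape (T, d); round t incurs the linear loss <w_t, g_t>.
    Returns the points played in each round and the cumulative loss.
    """
    T, d = loss_vectors.shape
    w = np.full(d, 1.0 / d)        # start at the uniform distribution
    plays = np.empty((T, d))
    total = 0.0
    for t in range(T):
        plays[t] = w
        g = loss_vectors[t]        # gradient of the linear loss at w
        total += float(w @ g)
        w = w * np.exp(-eta * g)   # mirror step in the dual (log) coordinates
        w /= w.sum()               # Bregman projection back onto the simplex
    return plays, total
```

For losses in [0, 1], choosing `eta = sqrt(8 * ln(d) / T)` keeps the regret against the best fixed coordinate within `sqrt(T * ln(d) / 2)`, the standard bound for this update.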

artificial intelligence, machine learning, mirror descent, (17 more...)

Country:

North America > United States > Texas (0.14)
Asia > Middle East > Israel (0.14)

Industry: Education (0.58)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Neural Information Processing Systems, Dec-31-2011

Orthogonal Matching Pursuit with Replacement

Jain, Prateek, Tewari, Ambuj, Dhillon, Inderjit S.

In this paper, we consider the problem of compressed sensing where the goal is to recover almost all the sparse vectors using a small number of fixed linear measurements. For this problem, we propose a novel partial hard-thresholding operator leading to a general family of iterative algorithms. While one extreme of the family yields well known hard thresholding algorithms like ITI and HTP, the other end of the spectrum leads to a novel algorithm that we call Orthogonal Matching Pursuit with Replacement (OMPR). OMPR, like the classic greedy algorithm OMP, adds exactly one coordinate to the support at each iteration, based on the correlation with the current residual. However, unlike OMP, OMPR also removes one coordinate from the support. This simple change allows us to prove the best known guarantees for OMPR in terms of the Restricted Isometry Property (a condition on the measurement matrix). In contrast, OMP is known to have very weak performance guarantees under RIP. We also extend OMPR using locality sensitive hashing to get OMPR-Hash, the first provably sub-linear (in dimensionality) algorithm for sparse recovery. Our proof techniques are novel and flexible enough to also permit the tightest known analysis of popular iterative algorithms such as CoSaMP and Subspace Pursuit. We provide experimental results on large problems providing recovery for vectors of size up to a million dimensions. We demonstrate that for large-scale problems our proposed methods are more robust and faster than the existing methods.
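The add-one, remove-one step described in the abstract can be sketched as follows. This is one plausible reading rather than the authors' exact procedure: the initial support, the step size `eta`, the rule for choosing which coordinate to drop, the stopping rule, and the least-squares refit are all assumptions of this sketch, and the hashing-based variant is not covered.

```python
import numpy as np

def ompr(A, y, k, eta=1.0, n_iters=50):
    """Sketch of a k-sparse recovery loop that swaps one support coordinate per step.

    A: measurement matrix of shape (m, n) with k < n; y: measurements of length m.
    Returns the estimate x and its sorted support.
    """
    m, n = A.shape
    support = np.arange(k)  # arbitrary initial support of size k (assumption)
    x = np.zeros(n)
    x[support] = np.linalg.lstsq(A[:, support], y, rcond=None)[0]
    for _ in range(n_iters):
        # Gradient step; off the support, z is eta times the residual correlation.
        z = x + eta * (A.T @ (y - A @ x))
        outside = np.setdiff1d(np.arange(n), support)
        j_add = outside[np.argmax(np.abs(z[outside]))]         # coordinate to add
        candidates = np.append(support, j_add)
        j_drop = candidates[np.argmin(np.abs(z[candidates]))]  # coordinate to remove
        if j_drop == j_add:
            break  # the best swap leaves the support unchanged
        support = candidates[candidates != j_drop]
        x = np.zeros(n)
        x[support] = np.linalg.lstsq(A[:, support], y, rcond=None)[0]
    return x, np.sort(support)
```

The support size stays at `k` throughout, in contrast to OMP, whose support grows by one coordinate per iteration.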

algorithm, artificial intelligence, machine learning, (16 more...)

Country: North America > United States > Texas > Travis County > Austin (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.48)