Collaborating Authors

 Agarwal, Alekh


Information-theoretic lower bounds on the oracle complexity of stochastic convex optimization

arXiv.org Machine Learning

Relative to the large literature on upper bounds on the complexity of convex optimization, less attention has been paid to the fundamental hardness of these problems. Given the extensive use of convex optimization in machine learning and statistics, gaining an understanding of these complexity-theoretic issues is important. In this paper, we study the complexity of stochastic convex optimization in an oracle model of computation. We improve upon known results and obtain tight minimax complexity estimates for various function classes.
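
For reference, the minimax quantity being bounded can be written as follows (a standard formalization in our own notation, not quoted from the paper):

\[
\epsilon^*(\mathcal{F}, T) \;=\; \inf_{\mathsf{M}} \, \sup_{f \in \mathcal{F}} \; \mathbb{E}\big[ f(x_T) - \inf_{x \in \mathcal{X}} f(x) \big],
\]

where the infimum ranges over all methods $\mathsf{M}$ that issue $T$ queries to a stochastic first-order oracle before returning $x_T$, and the expectation is taken over the oracle's randomness. Lower bounds on $\epsilon^*$ hold for every algorithm in this model, which is what makes them statements about fundamental hardness.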


Online and Batch Learning Algorithms for Data with Missing Features

arXiv.org Machine Learning

We introduce new online and batch algorithms that are robust to data with missing features, a situation that arises in many practical applications. In the online setup, we allow for the comparison hypothesis to change as a function of the subset of features that is observed on any given round, extending the standard setting where the comparison hypothesis is fixed throughout. In the batch setup, we present a convex relaxation of a non-convex problem to jointly estimate an imputation function, used to fill in the values of missing features, along with the classification hypothesis. We prove regret bounds in the online setting and Rademacher complexity bounds for the batch i.i.d. setting. The algorithms are tested on several UCI datasets, showing superior performance over baselines.
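
As a rough illustration of the online setting (a minimal sketch under our own assumptions: hinge loss, a fixed step size, and updates restricted to observed coordinates; the paper's algorithms are more general and also let the comparator vary with the observed subset):

```python
import numpy as np

def online_missing_features(stream, d, eta=0.1):
    """Online subgradient descent where each round reveals only some features.

    Sketch only: the hinge loss and fixed step size are our assumptions,
    not the paper's specification.
    """
    w = np.zeros(d)
    for x, y, mask in stream:                 # mask: boolean array of observed features
        x_obs = np.where(mask, x, 0.0)        # unobserved features contribute nothing
        if y * (w @ x_obs) < 1.0:             # hinge loss active: take a subgradient step
            w[mask] += eta * y * x_obs[mask]  # update only the observed coordinates
    return w

# toy stream: 3-dimensional examples, one feature missing per round
stream = [(np.array([1.0, -2.0, 0.5]), 1, np.array([True, True, False])),
          (np.array([0.0, 1.0, -1.0]), -1, np.array([True, False, True]))]
print(online_missing_features(stream, d=3))
```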


Distributed Delayed Stochastic Optimization

arXiv.org Machine Learning

We analyze the convergence of gradient-based optimization algorithms that base their updates on delayed stochastic gradient information. The main application of our results is to the development of gradient-based distributed optimization algorithms where a master node performs parameter updates while worker nodes compute stochastic gradients based on local information in parallel, which may give rise to delays due to asynchrony. We take motivation from statistical problems where the size of the data is so large that it cannot fit on one computer; with the advent of huge datasets in biology, astronomy, and the internet, such problems are now common. Our main contribution is to show that for smooth stochastic problems, the delays are asymptotically negligible and we can achieve order-optimal convergence results. In application to distributed optimization, we develop procedures that overcome communication bottlenecks and synchronization requirements. We show $n$-node architectures whose optimization error in stochastic problems---in spite of asynchronous delays---scales asymptotically as $\mathcal{O}(1/\sqrt{nT})$ after $T$ iterations. This rate is known to be optimal for a distributed system with $n$ nodes even in the absence of delays. We additionally complement our theoretical results with numerical experiments on a statistical machine learning task.
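
The delayed-update scheme is straightforward to simulate (a sketch under assumptions of our own choosing: a noisy quadratic objective, a fixed delay $\tau$, and a $1/\sqrt{t}$ step size; the paper's analysis covers more general smooth stochastic problems):

```python
import numpy as np

def delayed_sgd(grad, x0, T, tau, eta0=0.5):
    """SGD whose update at step t uses a gradient evaluated at x_{t - tau},
    mimicking a master node applying stale gradients from asynchronous workers."""
    xs = [np.array(x0, dtype=float)]
    for t in range(T):
        stale = xs[max(0, t - tau)]           # the worker saw the parameters tau steps ago
        eta = eta0 / np.sqrt(t + 1)           # diminishing step size
        xs.append(xs[-1] - eta * grad(stale))
    return xs[-1]

rng = np.random.default_rng(0)
noisy_grad = lambda x: x + 0.1 * rng.standard_normal(x.shape)  # grad of ||x||^2/2 plus noise
print(delayed_sgd(noisy_grad, x0=np.ones(5), T=2000, tau=10))  # approaches the minimizer 0
```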


Dual Averaging for Distributed Optimization: Convergence Analysis and Network Scaling

arXiv.org Machine Learning

The goal of decentralized optimization over a network is to optimize a global objective formed by a sum of local (possibly nonsmooth) convex functions using only local computation and communication. It arises in various application domains, including distributed tracking and localization, multi-agent coordination, estimation in sensor networks, and large-scale optimization in machine learning. We develop and analyze distributed algorithms based on dual averaging of subgradients, and we provide sharp bounds on their convergence rates as a function of the network size and topology. Our method of analysis allows for a clear separation between the convergence of the optimization algorithm itself and the effects of communication constraints arising from the network structure. In particular, we show that the number of iterations required by our algorithm scales inversely in the spectral gap of the network. The sharpness of this prediction is confirmed both by theoretical lower bounds and simulations for various networks. Our approach includes both the cases of deterministic optimization and communication, as well as problems with stochastic optimization and/or communication.
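
A minimal simulation of the dual-averaging update (our simplifications: a Euclidean proximal function, unconstrained iterates, and a doubly stochastic mixing matrix on a ring; the paper handles general proximal functions and constraint sets):

```python
import numpy as np

def distributed_dual_averaging(grads, P, d, T, alpha0=1.0):
    """Each node mixes its neighbors' dual variables, adds its local subgradient,
    then maps the dual average back to a primal point.

    grads: list of per-node subgradient functions g_i(x).
    P: doubly stochastic mixing matrix respecting the network topology.
    """
    n = len(grads)
    Z = np.zeros((n, d))                       # dual variables z_i
    X = np.zeros((n, d))                       # primal iterates x_i
    for t in range(1, T + 1):
        Z = P @ Z + np.array([g(x) for g, x in zip(grads, X)])
        alpha = alpha0 / np.sqrt(t)            # step size schedule
        X = -alpha * Z                         # prox step for psi(x) = ||x||^2 / 2
    return X.mean(axis=0)

# ring network of n = 4 nodes, local objectives f_i(x) = ||x - c_i||^2 / 2
n, d = 4, 3
P = np.zeros((n, n))
for i in range(n):
    P[i, [i, (i - 1) % n, (i + 1) % n]] = 1.0 / 3.0
centers = [np.full(d, float(i)) for i in range(n)]
grads = [(lambda c: (lambda x: x - c))(c) for c in centers]
# should approach the average of the c_i, here 1.5 in every coordinate
print(distributed_dual_averaging(grads, P, d, T=3000))
```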


Fast global convergence rates of gradient methods for high-dimensional statistical recovery

Neural Information Processing Systems

Many statistical $M$-estimators are based on convex optimization problems formed by the weighted sum of a loss function with a norm-based regularizer. We analyze the convergence rates of first-order gradient methods for solving such problems within a high-dimensional framework that allows the data dimension $d$ to grow with (and possibly exceed) the sample size $n$. This high-dimensional structure precludes the usual global assumptions---namely, strong convexity and smoothness conditions---that underlie classical optimization analysis. We define appropriately restricted versions of these conditions, and show that they are satisfied with high probability for various statistical models. Under these conditions, our theory guarantees that Nesterov's first-order method (Nesterov, 2007) has a globally geometric rate of convergence up to the statistical precision of the model, meaning the typical Euclidean distance between the true unknown parameter $\theta^*$ and the optimal solution $\widehat{\theta}$. This globally linear rate is substantially faster than previous analyses of global convergence for specific methods that yielded only sublinear rates. Our analysis applies to a wide range of $M$-estimators and statistical models, including sparse linear regression using Lasso ($\ell_1$-regularized regression), group Lasso, block sparsity, and low-rank matrix recovery using nuclear norm regularization. Overall, this result reveals an interesting connection between statistical precision and computational efficiency in high-dimensional estimation.
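
The claimed behavior is easy to observe with a plain composite gradient method on the Lasso (a sketch with problem sizes, regularization level, and step size of our own choosing; Nesterov's accelerated variant adds a momentum step that we omit for brevity):

```python
import numpy as np

def lasso_composite_gradient(X, y, lam, T=200):
    """Composite gradient descent for (1/2n)||y - X w||^2 + lam * ||w||_1.
    Each iteration takes a gradient step on the smooth loss, then soft-thresholds."""
    n, d = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n          # Lipschitz constant of the smooth part
    w = np.zeros(d)
    for _ in range(T):
        g = X.T @ (X @ w - y) / n              # gradient of the quadratic loss
        z = w - g / L
        w = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-thresholding
    return w

rng = np.random.default_rng(0)
n, d, s = 200, 500, 5                          # high-dimensional regime: d > n, sparse truth
X = rng.standard_normal((n, d))
w_star = np.zeros(d)
w_star[:s] = 1.0
y = X @ w_star + 0.1 * rng.standard_normal(n)
w_hat = lasso_composite_gradient(X, y, lam=0.05)
print(np.linalg.norm(w_hat - w_star))          # optimization error plateaus near the statistical precision
```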


Distributed Dual Averaging In Networks

Neural Information Processing Systems

The goal of decentralized optimization over a network is to optimize a global objective formed by a sum of local (possibly nonsmooth) convex functions using only local computation and communication. We develop and analyze distributed algorithms based on dual averaging of subgradients, and we provide sharp bounds on their convergence rates as a function of the network size and topology. Our analysis clearly separates the convergence of the optimization algorithm itself from the effects of communication constraints arising from the network structure. We show that the number of iterations required by our algorithm scales inversely in the spectral gap of the network. The sharpness of this prediction is confirmed both by theoretical lower bounds and simulations for various networks.


Optimal Allocation Strategies for the Dark Pool Problem

arXiv.org Machine Learning

We study the problem of allocating stocks to dark pools. We propose and analyze an optimal allocation strategy for the case when continuous-valued allocations are allowed, along with a modification for the case when only integer-valued allocations are possible. We extend previous work on this problem to adversarial scenarios, while also improving on its results in the i.i.d. setup. The resulting algorithms are efficient and perform well in simulations under both stochastic and adversarial inputs.
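
For intuition about the continuous-valued setting, here is one plausible allocation scheme (an illustrative heuristic of our own, not the algorithm analyzed in the paper): split the order across pools in proportion to weights, then shift weight toward pools that filled their allocations.

```python
import numpy as np

def allocate(volume, weights):
    """Split a continuous order volume across pools in proportion to weights."""
    return volume * weights / weights.sum()

def update_weights(weights, allocated, filled, eta=0.2):
    """Multiplicative update favoring pools with higher fill rates.
    Illustrative heuristic only; not the paper's algorithm."""
    fill_rate = np.divide(filled, allocated,
                          out=np.zeros_like(weights), where=allocated > 0)
    w = weights * np.exp(eta * fill_rate)
    return w / w.sum()

# one round with 3 pools: pool 0 fills everything, pool 2 fills nothing
weights = np.ones(3) / 3
alloc = allocate(1000.0, weights)
filled = np.array([333.3, 150.0, 0.0])        # hypothetical fills reported by the pools
print(update_weights(weights, alloc, filled))
```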


Information-theoretic lower bounds on the oracle complexity of convex optimization

Neural Information Processing Systems

Despite the large literature on upper bounds on the complexity of convex optimization, surprisingly little is known about the fundamental hardness of these problems. Given the extensive use of convex optimization in machine learning and statistics, such an understanding is critical for establishing the fundamental computational limits of learning and estimation. In this paper, we study the complexity of stochastic convex optimization in an oracle model of computation. We improve upon known results and obtain tight minimax complexity estimates for some function classes. We also discuss the implications of these results for understanding the inherent complexity of large-scale learning and estimation problems.


A Stochastic View of Optimal Regret through Minimax Duality

arXiv.org Machine Learning

We study the regret of optimal strategies for online convex optimization games. Using von Neumann's minimax theorem, we show that the optimal regret in this adversarial setting is closely related to the behavior of the empirical minimization algorithm in a stochastic process setting: it is equal to the maximum, over joint distributions of the adversary's action sequence, of the difference between a sum of minimal expected losses and the minimal empirical loss. We show that the optimal regret has a natural geometric interpretation: it can be viewed as the gap in Jensen's inequality for a concave functional (the minimum of the expected loss over the player's actions) defined on a set of probability distributions. We use this expression to obtain upper and lower bounds on the regret of an optimal strategy for a variety of online learning problems. Our method provides upper bounds without the need to construct a learning algorithm; the lower bounds provide explicit optimal strategies for the adversary.
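
In symbols, the identity described above reads (in our notation, with action set $\mathcal{A}$, loss $\ell$, and horizon $T$):

\[
R_T \;=\; \sup_{P} \; \mathbb{E}_{x_1,\dots,x_T \sim P} \left[ \sum_{t=1}^{T} \inf_{a \in \mathcal{A}} \mathbb{E}\big[\ell(a, x_t) \mid x_1, \dots, x_{t-1}\big] \;-\; \inf_{a \in \mathcal{A}} \sum_{t=1}^{T} \ell(a, x_t) \right],
\]

where the supremum ranges over joint distributions $P$ of the adversary's action sequence. The first term is the sum of minimal expected losses; the second is the minimal empirical loss.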


An Analysis of Inference with the Universum

Neural Information Processing Systems

A generic learning algorithm usually makes rather weak assumptions about the regularities underlying the data. These assumptions are usually encoded in the regulariser or the prior.