AITopics | Mokhtari, Aryan

Plotting

Mokhtari, Aryan

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

First-Order Adaptive Sample Size Methods to Reduce Complexity of Empirical Risk Minimization

Mokhtari, Aryan, Ribeiro, Alejandro

Neural Information Processing SystemsDec-31-2017

This paper studies empirical risk minimization (ERM) problems for large-scale datasets and incorporates the idea of adaptive sample size methods to improve the guaranteed convergence bounds for first-order stochastic and deterministic methods. In contrast to traditional methods that attempt to solve the ERM problem corresponding to the full dataset directly, adaptive sample size schemes start with a small number of samples and solve the corresponding ERM problem to its statistical accuracy. The sample size is then grown geometrically -- e.g., scaling by a factor of two -- and use the solution of the previous ERM as a warm start for the new ERM. Theoretical analyses show that the use of adaptive sample size methods reduces the overall computational cost of achieving the statistical accuracy of the whole dataset for a broad range of deterministic and stochastic first-order methods. The gains are specific to the choice of method. When particularized to, e.g., accelerated gradient descent and stochastic variance reduce gradient, the computational cost advantage is a logarithm of the number of training samples. Numerical experiments on various datasets confirm theoretical claims and showcase the gains of using the proposed adaptive sample size scheme.

artificial intelligence, machine learning, statistical accuracy, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > New York (0.14)
North America > Canada > Quebec (0.14)
North America > Canada > British Columbia (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

Add feedback

Large Scale Empirical Risk Minimization via Truncated Adaptive Newton Method

Eisen, Mark, Mokhtari, Aryan, Ribeiro, Alejandro

arXiv.org Machine LearningMay-22-2017

We consider large scale empirical risk minimization (ERM) problems, where both the problem dimension and variable size is large. In these cases, most second order methods are infeasible due to the high cost in both computing the Hessian over all samples and computing its inverse in high dimensions. In this paper, we propose a novel adaptive sample size second-order method, which reduces the cost of computing the Hessian by solving a sequence of ERM problems corresponding to a subset of samples and lowers the cost of computing the Hessian inverse using a truncated eigenvalue decomposition. We show that while we geometrically increase the size of the training set at each stage, a single iteration of the truncated Newton method is sufficient to solve the new ERM within its statistical accuracy. Moreover, for a large number of samples we are allowed to double the size of the training set at each stage, and the proposed method subsequently reaches the statistical accuracy of the full training set approximately after two effective passes. In addition to this theoretical result, we show empirically on a number of well known data sets that the proposed truncated adaptive sample size algorithm outperforms stochastic alternatives for solving ERM problems.

artificial intelligence, optimization problem, statistical accuracy, (17 more...)

arXiv.org Machine Learning

1705.07957

Country:

North America > Canada (0.68)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
North America > United States > New York > New York County > New York City (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.73)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

Add feedback

Stochastic Averaging for Constrained Optimization with Application to Online Resource Allocation

Chen, Tianyi, Mokhtari, Aryan, Wang, Xin, Ribeiro, Alejandro, Giannakis, Georgios B.

arXiv.org Machine LearningFeb-26-2017

Existing approaches to resource allocation for nowadays stochastic networks are challenged to meet fast convergence and tolerable delay requirements. The present paper leverages online learning advances to facilitate stochastic resource allocation tasks. By recognizing the central role of Lagrange multipliers, the underlying constrained optimization problem is formulated as a machine learning task involving both training and operational modes, with the goal of learning the sought multipliers in a fast and efficient manner. To this end, an order-optimal offline learning approach is developed first for batch training, and it is then generalized to the online setting with a procedure termed learn-and-adapt. The novel resource allocation protocol permeates benefits of stochastic approximation and statistical learning to obtain low-complexity online updates with learning errors close to the statistical accuracy limits, while still preserving adaptation performance, which in the stochastic network optimization context guarantees queue stability. Analysis and simulated tests demonstrate that the proposed data-driven approach improves the delay and convergence performance of existing resource allocation schemes.

computer based training, educational technology, saga, (21 more...)

arXiv.org Machine Learning

doi: 10.1109/TSP.2017.2679690

1610.02143

Country:

Europe (1.00)
North America > United States > Pennsylvania (0.28)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)
North America > United States > California (0.28)

Genre: Research Report (0.64)

Industry:

Energy > Power Industry (0.46)
Energy > Renewable (0.46)
Education > Educational Setting (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Adaptive Newton Method for Empirical Risk Minimization to Statistical Accuracy

Mokhtari, Aryan, Daneshmand, Hadi, Lucchi, Aurelien, Hofmann, Thomas, Ribeiro, Alejandro

Neural Information Processing SystemsDec-31-2016

We consider empirical risk minimization for large-scale datasets. We introduce Ada Newton as an adaptive algorithm that uses Newton's method with adaptive sample sizes. The main idea of Ada Newton is to increase the size of the training set by a factor larger than one in a way that the minimization variable for the current training set is in the local neighborhood of the optimal argument of the next training set. This allows to exploit the quadratic convergence property of Newton's method and reach the statistical accuracy of each training set with only one iteration of Newton's method. We show theoretically that we can iteratively increase the sample size while applying single Newton iterations without line search and staying within the statistical accuracy of the regularized empirical risk. In particular, we can double the size of the training set in each iteration when the number of samples is sufficiently large. Numerical experiments on various datasets confirm the possibility of increasing the sample size by factor 2 at each iteration which implies that Ada Newton achieves the statistical accuracy of the full training set with about two passes over the dataset.

artificial intelligence, optimization problem, statistical accuracy, (19 more...)

Neural Information Processing Systems

Country:

North America > Canada (0.68)
Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > New York > New York County > New York City (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.51)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.47)

Add feedback

A Class of Parallel Doubly Stochastic Algorithms for Large-Scale Learning

Mokhtari, Aryan, Koppel, Alec, Ribeiro, Alejandro

arXiv.org Machine LearningJun-15-2016

We consider learning problems over training sets in which both, the number of training examples and the dimension of the feature vectors, are large. To solve these problems we propose the random parallel stochastic algorithm (RAPSA). We call the algorithm random parallel because it utilizes multiple parallel processors to operate on a randomly chosen subset of blocks of the feature vector. We call the algorithm stochastic because processors choose training subsets uniformly at random. Algorithms that are parallel in either of these dimensions exist, but RAPSA is the first attempt at a methodology that is parallel in both the selection of blocks and the selection of elements of the training set. In RAPSA, processors utilize the randomly chosen functions to compute the stochastic gradient component associated with a randomly chosen block. The technical contribution of this paper is to show that this minimally coordinated algorithm converges to the optimal classifier when the training objective is convex. Moreover, we present an accelerated version of RAPSA (ARAPSA) that incorporates the objective function curvature information by premultiplying the descent direction by a Hessian approximation matrix. We further extend the results for asynchronous settings and show that if the processors perform their updates without any coordination the algorithms are still convergent to the optimal argument. RAPSA and its extensions are then numerically evaluated on a linear estimation problem and a binary image classification task using the MNIST handwritten digit dataset.

artificial intelligence, machine learning, rapsa, (15 more...)

arXiv.org Machine Learning

1606.04991

Country: North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)

Genre: Research Report (1.00)

Industry: Education (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.36)

Add feedback

Global Convergence of Online Limited Memory BFGS

Mokhtari, Aryan, Ribeiro, Alejandro

arXiv.org Machine LearningSep-6-2014

Global convergence of an online (stochastic) limited memory version of the Broyden-Fletcher- Goldfarb-Shanno (BFGS) quasi-Newton method for solving optimization problems with stochastic objectives that arise in large scale machine learning is established. Lower and upper bounds on the Hessian eigenvalues of the sample functions are shown to suffice to guarantee that the curvature approximation matrices have bounded determinants and traces, which, in turn, permits establishing convergence to optimal arguments with probability 1. Numerical experiments on support vector machines with synthetic data showcase reductions in convergence time relative to stochastic gradient descent algorithms as well as reductions in storage and computation relative to other online quasi-Newton methods. Experimental evaluation on a search engine advertising problem corroborates that these advantages also manifest in practical applications.

artificial intelligence, eigenvalue, machine learning, (16 more...)

arXiv.org Machine Learning

1409.2045

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.54)

Add feedback

RES: Regularized Stochastic BFGS Algorithm

Mokhtari, Aryan, Ribeiro, Alejandro

arXiv.org Machine LearningJan-29-2014

RES, a regularized stochastic version of the Broyden-Fletcher-Goldfarb-Shanno (BFGS) quasi-Newton method is proposed to solve convex optimization problems with stochastic objectives. The use of stochastic gradient descent algorithms is widespread, but the number of iterations required to approximate optimal arguments can be prohibitive in high dimensional problems. Application of second order methods, on the other hand, is impracticable because computation of objective function Hessian inverses incurs excessive computational cost. BFGS modifies gradient descent by introducing a Hessian approximation matrix computed from finite gradient differences. RES utilizes stochastic gradients in lieu of deterministic gradients for both, the determination of descent directions and the approximation of the objective function's curvature. Since stochastic gradients can be computed at manageable computational cost RES is realizable and retains the convergence rate advantages of its deterministic counterparts. Convergence results show that lower and upper bounds on the Hessian egeinvalues of the sample functions are sufficient to guarantee convergence to optimal arguments. Numerical experiments showcase reductions in convergence time relative to stochastic gradient descent algorithms and non-regularized stochastic versions of BFGS. An application of RES to the implementation of support vector machines is developed.

artificial intelligence, eigenvalue, machine learning, (17 more...)

arXiv.org Machine Learning

doi: 10.1109/TSP.2014.2357775

1401.7625

Country: North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback