AITopics | Schmidt, Mark

Collaborating Authors

Schmidt, Mark

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Minimizing Finite Sums with the Stochastic Average Gradient

Schmidt, Mark, Roux, Nicolas Le, Bach, Francis

arXiv.org Machine LearningMay-11-2016

We propose the stochastic average gradient (SAG) method for optimizing the sum of a finite number of smooth convex functions. Like stochastic gradient (SG) methods, the SAG method's iteration cost is independent of the number of terms in the sum. However, by incorporating a memory of previous gradient values the SAG method achieves a faster convergence rate than black-box SG methods. The convergence rate is improved from O(1/k^{1/2}) to O(1/k) in general, and when the sum is strongly-convex the convergence rate is improved from the sub-linear O(1/k) to a linear convergence rate of the form O(p^k) for p \textless{} 1. Further, in many cases the convergence rate of the new method is also faster than black-box deterministic gradient methods, in terms of the number of gradient evaluations. Numerical experiments indicate that the new algorithm often dramatically outperforms existing SG and deterministic gradient methods, and that the performance may be further improved through the use of non-uniform sampling strategies.

air transportation, convergence rate, optimization problem, (19 more...)

arXiv.org Machine Learning

1309.2388

Country:

North America > Canada (0.14)
Europe > France (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Transportation > Air (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Data Science > Data Mining (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

Add feedback

Influence Maximization with Bandits

Vaswani, Sharan, Lakshmanan, Laks. V. S., Schmidt, Mark

arXiv.org Machine LearningApr-27-2016

We consider the problem of \emph{influence maximization}, the problem of maximizing the number of people that become aware of a product by finding the `best' set of `seed' users to expose the product to. Most prior work on this topic assumes that we know the probability of each user influencing each other user, or we have data that lets us estimate these influences. However, this information is typically not initially available or is difficult to obtain. To avoid this assumption, we adopt a combinatorial multi-armed bandit paradigm that estimates the influence probabilities as we sequentially try different seed sets. We establish bounds on the performance of this procedure under the existing edge-level feedback as well as a novel and more realistic node-level feedback. Beyond our theoretical results, we describe a practical implementation and experimentally demonstrate its efficiency and effectiveness on four real datasets.

big data, optimization problem, probability, (19 more...)

arXiv.org Machine Learning

1503.00024

Country: North America > United States (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Data Science > Data Mining > Big Data (0.49)

Add feedback

StopWasting My Gradients: Practical SVRG

Harikandeh, Reza, Ahmed, Mohamed Osama, Virani, Alim, Schmidt, Mark, Konečný, Jakub, Sallinen, Scott

Neural Information Processing SystemsDec-31-2015

We present and analyze several strategies for improving the performance of stochastic variance-reduced gradient (SVRG) methods. We first show that the convergence rate of these methods can be preserved under a decreasing sequence of errors in the control variate, and use this to derive variants of SVRG that use growing-batch strategies to reduce the number of gradient calculations required in the early iterations. We further (i) show how to exploit support vectors to reduce thenumber of gradient computations in the later iterations, (ii) prove that the commonly-used regularized SVRG iteration is justified and improves the convergence rate,(iii) consider alternate mini-batch selection strategies, and (iv) consider the generalization error of the method.

artificial intelligence, convergence rate, machine learning, (17 more...)

Neural Information Processing Systems

Country: North America > Canada (0.14)

Genre: Research Report > New Finding (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)

Add feedback

Stop Wasting My Gradients: Practical SVRG

Babanezhad, Reza, Ahmed, Mohamed Osama, Virani, Alim, Schmidt, Mark, Konečný, Jakub, Sallinen, Scott

arXiv.org Machine LearningNov-5-2015

We present and analyze several strategies for improving the performance of stochastic variance-reduced gradient (SVRG) methods. We first show that the convergence rate of these methods can be preserved under a decreasing sequence of errors in the control variate, and use this to derive variants of SVRG that use growing-batch strategies to reduce the number of gradient calculations required in the early iterations. We further (i) show how to exploit support vectors to reduce the number of gradient computations in the later iterations, (ii) prove that the commonly-used regularized SVRG iteration is justified and improves the convergence rate, (iii) consider alternate mini-batch selection strategies, and (iv) consider the generalization error of the method.

artificial intelligence, dataset, machine learning, (15 more...)

arXiv.org Machine Learning

1511.01942

Country: North America > Canada (0.14)

Genre: Research Report > New Finding (0.68)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.35)

Add feedback

Coordinate Descent Converges Faster with the Gauss-Southwell Rule Than Random Selection

Nutini, Julie, Schmidt, Mark, Laradji, Issam H., Friedlander, Michael, Koepke, Hoyt

arXiv.org Machine LearningJun-1-2015

There has been significant recent work on the theory and application of randomized coordinate descent algorithms, beginning with the work of Nesterov [SIAM J. Optim., 22(2), 2012], who showed that a random-coordinate selection rule achieves the same convergence rate as the Gauss-Southwell selection rule. This result suggests that we should never use the Gauss-Southwell rule, as it is typically much more expensive than random selection. However, the empirical behaviours of these algorithms contradict this theoretical result: in applications where the computational costs of the selection rules are comparable, the Gauss-Southwell selection rule tends to perform substantially better than random coordinate selection. We give a simple analysis of the Gauss-Southwell rule showing that---except in extreme cases---it's convergence rate is faster than choosing random coordinates. Further, in this work we (i) show that exact coordinate optimization improves the convergence rate for certain sparse problems, (ii) propose a Gauss-Southwell-Lipschitz rule that gives an even faster convergence rate given knowledge of the Lipschitz constants of the partial derivatives, (iii) analyze the effect of approximate Gauss-Southwell rules, and (iv) analyze proximal-gradient variants of the Gauss-Southwell rule.

artificial intelligence, convergence rate, machine learning, (15 more...)

arXiv.org Machine Learning

1506.00552

Country:

North America > United States > California (0.14)
North America > Canada (0.14)
Europe > Netherlands (0.14)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

Non-Uniform Stochastic Average Gradient Method for Training Conditional Random Fields

Schmidt, Mark, Babanezhad, Reza, Ahmed, Mohamed Osama, Defazio, Aaron, Clifton, Ann, Sarkar, Anoop

arXiv.org Machine LearningApr-16-2015

We apply stochastic average gradient (SAG) algorithms for training conditional random fields (CRFs). We describe a practical implementation that uses structure in the CRF gradient to reduce the memory requirement of this linearly-convergent stochastic gradient method, propose a non-uniform sampling scheme that substantially improves practical performance, and analyze the rate of convergence of the SAGA variant under non-uniform sampling. Our experimental results reveal that our method often significantly outperforms existing methods in terms of the training objective, and performs as well or better than optimally-tuned stochastic gradient methods in terms of test error.

algorithm, artificial intelligence, natural language, (15 more...)

arXiv.org Machine Learning

1504.04406

Country: North America > Canada (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.71)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.57)

Add feedback

Convex Optimization for Big Data

Cevher, Volkan, Becker, Stephen, Schmidt, Mark

arXiv.org Machine LearningNov-4-2014

This article reviews recent advances in convex optimization algorithms for Big Data, which aim to reduce the computational, storage, and communications bottlenecks. We provide an overview of this emerging field, describe contemporary approximation techniques like first-order methods and randomization for scalability, and survey the important role of parallel and distributed computation. The new Big Data algorithms are based on surprisingly simple principles and attain staggering accelerations even on classical problems.

algorithm, optimization problem, survey article, (18 more...)

arXiv.org Machine Learning

doi: 10.1109/MSP.2014.2329397

1411.0972

Country:

Europe (0.46)
North America > United States (0.28)

Genre:

Research Report (1.00)
Overview (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.31)

Add feedback

Hybrid Deterministic-Stochastic Methods for Data Fitting

Friedlander, Michael P., Schmidt, Mark

arXiv.org Machine LearningFeb-9-2013

Many structured data-fitting applications require the solution of an optimization problem involving a sum over a potentially large number of measurements. Incremental gradient algorithms offer inexpensive iterations by sampling a subset of the terms in the sum. These methods can make great progress initially, but often slow as they approach a solution. In contrast, full-gradient methods achieve steady convergence at the expense of evaluating the full objective and gradient on each iteration. We explore hybrid methods that exhibit the benefits of both approaches. Rate-of-convergence analysis shows that by controlling the sample size in an incremental gradient algorithm, it is possible to maintain the steady convergence rates of full-gradient methods. We detail a practical quasi-Newton implementation based on this approach. Numerical experiments illustrate its potential benefits.

artificial intelligence, convergence rate, optimization problem, (19 more...)

arXiv.org Machine Learning

doi: 10.1137/110830629

1104.2373

Country:

North America > Canada (0.14)
Europe > Portugal (0.14)

Genre: Research Report > New Finding (0.95)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

A Stochastic Gradient Method with an Exponential Convergence _Rate for Finite Training Sets

Roux, Nicolas L., Schmidt, Mark, Bach, Francis R.

Neural Information Processing SystemsDec-31-2012

We propose a new stochastic gradient method for optimizing the sum of  a finite set of smooth functions, where the sum is strongly convex.  While standard stochastic gradient methods  converge at sublinear rates for this problem, the proposed method incorporates a memory of previous gradient values in order to achieve a linear convergence  rate. In a machine learning context, numerical experiments indicate that the new algorithm can dramatically outperform standard  algorithms, both in terms of optimizing the training error and reducing the test error quickly.

artificial intelligence, convergence rate, machine learning, (17 more...)

Neural Information Processing Systems

Country: Europe > France (0.14)

Genre: Research Report > New Finding (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.91)

Add feedback

A simpler approach to obtaining an O(1/t) convergence rate for the projected stochastic subgradient method

Lacoste-Julien, Simon, Schmidt, Mark, Bach, Francis

arXiv.org Machine LearningDec-20-2012

In this note, we present a new averaging technique for the projected stochastic subgradient method. By using a weighted average with a weight of t+1 for each iterate w_t at iteration t, we obtain the convergence rate of O(1/t) with both an easy proof and an easy implementation. The new scheme is compared empirically to existing techniques, with similar performance behavior.

artificial intelligence, iterate, machine learning, (16 more...)

arXiv.org Machine Learning

1212.2002

Country: Europe > France (0.14)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.71)

Add feedback