Diagonal Rescaling For Neural Networks
Lafond, Jean, Vasilache, Nicolas, Bottou, Léon
We define a second-order neural network stochastic gradient training algorithm whose block-diagonal structure effectively amounts to normalizing the unit activations. Investigating why this algorithm lacks robustness then reveals two interesting insights. The first insight suggests a new way to scale the stepsizes, clarifying popular algorithms such as RMSProp as well as old neural network tricks such as fan-in stepsize scaling. The second insight stresses the practical importance of dealing with fast changes of the curvature of the cost.
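As a point of reference for the stepsize-scaling discussion, here is a minimal sketch of RMSProp-style diagonal rescaling together with the classic fan-in stepsize trick; the decay constant, learning rates, and helper names are illustrative choices, not the paper's algorithm or notation.

```python
import numpy as np

def rmsprop_step(w, grad, ms, lr=1e-3, decay=0.9, eps=1e-8):
    """One RMSProp-style update: divide each coordinate's step by a running
    root-mean-square of its past gradients, i.e. a diagonal rescaling."""
    ms = decay * ms + (1.0 - decay) * grad ** 2      # per-coordinate second moment
    w = w - lr * grad / (np.sqrt(ms) + eps)          # diagonally rescaled step
    return w, ms

def fanin_stepsizes(fanins, base_lr=0.1):
    """Old fan-in trick: give each unit a stepsize inversely proportional to
    the number of its incoming connections."""
    return {name: base_lr / n for name, n in fanins.items()}

# Toy usage on the quadratic loss 0.5 * ||w||^2, whose gradient is w itself.
w, ms = np.ones(4), np.zeros(4)
for _ in range(200):
    w, ms = rmsprop_step(w, grad=w, ms=ms)
print(w)                                             # coordinates shrink toward zero
print(fanin_stepsizes({"hidden": 256, "output": 64}))
```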
Towards Principled Methods for Training Generative Adversarial Networks
Arjovsky, Martin, Bottou, Léon
The goal of this paper is not to introduce a single algorithm or method, but to make theoretical steps towards fully understanding the training dynamics of generative adversarial networks. In order to substantiate our theoretical analysis, we perform targeted experiments to verify our assumptions, illustrate our claims, and quantify the phenomena. This paper is divided into three sections. The first section introduces the problem at hand. The second section is dedicated to rigorously studying and proving the problems, including instability and saturation, that arise when training generative adversarial networks. The third section examines a practical and theoretically grounded direction towards solving these problems, while introducing new tools to study them.
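For context (added here, not part of the abstract), the standard minimax objective of Goodfellow et al. (2014), whose training dynamics the paper studies, is

```latex
\min_{G}\,\max_{D}\; V(D,G) \;=\;
  \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_{z}}\!\left[\log\bigl(1 - D(G(z))\bigr)\right]
```

When D is trained close to optimality, the second term flattens and the generator's gradients vanish, which is the kind of saturation behaviour the second section makes rigorous.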
No Regret Bound for Extreme Bandits
Nishihara, Robert, Lopez-Paz, David, Bottou, Léon
Algorithms for hyperparameter optimization abound, all of which work well under different and often unverifiable assumptions. Motivated by the general challenge of sequentially choosing which algorithm to use, we study the more specific task of choosing among distributions to use for random hyperparameter optimization. This work is naturally framed in the extreme bandit setting, which deals with sequentially choosing which distribution from a collection to sample in order to minimize (maximize) the single best cost (reward). Whereas the distributions in the standard bandit setting are primarily characterized by their means, a number of subtleties arise when we care about the minimal cost as opposed to the average cost. For example, there may not be a well-defined "best" distribution as there is in the standard bandit setting. The best distribution depends on the rewards that have been obtained and on the remaining time horizon. Whereas in the standard bandit setting, it is sensible to compare policies with an oracle which plays the single best arm, in the extreme bandit setting, there are multiple sensible oracle models. We define a sensible notion of "extreme regret" in the extreme bandit setting, which parallels the concept of regret in the standard bandit setting. We then prove that no policy can asymptotically achieve no extreme regret.
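To make the setting concrete, below is a minimal simulation sketch of an extreme bandit: a policy repeatedly picks one of several cost distributions and is judged only on the single smallest cost it ever observes. The round-robin policy and the two Gaussian arms are hypothetical illustrations, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def extreme_bandit(policy, arms, horizon):
    """Play `horizon` rounds and return the single best (minimum) cost observed."""
    history = []
    for t in range(horizon):
        k = policy(t, history)          # choose which distribution to sample
        cost = arms[k](rng)             # draw one cost from that distribution
        history.append((k, cost))
    return min(c for _, c in history)

# Two hypothetical cost distributions: arm 1 has the better *mean*, but arm 0's
# heavier spread makes it far more likely to produce the smallest single cost,
# illustrating why means alone do not identify the "best" arm in this setting.
arms = [
    lambda r: r.normal(loc=0.0, scale=1.0),
    lambda r: r.normal(loc=-0.2, scale=0.1),
]

round_robin = lambda t, hist: t % len(arms)
print(extreme_bandit(round_robin, arms, horizon=1000))
```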
Unifying distillation and privileged information
Lopez-Paz, David, Bottou, Léon, Schölkopf, Bernhard, Vapnik, Vladimir
Distillation (Hinton et al., 2015) and privileged information (Vapnik & Izmailov, 2015) are two techniques that enable machines to learn from other machines. This paper unifies these two techniques into generalized distillation, a framework to learn from multiple machines and data representations. We provide theoretical and causal insight about the inner workings of generalized distillation, extend it to unsupervised, semi-supervised and multitask learning scenarios, and illustrate its efficacy on a variety of numerical simulations on both synthetic and real-world data.
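As an illustration of the distillation half of the framework, here is a minimal sketch of building soft training targets from a teacher's logits; the temperature T, the imitation weight lam, and the function names are illustrative assumptions rather than the paper's notation.

```python
import numpy as np

def soften(logits, T):
    """Teacher class probabilities at temperature T (larger T = softer targets)."""
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)          # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def distillation_targets(y_onehot, teacher_logits, T=2.0, lam=0.5):
    """Blend the hard labels with the teacher's softened predictions; training
    the student with cross-entropy against this blend balances imitating the
    teacher against fitting the original labels."""
    return lam * y_onehot + (1.0 - lam) * soften(teacher_logits, T)

# In generalized distillation the teacher may be trained on a privileged
# representation (extra features available only at training time) that the
# student never observes at test time.
y = np.eye(3)[[0, 2]]                             # two examples, three classes
teacher_logits = np.array([[4.0, 1.0, 0.0], [0.5, 0.5, 3.0]])
print(distillation_targets(y, teacher_logits))
```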
Counterfactual Reasoning and Learning Systems
Bottou, Léon, Peters, Jonas, Quiñonero-Candela, Joaquin, Charles, Denis X., Chickering, D. Max, Portugaly, Elon, Ray, Dipankar, Simard, Patrice, Snelson, Ed
This work shows how to leverage causal inference to understand the behavior of complex learning systems interacting with their environment and predict the consequences of changes to the system. Such predictions allow both humans and algorithms to select changes that improve the short-term and long-term performance of such systems. This work is illustrated by experiments carried out on the ad placement system associated with the Bing search engine.
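A core tool in this line of work is importance-sampling (inverse-propensity) estimation of counterfactual quantities from logged randomized decisions; the sketch below shows the basic idea. The clipping constant and the toy numbers are illustrative, not the paper's ad-placement data.

```python
import numpy as np

def counterfactual_estimate(rewards, logged_probs, new_probs, clip=10.0):
    """Importance-sampling (inverse propensity) estimate of the average reward
    the system would have obtained under a candidate policy, computed from data
    logged under the deployed randomized policy. Clipping the weights trades a
    little bias for lower variance (the constant here is illustrative)."""
    weights = np.minimum(new_probs / logged_probs, clip)
    return float(np.mean(weights * rewards))

# Each logged record: the observed reward, the probability with which the
# logging policy chose the action actually taken, and the probability the
# candidate policy would assign to that same action (toy numbers).
rewards      = np.array([1.0, 0.0, 0.0, 1.0])
logged_probs = np.array([0.5, 0.2, 0.8, 0.4])
new_probs    = np.array([0.6, 0.1, 0.7, 0.9])
print(counterfactual_estimate(rewards, logged_probs, new_probs))
```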
The Tradeoffs of Large Scale Learning
Bottou, Léon, Bousquet, Olivier
This contribution develops a theoretical framework that takes into account the effect of approximate optimization on learning algorithms. The analysis shows distinct tradeoffs for the case of small-scale and large-scale learning problems. Small-scale learning problems are subject to the usual approximation–estimation tradeoff. Large-scale learning problems are subject to a qualitatively different tradeoff involving the computational complexity of the underlying optimization algorithms in non-trivial ways.
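The analysis rests on decomposing the excess error of the learned predictor into three components, the last of which only appears once approximate optimization is taken into account; schematically (notation paraphrased from the paper),

```latex
\mathcal{E} \;=\;
  \underbrace{\mathcal{E}_{\mathrm{app}}}_{\text{approximation}}
  + \underbrace{\mathcal{E}_{\mathrm{est}}}_{\text{estimation}}
  + \underbrace{\mathcal{E}_{\mathrm{opt}}}_{\text{optimization}}
```

Small-scale learning trades off only the first two terms; in the large-scale regime, the computing time spent driving \(\mathcal{E}_{\mathrm{opt}}\) down competes with the time available to reduce the other two.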
Parallel Support Vector Machines: The Cascade SVM
Graf, Hans P., Cosatto, Eric, Bottou, Léon, Dourdanovic, Igor, Vapnik, Vladimir
We describe an algorithm for support vector machines (SVM) that can be parallelized efficiently and scales to very large problems with hundreds of thousands of training vectors. Instead of analyzing the whole training set in one optimization step, the data are split into subsets and optimized separately with multiple SVMs. The partial results are combined and filtered again in a 'Cascade' of SVMs, until the global optimum is reached. The Cascade SVM can be spread over multiple processors with minimal communication overhead and requires far less memory, since the kernel matrices are much smaller than for a regular SVM. Convergence to the global optimum is guaranteed with multiple passes through the Cascade, but already a single pass provides good generalization. A single pass is 5x-10x faster than a regular SVM for problems of 100,000 vectors when implemented on a single processor. Parallel implementations on a cluster of 16 processors were tested with over 1 million vectors (2-class problems), converging in a day or two, while a regular SVM never converged in over a week.
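A single-machine, single-pass sketch of the cascade idea is below, using scikit-learn's SVC; the RBF kernel, the power-of-two number of subsets, and the omission of the paper's parallel execution and feedback passes are simplifying assumptions.

```python
import numpy as np
from sklearn.svm import SVC

def cascade_svm(X, y, n_subsets=4, **svm_kw):
    """Single-pass sketch of the cascade: train SVMs on disjoint subsets, keep
    only the support vectors each one selects, merge those sets pairwise, and
    retrain layer by layer until a single SVM remains. Assumes n_subsets is a
    power of two; the paper additionally distributes the layers over several
    processors and can iterate extra passes to reach the global optimum."""
    rng = np.random.default_rng(0)
    subsets = [(X[i], y[i])
               for i in np.array_split(rng.permutation(len(y)), n_subsets)]
    while True:
        trained = []
        for Xs, ys in subsets:
            svm = SVC(kernel="rbf", **svm_kw).fit(Xs, ys)
            trained.append((Xs[svm.support_], ys[svm.support_]))  # keep SVs only
        if len(trained) == 1:
            return svm                                            # top of the cascade
        subsets = [(np.vstack([Xa, Xb]), np.concatenate([ya, yb]))
                   for (Xa, ya), (Xb, yb) in zip(trained[0::2], trained[1::2])]

# Example (toy data):
#   from sklearn.datasets import make_classification
#   Xt, yt = make_classification(n_samples=2000, random_state=0)
#   model = cascade_svm(Xt, yt, n_subsets=4, C=1.0)
```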
Breaking SVM Complexity with Cross-Training
Bottou, Léon, Weston, Jason, Bakir, Gökhan H.
We propose to selectively remove examples from the training set using probabilistic estimates related to editing algorithms (Devijver and Kittler, 1982). This heuristic procedure aims at creating a separable distribution of training examples with minimal impact on the position of the decision boundary. It breaks the linear dependency between the number of support vectors (SVs) and the number of training examples, and sharply reduces the complexity of SVMs during both the training and prediction stages.
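The sketch below illustrates the editing idea in the abstract: use cross-validated probability estimates to discard training examples that fall deep inside the opposite class's region, so that what remains is close to separable. The classifier, the cross-validation scheme, and the threshold are illustrative choices, not the paper's exact cross-training procedure.

```python
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC

def edit_training_set(X, y, threshold=0.2):
    """Editing-style filtering in the spirit of Devijver & Kittler (1982):
    estimate each example's posterior for its own label with cross-validated
    predictions, and drop examples whose posterior is very low, i.e. examples
    sitting deep inside the opposite class's region. Assumes integer labels
    0..K-1; the classifier, the 5-fold scheme, and the threshold are all
    illustrative choices rather than the paper's exact cross-training recipe."""
    proba = cross_val_predict(SVC(kernel="rbf", probability=True), X, y,
                              cv=5, method="predict_proba")
    p_own = proba[np.arange(len(y)), y]   # estimated posterior of the true label
    keep = p_own >= threshold             # keep examples that look consistent
    return X[keep], y[keep]

# An SVM retrained on the edited set typically needs far fewer support vectors,
# because few remaining examples fall on the wrong side of (or inside) the margin.
```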
Geometric Clustering Using the Information Bottleneck Method
Still, Susanne, Bialek, William, Bottou, Léon
We argue that K-means and deterministic annealing algorithms for geometric clustering can be derived from the more general Information Bottleneck approach. If we cluster the identities of data points to preserve information about their location, the set of optimal solutions is massively degenerate. But if we treat the equations that define the optimal solution as an iterative algorithm, then a set of "smooth" initial conditions selects solutions with the desired geometrical properties. In addition to conceptual unification, we argue that this approach can be more efficient and robust than classic algorithms.
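To connect the abstract's claim to familiar algorithms, here is a minimal sketch of deterministic-annealing-style soft K-means, whose Boltzmann assignments harden into classic K-means as the inverse temperature grows; the single fixed beta and the initialization are illustrative simplifications, not the paper's derivation from the Information Bottleneck equations.

```python
import numpy as np

def soft_kmeans(X, k, beta=5.0, n_iter=50, seed=0):
    """Deterministic-annealing-flavoured soft K-means: points get Boltzmann
    assignment weights exp(-beta * squared distance) and centroids are the
    resulting weighted means. As beta grows the assignments harden and the
    update approaches classic K-means; spread-out ("smooth") assignments at
    small beta play the role of smooth initial conditions."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)   # (n, k) squared distances
        logits = -beta * d2
        logits -= logits.max(axis=1, keepdims=True)                 # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)                           # soft assignments p(cluster | point)
        centers = (p.T @ X) / p.sum(axis=0)[:, None]                # weighted centroid update
    return centers, p

# Toy usage:
#   X = np.vstack([np.random.randn(100, 2), np.random.randn(100, 2) + 4])
#   centers, assignments = soft_kmeans(X, k=2, beta=2.0)
```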