AITopics

2006.07862

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Italy (0.04)
(2 more...)

Genre: Research Report (0.90)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Zaki, Mohammadi, Mohan, Avi, Gopalan, Aditya

Explicit Best Arm Identification in Linear Bandits Using No-Regret Learners

arXiv.org Machine Learning, Jun-13-2020

We study the problem of best arm identification in linearly parameterised multi-armed bandits. Given a set of feature vectors $\mathcal{X}\subset\mathbb{R}^d,$ a confidence parameter $\delta$ and an unknown vector $\theta^*,$ the goal is to identify $\arg\max_{x\in\mathcal{X}}x^T\theta^*$, with probability at least $1-\delta,$ using noisy measurements of the form $x^T\theta^*.$ For this fixed confidence ($\delta$-PAC) setting, we propose an explicitly implementable and provably order-optimal sample-complexity algorithm to solve this problem. Previous approaches rely on access to minimax optimization oracles. The algorithm, which we call the \textit{Phased Elimination Linear Exploration Game} (PELEG), maintains a high-probability confidence ellipsoid containing $\theta^*$ in each round and uses it to eliminate suboptimal arms in phases. PELEG achieves fast shrinkage of this confidence ellipsoid along the most confusing (i.e., close to, but not optimal) directions by interpreting the problem as a two player zero-sum game, and sequentially converging to its saddle point using low-regret learners to compute players' strategies in each round. We analyze the sample complexity of PELEG and show that it matches, up to order, an instance-dependent lower bound on sample complexity in the linear bandit setting. We also provide numerical results for the proposed algorithm consistent with its theoretical guarantees.
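The elimination idea in the abstract above can be illustrated with a generic ellipsoid-based test. This is a minimal sketch, not the paper's exact rule: it assumes the confidence set is $\{\theta : \|\theta-\hat\theta\|_V \le \beta\}$, and the names `eliminate`, `theta_hat`, `v_inv` and `beta` are hypothetical. An arm $x$ is dropped when some other arm $y$ beats it for every $\theta$ in the ellipsoid, i.e. when $(y-x)^T\hat\theta > \beta\,\|y-x\|_{V^{-1}}$.

```python
import math


def quad_form(v, m):
    """Return v^T m v for a vector v and a square matrix m (lists of floats)."""
    n = len(v)
    return sum(v[i] * m[i][j] * v[j] for i in range(n) for j in range(n))


def eliminate(arms, theta_hat, v_inv, beta):
    """Keep the arms that could still be optimal for some theta in the ellipsoid.

    Assumed (hypothetical) inputs: theta_hat is the current estimate, v_inv is
    the inverse of the design matrix V, and beta is the ellipsoid radius.
    """
    survivors = []
    for x in arms:
        dominated = False
        for y in arms:
            diff = [b - a for a, b in zip(x, y)]  # direction y - x
            gap = sum(d * t for d, t in zip(diff, theta_hat))
            width = beta * math.sqrt(quad_form(diff, v_inv))
            if gap > width:  # y beats x everywhere in the ellipsoid
                dominated = True
                break
        if not dominated:
            survivors.append(x)
    return survivors
```

With an isotropic `v_inv`, only arms whose difference from the leader is nearly orthogonal to `theta_hat` survive. These are the "confusing" directions along which the ellipsoid still needs to shrink.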

algorithm, bandit, sample complexity, (13 more...)

2006.07562

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > India > Karnataka > Bengaluru (0.04)
Asia > Middle East > Israel > Haifa District > Haifa (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.34)

Elnabarawy, Islam, Arroyo, Kristijana, Wunsch, Donald C. II

StarCraft II Build Order Optimization using Deep Reinforcement Learning and Monte-Carlo Tree Search

arXiv.org Artificial Intelligence, Jun-12-2020

The real-time strategy game StarCraft II has been posed as a challenge for reinforcement learning by Google's DeepMind. This study examines the use of an agent based on the Monte-Carlo Tree Search algorithm for optimizing the build order in StarCraft II, and discusses how its performance can be improved further by combining it with a deep reinforcement learning neural network. In the experiments, the Monte-Carlo Tree Search agent achieves a score similar to that of a novice human player using only very limited time and computational resources, which paves the way to achieving scores comparable to those of a human expert by combining it with deep reinforcement learning.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

2006.10525

Country: North America > United States > Missouri > Phelps County > Rolla (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Geada, Rob, Prangle, Dennis, McGough, Andrew Stephen

Bonsai-Net: One-Shot Neural Architecture Search via Differentiable Pruners

One-shot Neural Architecture Search (NAS) aims to minimize the computational expense of discovering state-of-the-art models. However, in the past year attention has been drawn to the comparable performance of naive random search across the same search spaces used by leading NAS algorithms. To address this, we explore the effects of drastically relaxing the NAS search space, and we present Bonsai-Net, an efficient one-shot NAS method to explore our relaxed search space. Bonsai-Net is built around a modified differential pruner and can consistently discover state-of-the-art architectures that are significantly better than random search with fewer parameters than other state-of-the-art methods. Additionally, Bonsai-Net performs simultaneous model search and training, dramatically reducing the total time it takes to generate fully-trained models from scratch.

bonsai-net, operation, search space, (13 more...)

2006.09264

Genre: Research Report > Promising Solution (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.85)

Tang, Yunhao, Choromanski, Krzysztof

Online Hyper-parameter Tuning in Off-policy Learning via Evolutionary Strategies

Off-policy learning algorithms have been known to be sensitive to the choice of hyper-parameters. However, unlike near on-policy algorithms for which hyper-parameters could be optimized via e.g. meta-gradients, similar techniques could not be straightforwardly applied to off-policy learning. In this work, we propose a framework which entails the application of Evolutionary Strategies to online hyper-parameter tuning in off-policy learning. Our formulation draws close connections to meta-gradients and leverages the strengths of black-box optimization with relatively low-dimensional search spaces. We show that our method outperforms state-of-the-art off-policy learning baselines with static hyper-parameters and recent prior work over a wide range of continuous control benchmarks.

arxiv preprint arxiv, evolutionary algorithm, machine learning, (16 more...)

2006.07554

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
(2 more...)

Suggala, Arun Sai, Netrapalli, Praneeth

Follow the Perturbed Leader: Optimism and Fast Parallel Algorithms for Smooth Minimax Games

We consider the problem of online learning and its application to solving minimax games. For the online learning problem, Follow the Perturbed Leader (FTPL) is a widely studied algorithm which enjoys the optimal $O(T^{1/2})$ worst-case regret guarantee for both convex and nonconvex losses. In this work, we show that when the sequence of loss functions is predictable, a simple modification of FTPL which incorporates optimism can achieve better regret guarantees, while retaining the optimal worst-case regret guarantee for unpredictable sequences. A key challenge in obtaining these tighter regret bounds is the stochasticity and optimism in the algorithm, which requires different analysis techniques than those commonly used in the analysis of FTPL. The key ingredient we utilize in our analysis is the dual view of perturbation as regularization. While our algorithm has several applications, we consider the specific application of minimax games. For solving smooth convex-concave games, our algorithm only requires access to a linear optimization oracle. For Lipschitz and smooth nonconvex-nonconcave games, our algorithm requires access to an optimization oracle which computes the perturbed best response. In both these settings, our algorithm solves the game up to an accuracy of $O(T^{-1/2})$ using $T$ calls to the optimization oracle. An important feature of our algorithm is that it is highly parallelizable and requires only $O(T^{1/2})$ iterations, with each iteration making $O(T^{1/2})$ parallel calls to the optimization oracle.
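For orientation, the basic (non-optimistic) FTPL rule that the abstract builds on can be sketched for the simplest case of a finite set of experts with linear losses. This is an illustrative sketch only, not the paper's optimistic or parallel variant. The exponential perturbation distribution, the scale `eta`, and the function names are assumptions made for the example.

```python
import random


def ftpl_choose(cum_losses, eta, rng):
    """Pick the expert minimizing cumulative loss minus a random perturbation."""
    # Exponential perturbations with mean eta (an assumed choice of distribution).
    perturbed = [loss - rng.expovariate(1.0 / eta) for loss in cum_losses]
    return min(range(len(cum_losses)), key=perturbed.__getitem__)


def run_ftpl(loss_rounds, eta, seed=0):
    """Play FTPL on a sequence of loss vectors; return (choices, regret).

    Regret is measured against the best fixed expert in hindsight.
    """
    rng = random.Random(seed)
    n_experts = len(loss_rounds[0])
    cum_losses = [0.0] * n_experts
    total_loss = 0.0
    choices = []
    for losses in loss_rounds:
        i = ftpl_choose(cum_losses, eta, rng)  # decide before seeing losses
        choices.append(i)
        total_loss += losses[i]
        for j in range(n_experts):  # full-information update
            cum_losses[j] += losses[j]
    return choices, total_loss - min(cum_losses)
```

A call such as `run_ftpl(rounds, eta=len(rounds) ** 0.5)` uses a perturbation scale of order $T^{1/2}$, the usual choice behind the $O(T^{1/2})$ worst-case regret mentioned above.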

algorithm, artificial intelligence, machine learning, (17 more...)

2006.07541

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(3 more...)

Genre: Research Report (0.82)

Industry: Education (0.74)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.81)

Dikkala, Nishanth, Lewis, Greg, Mackey, Lester, Syrgkanis, Vasilis

Minimax Estimation of Conditional Moment Models

We develop an approach for estimating models described via conditional moment restrictions, with a prototypical application being non-parametric instrumental variable regression. We introduce a min-max criterion function, under which the estimation problem can be thought of as solving a zero-sum game between a modeler who is optimizing over the hypothesis space of the target model and an adversary who identifies violating moments over a test function space. We analyze the statistical estimation rate of the resulting estimator for arbitrary hypothesis spaces, with respect to an appropriate analogue of the mean squared error metric, for ill-posed inverse problems. We show that when the minimax criterion is regularized with a second moment penalty on the test function and the test function space is sufficiently rich, then the estimation rate scales with the critical radius of the hypothesis and test function spaces, a quantity which typically gives tight fast rates. Our main result follows from a novel localized Rademacher analysis of statistical learning problems defined via minimax objectives. We provide applications of our main results for several hypothesis spaces used in practice such as: reproducing kernel Hilbert spaces, high dimensional sparse linear functions, spaces defined via shape constraints, ensemble estimators such as random forests, and neural networks. For each of these applications we provide computationally efficient optimization methods for solving the corresponding minimax problem (e.g. stochastic first-order heuristics for neural networks). In several applications, we show how our modified mean squared error rate, combined with conditions that bound the ill-posedness of the inverse problem, leads to mean squared error rates. We conclude with an extensive experimental analysis of the proposed methods.
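As a rough illustration of the min-max criterion described above, consider the instrumental variable case with the conditional moment restriction $\mathbb{E}[y - h(x) \mid z] = 0$. The symbols $y$ (outcome), $x$ (treatment), $z$ (instrument), $\mathcal{H}$ (hypothesis space) and $\mathcal{F}$ (test function space) are notation assumed here for the sketch. A simplified version of a criterion with a second moment penalty on the test function could be written as follows; the paper's actual estimator may include further regularization terms and constants.

$$
\hat{h} \;=\; \arg\min_{h\in\mathcal{H}} \; \max_{f\in\mathcal{F}} \; \frac{1}{n}\sum_{i=1}^{n} \bigl(y_i - h(x_i)\bigr)\, f(z_i) \;-\; \frac{1}{n}\sum_{i=1}^{n} f(z_i)^2
$$

The first term is the empirical moment violation that the adversary tries to expose, and the second is the penalty that keeps the adversary's test function bounded in empirical second moment.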

artificial intelligence, estimator, machine learning, (16 more...)

2006.07201

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
(4 more...)

Genre: Research Report (0.82)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Joshi, Chaitanya K., Cappart, Quentin, Rousseau, Louis-Martin, Laurent, Thomas, Bresson, Xavier

Learning TSP Requires Rethinking Generalization

End-to-end training of neural network solvers for combinatorial problems such as the Travelling Salesman Problem is intractable and inefficient beyond a few hundred nodes. While state-of-the-art Machine Learning approaches perform close to classical solvers for trivially small sizes, they are unable to generalize the learnt policy to larger instances of practical scale. Towards leveraging transfer learning to solve large-scale TSPs, this paper identifies inductive biases, model architectures and learning algorithms that promote generalization to instances larger than those seen in training. Our controlled experiments provide the first principled investigation into such zero-shot generalization, revealing that extrapolating beyond training data requires rethinking the entire neural combinatorial optimization pipeline, from network layers and learning paradigms to evaluation protocols.

arxiv preprint arxiv, machine learning, reinforcement learning, (17 more...)

2006.07054

Country:

North America > Canada > Quebec > Montreal (0.04)
North America > United States > Washington > King County > Seattle (0.04)
Asia > Singapore (0.04)

Genre:

Research Report > Strength High (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

arXiv.org Machine Learning, Jun-11-2020

Learning to Solve Combinatorial Optimization Problems on Real-World Graphs in Linear Time

Drori, Iddo, Kharkar, Anant, Sickinger, William R., Kates, Brandon, Ma, Qiang, Ge, Suwen, Dolev, Eden, Dietrich, Brenda, Williamson, David P., Udell, Madeleine

Combinatorial optimization algorithms for graph problems are usually designed afresh for each new problem with careful attention by an expert to the problem structure. In this work, we develop a new framework to solve any combinatorial optimization problem over graphs that can be formulated as a single player game defined by states, actions, and rewards, including minimum spanning tree, shortest paths, traveling salesman problem, and vehicle routing problem, without expert knowledge. Our method trains a graph neural network using reinforcement learning on an unlabeled training set of graphs. The trained network then outputs approximate solutions to new graph instances in linear running time. In contrast, previous approximation algorithms or heuristics tailored to NP-hard problems on graphs generally have at least quadratic running time. We demonstrate the applicability of our approach on both polynomial and NP-hard problems with optimality gaps close to 1, and show that our method is able to generalize well: (i) from training on small graphs to testing on large graphs; (ii) from training on random graphs of one type to testing on random graphs of another type; and (iii) from training on random graphs to running on real world graphs.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2006.0375

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
Asia > Vietnam > Hanoi > Hanoi (0.04)

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment > Games (0.67)
Transportation (0.55)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Bertsimas, Dimitris, Digalakis, Vassilis Jr

The Backbone Method for Ultra-High Dimensional Sparse Machine Learning

arXiv.org Machine Learning, Jun-11-2020

We present the backbone method, a generic framework that enables sparse and interpretable supervised machine learning methods to scale to ultra-high dimensional problems. We solve, in minutes, sparse regression problems with $p\sim10^7$ features and decision tree induction problems with $p\sim10^5$ features. The proposed method operates in two phases: we first determine the backbone set, which consists of potentially relevant features, by solving a number of tractable subproblems; then we solve a reduced problem, considering only the backbone features. Numerical experiments demonstrate that our method competes with optimal solutions when exact methods apply, and substantially outperforms baseline heuristics when exact methods do not scale, both in terms of recovering the true relevant features and in its out-of-sample predictive performance.

artificial intelligence, machine learning, optimization problem, (17 more...)

2006.06592

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > California > Monterey County > Monterey (0.04)

Genre:

Research Report > New Finding (0.47)
Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)