Collaborating Authors: Tang, Yunhao


Monte-Carlo Tree Search as Regularized Policy Optimization

arXiv.org Machine Learning

The combination of Monte-Carlo tree search (MCTS) with deep reinforcement learning has led to significant advances in artificial intelligence. However, AlphaZero, the current state-of-the-art MCTS algorithm, still relies on handcrafted heuristics that are only partially understood. In this paper, we show that AlphaZero's search heuristics, along with other common ones such as UCT, are an approximation to the solution of a specific regularized policy optimization problem. With this insight, we propose a variant of AlphaZero which uses the exact solution to this policy optimization problem, and show experimentally that it reliably outperforms the original algorithm in multiple domains.
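
The closed form behind this result is simple enough to sketch. Below is a minimal NumPy sketch, under the assumption that the regularized problem takes the common form max_pi q^T pi - lam * KL(prior || pi), whose optimum is pi(a) = lam * prior(a) / (alpha - q(a)) with the normalizer alpha found by bisection. The names solve_regularized_policy and lam are illustrative, not from the paper's code.

```python
# Exact maximizer of q^T pi - lam * KL(prior || pi) over the simplex,
# the kind of regularized policy optimization problem the abstract says
# AlphaZero's search heuristic approximates (the KL form is our assumption).
import numpy as np

def solve_regularized_policy(q, prior, lam, iters=64):
    lo = np.max(q + lam * prior)          # normalizer mass >= 1 here
    hi = np.max(q) + lam                  # normalizer mass <= 1 here (prior sums to 1)
    for _ in range(iters):                # bisection on the normalizer alpha
        alpha = 0.5 * (lo + hi)
        mass = np.sum(lam * prior / (alpha - q))
        lo, hi = (alpha, hi) if mass > 1.0 else (lo, alpha)
    pi = lam * prior / (alpha - q)
    return pi / pi.sum()                  # tidy up residual bisection error

q = np.array([0.1, 0.5, 0.3])             # estimated action values
prior = np.array([0.6, 0.3, 0.1])         # network prior over actions
print(solve_regularized_policy(q, prior, lam=0.5))
```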


Online Hyper-parameter Tuning in Off-policy Learning via Evolutionary Strategies

arXiv.org Machine Learning

Off-policy learning algorithms are known to be sensitive to the choice of hyper-parameters. However, unlike near-on-policy algorithms, for which hyper-parameters can be optimized via, e.g., meta-gradients, similar techniques cannot be straightforwardly applied to off-policy learning. In this work, we propose a framework that applies Evolutionary Strategies to online hyper-parameter tuning in off-policy learning. Our formulation draws close connections to meta-gradients and leverages the strengths of black-box optimization with relatively low-dimensional search spaces. We show that our method outperforms state-of-the-art off-policy learning baselines with static hyper-parameters, as well as recent prior work, over a wide range of continuous control benchmarks.
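
A hedged sketch of the core loop, assuming the current hyper-parameters are treated as the mean of a Gaussian search distribution and updated with an antithetic ES gradient estimate of agent performance. Here evaluate_agent is a hypothetical stand-in for running the off-policy learner for a short window with the perturbed hyper-parameters and reporting return.

```python
# Online hyper-parameter tuning via antithetic Evolution Strategies (sketch).
import numpy as np

rng = np.random.default_rng(0)

def evaluate_agent(hparams):
    # Hypothetical proxy: performance peaks at some unknown hyper-parameter setting.
    target = np.array([0.99, -4.0])        # e.g. (discount, log learning-rate)
    return -np.sum((hparams - target) ** 2)

theta = np.array([0.9, -2.0])              # current hyper-parameters
sigma, lr, n_pairs = 0.05, 0.02, 8
for step in range(200):
    eps = rng.standard_normal((n_pairs, theta.size))
    deltas = np.array([evaluate_agent(theta + sigma * e) -
                       evaluate_agent(theta - sigma * e) for e in eps])
    grad = (eps * deltas[:, None]).sum(0) / (2 * sigma * n_pairs)
    theta += lr * grad                     # ascend the estimated performance gradient
print(theta)
```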


Hindsight Expectation Maximization for Goal-conditioned Reinforcement Learning

arXiv.org Machine Learning

We propose a graphical model framework for goal-conditioned RL, with an EM algorithm that operates on a lower bound of the RL objective. The E-step provides a natural interpretation of how 'learning in hindsight' techniques, such as HER, handle extremely sparse goal-conditioned rewards. The M-step reduces policy optimization to supervised learning updates, which greatly stabilizes end-to-end training on high-dimensional inputs such as images. We show that the combined algorithm, hEM, significantly outperforms model-free baselines on a wide range of goal-conditioned benchmarks with sparse rewards.
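
A toy tabular sketch of the EM decomposition, assuming a 1-D chain environment and a softmax policy. The hindsight relabeling in the E-step and the cross-entropy M-step are the parts that mirror the abstract; everything else is a simplification, not the paper's implementation.

```python
# Hindsight EM sketch: E-step relabels goals achieved in hindsight (as in HER);
# M-step is a supervised cross-entropy update of the goal-conditioned policy.
import numpy as np

rng = np.random.default_rng(1)
n_states, n_goals, n_actions = 8, 8, 2      # actions: 0 = left, 1 = right
logits = np.zeros((n_states, n_goals, n_actions))

def act(s, g):
    p = np.exp(logits[s, g]); p /= p.sum()
    return rng.choice(n_actions, p=p)

for episode in range(2000):
    g = rng.integers(n_goals)
    s, traj = 0, []
    for t in range(10):                     # roll out with the current policy
        a = act(s, g)
        traj.append((s, a))
        s = min(max(s + (1 if a == 1 else -1), 0), n_states - 1)
    achieved = s                            # E-step: goal the trajectory reached
    for (si, ai) in traj:                   # M-step: behavior cloning toward it
        p = np.exp(logits[si, achieved]); p /= p.sum()
        grad = -p; grad[ai] += 1.0          # d log pi(a|s,g) / d logits
        logits[si, achieved] += 0.1 * grad

print(act(0, n_goals - 1))                  # policy should now tend to head right
```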


ES-MAML: Simple Hessian-Free Meta Learning

arXiv.org Artificial Intelligence

Meta-learning is a paradigm in machine learning which aims to develop models and training algorithms that can quickly adapt to new tasks and data. Our focus in this paper is on meta-learning in reinforcement learning (RL), where data efficiency is of paramount importance because gathering new samples often requires costly simulations or interactions with the real world. A popular technique for RL meta-learning is Model Agnostic Meta Learning (MAML) (Finn et al., 2017, 2018), a method for training an agent (the meta-policy) that can quickly adapt to new and unknown tasks by performing one (or a few) gradient updates in the new environment. We provide a formal description of MAML in Section 2. MAML has proven to be successful for many applications. However, implementing and running MAML continues to be challenging. One major complication is that the standard version of MAML requires estimating second derivatives of the RL reward function, which is difficult when using backpropagation on stochastic policies; indeed, the original implementation of MAML (Finn et al., 2017) did so incorrectly, which spurred the development of unbiased higher-order estimators (DiCE, (Foerster et al., 2018)) and further analysis of the credit assignment mechanism in MAML (Rothfuss et al., 2019).
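
A minimal zeroth-order sketch of the Hessian-free idea: both the inner adaptation step and the outer meta-update use only function evaluations (ES), so no second derivatives of the RL objective are ever required. The quadratic task objectives below are hypothetical stand-ins for per-task RL returns.

```python
# ES-MAML-style meta-learning sketch: ES for adaptation and for the meta-update.
import numpy as np

rng = np.random.default_rng(2)
dim, sigma, inner_lr, meta_lr = 5, 0.1, 0.5, 0.1

def sample_task():
    c = rng.standard_normal(dim)            # task = location of the optimum
    return lambda x: -np.sum((x - c) ** 2)

def es_grad(f, x, n=16):
    eps = rng.standard_normal((n, dim))     # antithetic ES gradient of f at x
    vals = np.array([f(x + sigma * e) - f(x - sigma * e) for e in eps])
    return (eps * vals[:, None]).sum(0) / (2 * sigma * n)

def adapted_value(f, theta):
    # One ES-based inner adaptation step, then evaluate the adapted policy.
    return f(theta + inner_lr * es_grad(f, theta))

theta = np.zeros(dim)
for step in range(100):                     # outer loop: ES on the meta-objective
    task = sample_task()
    theta += meta_lr * es_grad(lambda x: adapted_value(task, x), theta)
```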


Reinforcement Learning with Chromatic Networks

arXiv.org Artificial Intelligence

We present a new algorithm for finding compact neural networks encoding reinforcement learning (RL) policies. To do so, we bring to the RL setting the theory of pointer networks and ENAS-type algorithms for combinatorial optimization, together with recent evolution strategies (ES) optimization methods, and propose to define the combinatorial search space as the set of different edge-partitionings (colorings) into same-weight classes. For several RL tasks, we manage to learn colorings translating to effective policies parameterized by as few as 17 weight parameters, providing a 6x compression over state-of-the-art compact policies based on Toeplitz matrices. We believe that our work is one of the first attempts to propose a rigorous approach to training structured neural network architectures for RL problems, which are of particular interest in mobile robotics, where storage and computational resources are limited.
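
A sketch of the weight-sharing scheme, assuming a linear policy: a fixed edge coloring maps each entry of the weight matrix to one of 17 shared weights, so only the 17-entry palette is trained (e.g. with ES). The coloring here is random for illustration, whereas the paper searches over colorings.

```python
# Chromatic weight sharing sketch: a full weight matrix from a tiny palette.
import numpy as np

rng = np.random.default_rng(3)
obs_dim, act_dim, n_colors = 24, 4, 17      # 17 trainable weights, as in the abstract

coloring = rng.integers(n_colors, size=(act_dim, obs_dim))  # edge -> color class
palette = rng.standard_normal(n_colors) * 0.1               # the only parameters

def policy(obs, palette):
    W = palette[coloring]                   # expand shared weights to a full matrix
    return np.tanh(W @ obs)

obs = rng.standard_normal(obs_dim)
print(policy(obs, palette))                 # 4 actions from just 17 weights
```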


Wasserstein Reinforcement Learning

arXiv.org Machine Learning

We propose behavior-driven optimization via Wasserstein distances (WDs) to improve several classes of state-of-the-art reinforcement learning (RL) algorithms. We show that WD regularizers acting on appropriate policy embeddings efficiently incorporate behavioral characteristics into policy optimization. We demonstrate that they improve Evolution Strategy methods by encouraging more efficient exploration, can be applied in imitation learning, and speed up training of Trust Region Policy Optimization methods. Since the exact computation of WDs is expensive, we develop approximate algorithms that combine the dual formulation of the optimal transport problem, alternating optimization, and random feature maps to effectively replace exact WD computations in the RL tasks considered. We provide theoretical analysis of our algorithms and exhaustive empirical evaluation in a variety of RL settings.
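
A loose toy sketch of behavior-driven regularization: each perturbed policy is scored by task reward plus a WD bonus between its behavioral embedding and the current policy's, encouraging behaviorally diverse exploration. As a cheap stand-in for the paper's approximate WD solvers we use the exact 1-D Wasserstein distance (a sort); behavior_embedding and the toy dynamics are hypothetical.

```python
# Behavior-driven exploration sketch with a 1-D Wasserstein distance bonus.
import numpy as np

def wasserstein_1d(u, v):
    return np.mean(np.abs(np.sort(u) - np.sort(v)))

def behavior_embedding(theta, rng):
    # Hypothetical embedding: final states of short rollouts in toy dynamics.
    states = np.zeros(32)
    for _ in range(20):
        states = np.clip(states + np.tanh(theta[0]) + 0.1 * rng.standard_normal(32), -5, 5)
    return states

rng = np.random.default_rng(4)
theta = np.zeros(2)
base_emb = behavior_embedding(theta, rng)
candidates = [theta + 0.3 * rng.standard_normal(2) for _ in range(8)]
scores = []
for cand in candidates:
    reward = -np.sum(cand ** 2)             # hypothetical task reward
    novelty = wasserstein_1d(behavior_embedding(cand, rng), base_emb)
    scores.append(reward + 0.5 * novelty)   # WD term rewards new behaviors
theta = candidates[int(np.argmax(scores))]
```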


Reinforcement Learning for Integer Programming: Learning to Cut

arXiv.org Machine Learning

Integer programming (IP) is a general optimization framework widely applicable to a variety of unstructured and structured problems arising in, e.g., scheduling, production planning, and graph optimization. As IP models many provably hard-to-solve problems, modern IP solvers rely on numerous heuristics. These heuristics are usually human-designed and naturally prone to suboptimality. The goal of this work is to show that the performance of those solvers can be greatly enhanced using reinforcement learning (RL). In particular, we investigate a specific methodology for solving IPs, known as the Cutting Plane Method. This method is employed as a subroutine by all modern IP solvers. We present a deep RL formulation, network architecture, and algorithms for intelligent adaptive selection of cutting planes (aka cuts). Across a wide range of IP tasks, we show that the trained RL agent significantly outperforms human-designed heuristics, and effectively generalizes to 10X larger instances and across IP problem classes. The trained agent also benefits the popular downstream application of cutting plane methods within the Branch-and-Cut algorithm, which is the backbone of state-of-the-art commercial IP solvers.
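
A schematic sketch of the cut-selection loop under stated simplifications: SciPy's linprog solves the LP relaxation, the candidate cuts are hard-coded, and a random linear scorer over simple cut features stands in for the trained network. State is the current relaxation plus the cut pool, the action picks one cut, and the reward is the change in the LP bound.

```python
# Cut-selection-as-a-policy sketch on a tiny LP relaxation.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(5)
c = np.array([-1.0, -1.0])                   # maximize x + y  (linprog minimizes)
A = [[1.0, 0.0], [0.0, 1.0]]; b = [3.5, 3.5] # initial relaxation
cut_pool = [([1.0, 1.0], 6.0), ([1.0, 2.0], 9.0), ([2.0, 1.0], 9.0)]
w = rng.standard_normal(3)                   # hypothetical policy parameters

def features(cut, x):
    a, rhs = np.array(cut[0]), cut[1]
    return np.array([a @ x - rhs, rhs, np.linalg.norm(a)])  # violation, rhs, norm

for step in range(len(cut_pool)):
    x = linprog(c, A_ub=A, b_ub=b).x
    scores = [w @ features(cut, x) for cut in cut_pool]
    a_row, rhs = cut_pool.pop(int(np.argmax(scores)))        # action: add one cut
    A.append(list(a_row)); b.append(rhs)
    new_bound = -linprog(c, A_ub=A, b_ub=b).fun
    print(f"step {step}: LP bound {new_bound:.3f}")          # reward = bound change
```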


Variance Reduction for Evolution Strategies via Structured Control Variates

arXiv.org Machine Learning

Evolution Strategies (ES) are a powerful class of blackbox optimization techniques that have recently become a competitive alternative to state-of-the-art policy gradient (PG) algorithms for reinforcement learning (RL). We propose a new method for improving the accuracy of ES algorithms that, as opposed to recent approaches utilizing only the Monte Carlo structure of the gradient estimator, takes advantage of the underlying MDP structure to reduce variance. We observe that the gradient estimator of the ES objective can alternatively be computed using reparametrization and PG estimators, which leads to new control variate techniques for gradient estimation in ES optimization. We provide theoretical insights and show through extensive experiments that this RL-specific variance reduction approach outperforms general-purpose variance reduction methods.
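
A simplified sketch of the variance-reduction idea on a known quadratic, where the pathwise (reparametrized) gradient is available in closed form. Combining the score-function ES estimator with it via per-coordinate variance-minimizing weights is one simple instance of the control-variate family described above, not the paper's exact estimator.

```python
# Two unbiased estimators of the same smoothed gradient, combined to cut variance.
import numpy as np

rng = np.random.default_rng(6)
dim, sigma, n = 10, 0.3, 4096
theta = rng.standard_normal(dim)
H = np.diag(np.linspace(1.0, 5.0, dim))      # f(x) = -0.5 x^T H x

f = lambda x: -0.5 * x @ H @ x
grad_f = lambda x: -H @ x                    # pathwise gradient of f

eps = rng.standard_normal((n, dim))
x = theta + sigma * eps
g_es = (np.array([f(xi) for xi in x])[:, None] * eps) / sigma   # score-function ES
g_rp = np.array([grad_f(xi) for xi in x])                       # reparametrized
# Per-coordinate weight minimizing the variance of (1-c)*g_es + c*g_rp.
cov = np.mean((g_es - g_es.mean(0)) * (g_rp - g_rp.mean(0)), 0)
c = (g_es.var(0) - cov) / (g_es.var(0) + g_rp.var(0) - 2 * cov)
g_cv = (1 - c) * g_es + c * g_rp
print("var ES :", g_es.var(0).mean())
print("var CV :", g_cv.var(0).mean())        # markedly lower variance
```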


Structured Monte Carlo Sampling for Nonisotropic Distributions via Determinantal Point Processes

arXiv.org Machine Learning

We propose a new class of structured methods for Monte Carlo (MC) sampling, called DPPMC, designed for high-dimensional nonisotropic distributions, where samples are correlated via determinantal point processes to reduce the variance of the estimator. We successfully apply DPPMCs to problems involving nonisotropic distributions arising in guided evolution strategy (GES) methods for RL, CMA-ES techniques, and trust region algorithms for blackbox optimization, improving the state of the art in all these settings. In particular, we show that DPPMCs drastically improve the exploration profiles of existing evolution strategy algorithms. We further confirm our results by analyzing random feature map estimators for Gaussian mixture kernels. We provide theoretical justification of our empirical results, showing a connection between DPPMCs and structured orthogonal MC methods for isotropic distributions.
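
A minimal sketch of DPP-style structured sampling: from a pool of i.i.d. candidates drawn from a nonisotropic Gaussian, we greedily select a subset maximizing the log-determinant of an RBF kernel submatrix. Greedy log-det maximization is a standard cheap surrogate for exact DPP sampling, used here only to keep the sketch short; the paper develops proper DPP-based estimators.

```python
# Diverse subset selection via greedy log-det (a surrogate for DPP sampling).
import numpy as np

rng = np.random.default_rng(7)
dim, pool_size, k = 2, 200, 20
cov = np.array([[3.0, 1.2], [1.2, 0.5]])     # nonisotropic target covariance
pool = rng.multivariate_normal(np.zeros(dim), cov, size=pool_size)

K = np.exp(-0.5 * np.sum((pool[:, None] - pool[None]) ** 2, -1))  # RBF kernel
selected = []
for _ in range(k):
    best, best_val = None, -np.inf
    for i in range(pool_size):
        if i in selected:
            continue
        idx = selected + [i]
        val = np.linalg.slogdet(K[np.ix_(idx, idx)] + 1e-8 * np.eye(len(idx)))[1]
        if val > best_val:
            best, best_val = i, val
    selected.append(best)
samples = pool[selected]                      # correlated, well-spread sample set
print(samples.mean(0), samples.std(0))
```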


Orthogonal Estimation of Wasserstein Distances

arXiv.org Machine Learning

Wasserstein distances are increasingly used in a wide variety of applications in machine learning. Sliced Wasserstein distances form an important subclass which may be estimated efficiently through one-dimensional sorting operations. In this paper, we propose a new variant of sliced Wasserstein distance, study the use of orthogonal coupling in Monte Carlo estimation of Wasserstein distances, draw connections with stratified sampling, and evaluate our approaches in a range of large-scale experiments in generative modelling and reinforcement learning.
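
A minimal sketch contrasting i.i.d. and orthogonally coupled projection directions in the sliced Wasserstein estimator: the distance averages 1-D Wasserstein distances (computed by sorting) over random directions, and the orthogonal set here comes from a QR decomposition of a Gaussian matrix. This illustrates the estimator family the abstract discusses, not the paper's exact variant.

```python
# Sliced Wasserstein distance with i.i.d. vs. orthogonal projection directions.
import numpy as np

rng = np.random.default_rng(8)

def sliced_wasserstein(X, Y, dirs):
    total = 0.0
    for d in dirs:                           # 1-D Wasserstein per direction via sorting
        total += np.mean(np.abs(np.sort(X @ d) - np.sort(Y @ d)))
    return total / len(dirs)

dim, n, n_dirs = 16, 512, 16
X = rng.standard_normal((n, dim))
Y = rng.standard_normal((n, dim)) + 1.0      # shifted copy of the distribution

iid_dirs = rng.standard_normal((n_dirs, dim))
iid_dirs /= np.linalg.norm(iid_dirs, axis=1, keepdims=True)
orth_dirs = np.linalg.qr(rng.standard_normal((dim, dim)))[0].T[:n_dirs]

print("iid :", sliced_wasserstein(X, Y, iid_dirs))
print("orth:", sliced_wasserstein(X, Y, orth_dirs))
```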