Wang, Ziyu
Making Efficient Use of Demonstrations to Solve Hard Exploration Problems
Paine, Tom Le, Gulcehre, Caglar, Shahriari, Bobak, Denil, Misha, Hoffman, Matt, Soyer, Hubert, Tanburn, Richard, Kapturowski, Steven, Rabinowitz, Neil, Williams, Duncan, Barth-Maron, Gabriel, Wang, Ziyu, de Freitas, Nando, Worlds Team
This paper introduces R2D3, an agent that makes efficient use of demonstrations to solve hard exploration problems in partially observable environments with highly variable initial conditions. We also introduce a suite of eight tasks that combine these three properties, and show that R2D3 can solve several of the tasks where other state-of-the-art methods (both with and without demonstrations) fail to see even a single successful trajectory after tens of billions of steps of exploration.
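A core mechanism R2D3 relies on is mixing expert demonstrations with the agent's own experience when sampling learner batches from replay. A minimal sketch of that demo-ratio sampling idea, in plain Python with toy list buffers and an illustrative ratio (not the paper's tuned value), might look like this:

```python
import random

def sample_batch(demo_buffer, agent_buffer, batch_size, demo_ratio=0.25):
    """Each batch element comes from the demonstration buffer with
    probability demo_ratio, otherwise from the agent's own replay."""
    return [
        random.choice(demo_buffer) if random.random() < demo_ratio
        else random.choice(agent_buffer)
        for _ in range(batch_size)
    ]

# Toy usage: plain lists of transitions stand in for prioritized replay buffers.
demo = [("demo_obs", "demo_action", 1.0)] * 100
agent = [("agent_obs", "agent_action", 0.0)] * 10000
batch = sample_batch(demo, agent, batch_size=64)
```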
Deep Music Analogy Via Latent Representation Disentanglement
Yang, Ruihan, Wang, Dingsu, Wang, Ziyu, Chen, Tianyao, Jiang, Junyan, Xia, Gus
Analogy is a key approach to automated music generation, notable for its ability to produce pieces that are both natural and creative from only a few examples. In general, an analogy is made by partially transferring music abstractions, i.e., high-level representations and their relationships, from one piece to another; this procedure requires disentangling music representations, which takes little effort for musicians but is non-trivial for computers. Three sub-problems arise: extracting latent representations from the observation, disentangling the representations so that each part has a unique semantic interpretation, and mapping the latent representations back to actual music. We propose an explicitly-constrained conditional variational auto-encoder (EC2-VAE) as a unified solution to all three sub-problems. In this study, we focus on disentangling the pitch and rhythm representations of 8-beat music clips conditioned on chords. For music analogy, the model lets us realize "what if" scenarios, such as how a piece would sound if it were composed with a different pitch contour, rhythm pattern, or chord progression, by borrowing those representations from other pieces. Finally, we validate the proposed disentanglement method with objective measurements and evaluate the analogy examples in a subjective study.
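To make the latent-swapping idea concrete, here is a hedged sketch of producing a rhythm analogy from two pieces; `encode` and `decode` are random placeholder stand-ins for the trained EC2-VAE components, not the actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(piece, chords):
    """Stand-in for the trained EC2-VAE encoder: returns (z_pitch, z_rhythm).
    A real encoder maps an 8-beat melody segment, conditioned on chords,
    to two latent vectors with distinct semantics."""
    z = rng.standard_normal(256)
    return z[:128], z[128:]

def decode(z_pitch, z_rhythm, chords):
    """Stand-in for the decoder mapping latents back to a melody segment."""
    return np.concatenate([z_pitch, z_rhythm])  # placeholder "music"

def rhythm_analogy(piece_a, piece_b, chords):
    """'What if piece A had piece B's rhythm?': keep A's pitch latent,
    borrow B's rhythm latent, then decode under the given chords."""
    z_pitch_a, _ = encode(piece_a, chords)
    _, z_rhythm_b = encode(piece_b, chords)
    return decode(z_pitch_a, z_rhythm_b, chords)

new_piece = rhythm_analogy(piece_a="melody_A", piece_b="melody_B", chords="C_Am_F_G")
print(new_piece.shape)
```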
Function Space Particle Optimization for Bayesian Neural Networks
Wang, Ziyu, Ren, Tongzheng, Zhu, Jun, Zhang, Bo
While Bayesian neural networks (BNNs) have drawn increasing attention, their posterior inference remains challenging, due to the high-dimensional and over-parameterized nature of these models. To address this issue, several highly flexible and scalable variational inference procedures based on the idea of particle optimization have been proposed. These methods directly optimize a set of particles to approximate the target posterior. However, their application to BNNs often yields sub-optimal performance, as such methods have a particular failure mode on over-parameterized models. In this paper, we propose to solve this issue by performing particle optimization directly in the space of regression functions. We demonstrate through extensive experiments that our method successfully overcomes this issue, and outperforms strong baselines in a variety of tasks including prediction, defense against adversarial examples, and reinforcement learning.
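As an illustration of the function-space idea, the following is a highly simplified numpy sketch of an SVGD-style particle update in which the kernel and repulsion act on function values rather than on parameters. It uses a toy Bayesian linear model (so the Jacobian back to parameters is just the design matrix), omits the prior term, and is not the paper's full algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy Bayesian linear regression: each "particle" is one weight vector, and
# its function values f_i = X @ w_i are what the kernel operates on.
X = rng.standard_normal((32, 5))
y = X @ rng.standard_normal(5) + 0.5 * rng.standard_normal(32)
particles = rng.standard_normal((10, 5))                 # 10 weight particles
noise_var, step = 0.25, 1e-3

for _ in range(500):
    F = particles @ X.T                                  # function values (10, 32)
    sq = ((F[:, None, :] - F[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    h = np.median(sq) / np.log(len(F) + 1) + 1e-8        # median-heuristic bandwidth
    K = np.exp(-sq / h)                                  # kernel on functions
    grad_f = (y - F) / noise_var                         # d log-likelihood / d f
    # SVGD update in function space: kernel-weighted gradients + repulsion.
    repulsion = (2.0 / h) * (K.sum(1, keepdims=True) * F - K @ F)
    phi = (K @ grad_f + repulsion) / len(F)
    # Chain rule back to parameters (the Jacobian of X @ w w.r.t. w is X).
    particles += step * phi @ X

print(particles.mean(0))   # posterior-mean estimate of the weights
```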
A Framework for Automated Pop-song Melody Generation with Piano Accompaniment Arrangement
Wang, Ziyu, Xia, Gus
We contribute a pop-song automation framework for lead melody generation and accompaniment arrangement. The framework reflects the major procedures of human music composition, generating both lead melody and piano accompaniment by a unified strategy. Specifically, we take a chord progression as input and propose three models to generate a structured melody with piano accompaniment textures. First, the harmony alternation model transforms a raw input chord progression into an altered one that better fits the specified music style. Second, the melody generation model generates the lead melody and the other voices (melody lines) of the accompaniment using seasonal ARMA (Autoregressive Moving Average) processes. Third, the melody integration model integrates the melody lines (voices) into the final piano accompaniment. We evaluate the proposed framework using subjective listening tests. Experimental results show that the generated melodies are rated significantly higher than those generated by a bi-directional LSTM, and our accompaniment arrangement is comparable with state-of-the-art commercial software, Band in a Box.
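As a toy illustration of generating a melody line from an autoregressive process, the sketch below simulates an AR(2) contour with numpy and snaps it to scale tones; the coefficients, scale, and quantization scheme are placeholders rather than the paper's seasonal ARMA models:

```python
import numpy as np

rng = np.random.default_rng(0)

C_MAJOR = np.array([60, 62, 64, 65, 67, 69, 71, 72])      # MIDI pitches

def ar2_contour(n_steps, phi1=0.6, phi2=0.3, sigma=1.0):
    """Simulate a smooth AR(2) contour to serve as the melodic shape."""
    x = np.zeros(n_steps)
    for t in range(2, n_steps):
        x[t] = phi1 * x[t - 1] + phi2 * x[t - 2] + sigma * rng.standard_normal()
    return x

def contour_to_melody(contour, scale=C_MAJOR):
    """Rescale the contour into the scale's pitch range, then snap each
    step to the nearest scale tone."""
    lo, hi = scale.min(), scale.max()
    norm = (contour - contour.min()) / (np.ptp(contour) + 1e-8)
    target = lo + norm * (hi - lo)
    return scale[np.abs(scale[None, :] - target[:, None]).argmin(axis=1)]

melody = contour_to_melody(ar2_contour(16))
print(melody)   # one 16-step melody line as MIDI note numbers
```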
Bayesian Optimization in AlphaGo
Chen, Yutian, Huang, Aja, Wang, Ziyu, Antonoglou, Ioannis, Schrittwieser, Julian, Silver, David, de Freitas, Nando
During the development of AlphaGo, its many hyper-parameters were tuned with Bayesian optimization multiple times. This automatic tuning process resulted in substantial improvements in playing strength. For example, prior to the match with Lee Sedol, we tuned the latest AlphaGo agent and this improved its win-rate from 50% to 66.5% in self-play games. This tuned version was deployed in the final match. Of course, since we tuned AlphaGo many times during its development cycle, the compounded contribution was even higher than this percentage. It is our hope that this brief case study will be of interest to Go fans, and also provide Bayesian optimization practitioners with some insights and inspiration.
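For readers new to the technique, the following is a generic Gaussian-process / expected-improvement tuning loop run against a synthetic, noisy win-rate objective; the objective, search space, and settings are stand-ins, not AlphaGo's actual tuning setup:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

def win_rate(x):
    """Stand-in objective: a noisy self-play win-rate as a function of one
    hyper-parameter in [0, 1]. Purely hypothetical."""
    true = 0.5 + 0.17 * np.exp(-((x - 0.65) ** 2) / 0.02)
    return float(np.clip(true + 0.02 * rng.standard_normal(), 0, 1))

def expected_improvement(mu, sigma, best, xi=0.01):
    gap = mu - best - xi
    z = gap / np.maximum(sigma, 1e-9)
    return gap * norm.cdf(z) + sigma * norm.pdf(z)

# Start from a few random evaluations, then iterate: fit a GP to the observed
# win-rates, pick the candidate maximizing expected improvement, evaluate it.
X = rng.uniform(0, 1, size=(5, 1))
y = np.array([win_rate(x[0]) for x in X])
candidates = np.linspace(0, 1, 200).reshape(-1, 1)

for _ in range(20):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-3, normalize_y=True)
    gp.fit(X, y)
    mu, sigma = gp.predict(candidates, return_std=True)
    x_next = candidates[np.argmax(expected_improvement(mu, sigma, y.max()))]
    X = np.vstack([X, [x_next]])
    y = np.append(y, win_rate(x_next[0]))

print(f"best hyper-parameter {X[np.argmax(y)][0]:.3f} with win-rate {y.max():.3f}")
```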
One-Shot High-Fidelity Imitation: Training Large-Scale Deep Nets with RL
Paine, Tom Le, Colmenarejo, Sergio Gómez, Wang, Ziyu, Reed, Scott, Aytar, Yusuf, Pfaff, Tobias, Hoffman, Matt W., Barth-Maron, Gabriel, Cabi, Serkan, Budden, David, de Freitas, Nando
Humans are experts at high-fidelity imitation -- closely mimicking a demonstration, often in one attempt. Humans use this ability to quickly solve a task instance, and to bootstrap learning of new tasks. Achieving these abilities in autonomous agents is an open problem. In this paper, we introduce an off-policy RL algorithm (MetaMimic) to narrow this gap. MetaMimic can learn both (i) policies for high-fidelity one-shot imitation of diverse novel skills, and (ii) policies that enable the agent to solve tasks more efficiently than the demonstrators. MetaMimic relies on the principle of storing all experiences in a memory and replaying these to learn massive deep neural network policies by off-policy RL. This paper introduces, to the best of our knowledge, the largest existing neural networks for deep RL and shows that larger networks with normalization are needed to achieve one-shot high-fidelity imitation on a challenging manipulation task. The results also show that both types of policy can be learned from vision, in spite of the task rewards being sparse, and without access to demonstrator actions.
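A hedged sketch of the imitation-reward idea is shown below: the agent is rewarded for staying close to the demonstration's observation at the same timestep. The exponentiated Euclidean distance and the toy vectors are illustrative stand-ins, not the paper's exact reward or observation space:

```python
import numpy as np

def imitation_reward(agent_obs, demo_obs, beta=1.0):
    """Reward for tracking the demonstration: an exponentiated negative
    squared distance between the agent's observation and the demo
    observation at the same timestep (illustrative form only)."""
    return float(np.exp(-beta * np.linalg.norm(agent_obs - demo_obs) ** 2))

# Toy usage with flat vectors standing in for visual features.
rng = np.random.default_rng(0)
demo_trajectory = [rng.standard_normal(8) for _ in range(5)]
agent_trajectory = [o + 0.1 * rng.standard_normal(8) for o in demo_trajectory]
rewards = [imitation_reward(a, d) for a, d in zip(agent_trajectory, demo_trajectory)]
print(rewards)
```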
Playing hard exploration games by watching YouTube
Aytar, Yusuf, Pfaff, Tobias, Budden, David, Paine, Tom Le, Wang, Ziyu, de Freitas, Nando
Deep reinforcement learning methods traditionally struggle with tasks where environment rewards are particularly sparse. One successful method of guiding exploration in these domains is to imitate trajectories provided by a human demonstrator. However, these demonstrations are typically collected under artificial conditions, i.e. with access to the agent's exact environment setup and the demonstrator's action and reward trajectories. Here we propose a two-stage method that overcomes these limitations by relying on noisy, unaligned footage without access to such data. First, we learn to map unaligned videos from multiple sources to a common representation using self-supervised objectives constructed over both time and modality (i.e. vision and sound). Second, we embed a single YouTube video in this representation to construct a reward function that encourages an agent to imitate human gameplay. This method of one-shot imitation allows our agent to convincingly exceed human-level performance on the infamously hard exploration games Montezuma's Revenge, Pitfall! and Private Eye for the first time, even if the agent is not presented with any environment rewards.
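One way to picture the second stage is a sequential checkpoint reward built on top of the learned embedding; in the sketch below, the embedding, checkpoint spacing, distance threshold, and bonus are placeholder assumptions rather than the paper's settings:

```python
import numpy as np

class CheckpointReward:
    """Embed the demo video once, keep evenly spaced checkpoints, and pay a
    bonus the first time the agent's embedded observation comes close to
    the next checkpoint (placeholder embedding and thresholds)."""

    def __init__(self, embed, demo_frames, spacing=16, threshold=0.5, bonus=0.5):
        self.embed = embed
        self.checkpoints = [embed(f) for f in demo_frames[::spacing]]
        self.next_idx = 0
        self.threshold = threshold
        self.bonus = bonus

    def __call__(self, agent_frame):
        if self.next_idx >= len(self.checkpoints):
            return 0.0
        e = self.embed(agent_frame)
        if np.linalg.norm(e - self.checkpoints[self.next_idx]) < self.threshold:
            self.next_idx += 1
            return self.bonus
        return 0.0

# Toy usage: an identity "embedding" and random vectors stand in for the
# learned cross-modal embedding and real video frames.
rng = np.random.default_rng(0)
demo = [rng.standard_normal(4) for _ in range(64)]
reward_fn = CheckpointReward(embed=lambda f: f, demo_frames=demo)
print(reward_fn(demo[0]))   # first checkpoint reached -> bonus paid
```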
Reinforcement and Imitation Learning for Diverse Visuomotor Skills
Zhu, Yuke, Wang, Ziyu, Merel, Josh, Rusu, Andrei, Erez, Tom, Cabi, Serkan, Tunyasuvunakool, Saran, Kramár, János, Hadsell, Raia, de Freitas, Nando, Heess, Nicolas
We propose a model-free deep reinforcement learning method that leverages a small amount of demonstration data to assist a reinforcement learning agent. We apply this approach to robotic manipulation tasks and train end-to-end visuomotor policies that map directly from RGB camera inputs to joint velocities. We demonstrate that our approach can solve a wide variety of visuomotor tasks for which engineering a scripted controller would be laborious. Our experiments indicate that our reinforcement and imitation agent achieves significantly better performance than agents trained with reinforcement learning or imitation learning alone. We also illustrate that these policies, trained with large visual and dynamics variations, can achieve preliminary successes in zero-shot sim2real transfer. A brief visual description of this work can be viewed at https://youtu.be/EDl8SQUNjj0
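A common way to combine the two learning signals is a weighted blend of task and imitation rewards; the sketch below is a generic illustration of that blend, with a mixing weight that is purely illustrative and not taken from the paper:

```python
def hybrid_reward(task_reward, imitation_reward, mix=0.5):
    """Blend a (possibly sparse) task reward with an imitation reward, e.g.
    one produced by a GAIL-style discriminator. The mixing weight and reward
    shapes here are placeholders, not the paper's exact formulation."""
    return mix * imitation_reward + (1.0 - mix) * task_reward

# Toy usage: early in training the imitation term often carries most of the
# useful signal because the task reward is rarely observed.
print(hybrid_reward(task_reward=0.0, imitation_reward=0.8))
```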