AITopics | hard exploration game

Playing hard exploration games by watching YouTube

Neural Information Processing Systems, Nov-20-2025, 22:02:25 GMT

Deep reinforcement learning methods traditionally struggle with tasks where environment rewards are particularly sparse. One successful method of guiding exploration in these domains is to imitate trajectories provided by a human demonstrator. However, these demonstrations are typically collected under artificial conditions, i.e. with access to the agent's exact environment setup and the demonstrator's action and reward trajectories. Here we propose a method that overcomes these limitations in two stages. First, we learn to map unaligned videos from multiple sources to a common representation using self-supervised objectives constructed over both time and modality (i.e.

hard exploration game, name change, proceedings, (9 more...)


Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.98)


Review for NeurIPS paper: Learning Guidance Rewards with Trajectory-space Smoothing

Neural Information Processing Systems, Jan-21-2025, 08:50:56 GMT

Weaknesses: While the experimental results are promising in the episodic MuJoCo tasks, it is unclear whether IRCR works in more complicated settings (e.g., hard exploration games in Atari, sparse-reward tasks, robotic-arm pick-and-place tasks, etc.). Specifically, in the MuJoCo tasks considered in the paper, the episodic reward can be obtained by performing simple, repetitive patterns (i.e., keep going forward). But often the solution to an environment involves action sequences that barely overlap or repeat (e.g., a robotic agent that needs to pick up a block and place it in a specified location, or a maze-exploring agent that needs to pick up a key to open the exit). In such cases, the algorithm needs to correctly associate the non-repetitive events with the final return to solve the problem. I wonder if IRCR works in such cases as well.

learning guidance reward, neurips paper, trajectory-space smoothing, (5 more...)


Technology: Information Technology > Artificial Intelligence > Robots (1.00)


Reviews: Playing hard exploration games by watching YouTube

Neural Information Processing Systems, Oct-7-2024, 08:16:25 GMT

While supervision usually has to be intentionally provided by a human, the authors instead use YouTube videos as a form of supervision. They first align videos to a shared representation using two objectives that do not require any labeling: i) predicting the temporal distance between frames in the same video, and ii) predicting the temporal distance between a video frame and an audio frame of the same video. Subsequently, they use the embedding of a novel video to generate checkpoints, which serve as intermediate rewards for an RL agent. Experiments show state-of-the-art performance among learning-from-demonstration approaches. Strength: - The paper is clearly written, with logical steps between the sections, and good motivations.
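The first labeling-free objective in this review, predicting the temporal distance between two frames of the same video, can be framed as classification over distance buckets. The sketch below only shows how such training pairs might be sampled. The bucket boundaries in `BUCKETS` and the helper name `sample_temporal_pair` are illustrative assumptions, not values taken from the review.

```python
import random

# Hypothetical distance buckets, as inclusive frame gaps (illustrative only).
BUCKETS = [(0, 0), (1, 1), (2, 2), (3, 4), (5, 20), (21, 200)]


def sample_temporal_pair(num_frames, rng=random):
    """Return (i, j, label): two frame indices from one video and the
    index of the bucket that contains their temporal gap j - i."""
    if num_frames < 1:
        raise ValueError("video must contain at least one frame")
    # Only buckets whose smallest gap fits inside the video are usable.
    valid = [idx for idx, (low, _) in enumerate(BUCKETS) if low <= num_frames - 1]
    label = rng.choice(valid)
    low, high = BUCKETS[label]
    gap = rng.randint(low, min(high, num_frames - 1))
    start = rng.randrange(num_frames - gap)
    return start, start + gap, label
```

A classifier trained on such pairs receives the two frames as input and the bucket index as its target, so no human labels are needed. The cross-modal variant would pair a video frame with an audio snippet in the same way.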

demonstration, supervision, video, (12 more...)


Industry: Leisure & Entertainment > Games (0.54)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.78)


Playing hard exploration games by watching YouTube

Aytar, Yusuf, Pfaff, Tobias, Budden, David, Paine, Thomas, Wang, Ziyu, Freitas, Nando de

Neural Information Processing Systems, Feb-14-2020, 11:56:15 GMT


artificial intelligence, machine learning, reinforcement learning, (8 more...)


Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)


Never Give Up: Learning Directed Exploration Strategies

Badia, Adrià Puigdomènech, Sprechmann, Pablo, Vitvitskyi, Alex, Guo, Daniel, Piot, Bilal, Kapturowski, Steven, Tieleman, Olivier, Arjovsky, Martín, Pritzel, Alexander, Bolt, Andrew, Blundell, Charles

arXiv.org Machine Learning, Feb-14-2020

We propose a reinforcement learning agent to solve hard exploration games by learning a range of directed exploratory policies. We construct an episodic memory-based intrinsic reward using k-nearest neighbors over the agent's recent experience to train the directed exploratory policies, thereby encouraging the agent to repeatedly revisit all states in its environment. A self-supervised inverse dynamics model is used to train the embeddings of the nearest neighbour lookup, biasing the novelty signal towards what the agent can control. We employ the framework of Universal Value Function Approximators (UVFA) to simultaneously learn many directed exploration policies with the same neural network, with different trade-offs between exploration and exploitation. By using the same neural network for different degrees of exploration/exploitation, transfer is demonstrated from predominantly exploratory policies yielding effective exploitative policies. The proposed method can be incorporated to run with modern distributed RL agents that collect large amounts of experience from many actors running in parallel on separate environment instances. Our method doubles the performance of the base agent in all hard exploration games in the Atari-57 suite while maintaining a very high score across the remaining games, obtaining a median human normalised score of 1344.0%. Notably, the proposed method is the first algorithm to achieve non-zero rewards (with a mean score of 8,400) in the game of Pitfall! without using demonstrations or hand-crafted features.
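To make the episodic, k-nearest-neighbour intrinsic reward in this abstract more concrete, here is a simplified sketch of a kNN novelty bonus over an episodic memory of embeddings. It is not the authors' implementation: the function name `episodic_bonus`, the inverse-distance kernel, and the constants `k`, `eps`, `c` and `mean_sq_dist` are assumptions chosen for illustration. The embeddings are assumed to come from a separately trained inverse dynamics model, which is not shown.

```python
import numpy as np


def episodic_bonus(memory, embedding, k=10, eps=1e-3, c=1e-3, mean_sq_dist=1.0):
    """Novelty bonus that shrinks when `embedding` is close to its
    k nearest neighbours in the episodic `memory` (rows are embeddings).

    All constants are illustrative; `mean_sq_dist` stands in for a
    running average used to normalise squared distances."""
    memory = np.asarray(memory, dtype=float)
    if memory.size == 0:
        # Nothing has been seen yet this episode: maximal bonus.
        return float(1.0 / np.sqrt(c))
    sq_dists = np.sum((memory - np.asarray(embedding, dtype=float)) ** 2, axis=1)
    nearest = np.sort(sq_dists)[:k]
    # Inverse-distance kernel: near 1 for revisited states, near 0 for far ones.
    kernel = eps / (nearest / mean_sq_dist + eps)
    return float(1.0 / np.sqrt(kernel.sum() + c))
```

In use, the agent would append each new embedding to `memory` after computing its bonus and clear the memory at the start of every episode, so that a state counts as novel again in the next episode.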

agent, computer game, upstream oil & gas, (21 more...)


2002.06038

Genre: Research Report (1.00)

Industry:

Energy > Oil & Gas > Upstream (1.00)
Leisure & Entertainment > Games > Computer Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.54)


Benchmarking Bonus-Based Exploration Methods on the Arcade Learning Environment

Taïga, Adrien Ali, Fedus, William, Machado, Marlos C., Courville, Aaron, Bellemare, Marc G.

arXiv.org Machine Learning, Aug-6-2019

This paper provides an empirical evaluation of recently developed exploration algorithms within the Arcade Learning Environment (ALE). We study the use of different reward bonuses that incentivize exploration in reinforcement learning. We do so by fixing the learning algorithm used and focusing only on the impact of the different exploration bonuses on the agent's performance. We use Rainbow, the state-of-the-art algorithm for value-based agents, and focus on some of the bonuses proposed in the last few years. We consider the impact these algorithms have on performance within the popular game Montezuma's Revenge, which has gathered a lot of interest from the exploration community, across the set of seven games identified by Bellemare et al. (2016) as challenging for exploration, and on easier games where exploration is not an issue. We find that, in our setting, recently developed bonuses do not provide significantly improved performance on Montezuma's Revenge or hard exploration games. We also find that existing bonus-based methods may negatively impact performance on games in which exploration is not an issue and may even perform worse than $\epsilon$-greedy exploration.
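The two families of exploration compared in this abstract can be contrasted in a few lines. The sketch below pairs plain $\epsilon$-greedy action selection with a generic count-based reward bonus of the form beta / sqrt(N(s)). It is a toy illustration with hashable states, not one of the specific bonuses evaluated in the paper; `CountBonus` and `beta` are hypothetical names.

```python
import math
import random
from collections import defaultdict


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a uniformly random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


class CountBonus:
    """Adds beta / sqrt(N(s)) to the environment reward, where N(s) is the
    number of visits to state s so far (a generic count-based bonus)."""

    def __init__(self, beta=0.1):
        self.beta = beta
        self.counts = defaultdict(int)

    def augmented_reward(self, state, reward):
        self.counts[state] += 1
        return reward + self.beta / math.sqrt(self.counts[state])
```

The bonus changes the reward the learner sees, while $\epsilon$-greedy only changes how actions are chosen. Holding the learning algorithm fixed, as the abstract describes, isolates the effect of that reward change.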

benchmarking bonus-based exploration method, exploration, montezuma, (10 more...)


1908.02388

Country:

North America > United States > California > Los Angeles County > Long Beach (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report (0.83)

Industry: Education (0.72)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.54)


Playing hard exploration games by watching YouTube

Aytar, Yusuf, Pfaff, Tobias, Budden, David, Paine, Tom Le, Wang, Ziyu, de Freitas, Nando

arXiv.org Artificial Intelligence, May-29-2018

Deep reinforcement learning methods traditionally struggle with tasks where environment rewards are particularly sparse. One successful method of guiding exploration in these domains is to imitate trajectories provided by a human demonstrator. However, these demonstrations are typically collected under artificial conditions, i.e. with access to the agent's exact environment setup and the demonstrator's action and reward trajectories. Here we propose a two-stage method that overcomes these limitations by relying on noisy, unaligned footage without access to such data. First, we learn to map unaligned videos from multiple sources to a common representation using self-supervised objectives constructed over both time and modality (i.e. vision and sound). Second, we embed a single YouTube video in this representation to construct a reward function that encourages an agent to imitate human gameplay. This method of one-shot imitation allows our agent to convincingly exceed human-level performance on the infamously hard exploration games Montezuma's Revenge, Pitfall! and Private Eye for the first time, even if the agent is not presented with any environment rewards.
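The second stage in this abstract, turning one embedded video into a reward function, might be sketched as follows. The code assumes the demonstration frames and the agent's observations have already been mapped into the shared embedding space. The class name `CheckpointReward` and the parameters `stride`, `threshold` and `bonus` are hypothetical choices for illustration, not values stated in the abstract.

```python
import numpy as np


class CheckpointReward:
    """Pays a fixed bonus each time the agent's embedding comes close
    (by cosine similarity) to the next checkpoint of an embedded demo video."""

    def __init__(self, demo_embeddings, stride=16, threshold=0.5, bonus=0.5):
        # Keep every `stride`-th demo frame as a checkpoint (assumed spacing).
        self.checkpoints = np.asarray(demo_embeddings, dtype=float)[::stride]
        self.threshold = threshold
        self.bonus = bonus
        self.next_index = 0

    def __call__(self, agent_embedding):
        if self.next_index >= len(self.checkpoints):
            return 0.0  # every checkpoint has already been reached
        target = self.checkpoints[self.next_index]
        agent = np.asarray(agent_embedding, dtype=float)
        denom = np.linalg.norm(agent) * np.linalg.norm(target)
        similarity = float(agent @ target / denom) if denom > 0 else 0.0
        if similarity > self.threshold:
            self.next_index += 1  # advance so the agent must keep progressing
            return self.bonus
        return 0.0
```

Because checkpoints must be reached in order, the reward encourages the agent to follow the demonstrated route instead of lingering at one rewarded state, and it works even when no environment reward is available.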

arxiv preprint arxiv, machine learning, reinforcement learning, (15 more...)


1805.11592

Country: North America > United States (0.28)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games (0.95)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
