atari 2600
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- North America > Canada (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Unifying Count-Based Exploration and Intrinsic Motivation
Marc Bellemare, Sriram Srinivasan, Georg Ostrovski, Tom Schaul, David Saxton, Remi Munos
We consider an agent's uncertainty about its environment and the problem of generalizing this uncertainty across states. Specifically, we focus on the problem of exploration in non-tabular reinforcement learning. Drawing inspiration from the intrinsic motivation literature, we use density models to measure uncertainty, and propose a novel algorithm for deriving a pseudo-count from an arbitrary density model. This technique enables us to generalize count-based exploration algorithms to the non-tabular case. We apply our ideas to Atari 2600 games, providing sensible pseudo-counts from raw pixels.
- Europe > United Kingdom > England > Greater London > London (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning
Haoran Tang, Rein Houthooft, Davis Foote, Adam Stooke, OpenAI Xi Chen, Yan Duan, John Schulman, Filip DeTurck, Pieter Abbeel
These counts are then used to compute a reward bonus according to the classic count-based exploration theory. We find that simple hash functions can achieve surprisingly good results on many challenging tasks. Furthermore, we show that a domain-dependent learned hash code may further improve these results.
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Europe > Belgium > Flanders (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- North America > Canada (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Export Reviews, Discussions, Author Feedback and Meta-Reviews
This paper studies a number of variations on the topic of training a deep network using data generated by a Monte-Carlo Tree Search (MCTS) agent. The paper focuses on the Atari 2600 platform and is motivated by the observation that, while MCTS performs extremely well on Atari 2600 games, it is also too computationally expensive to be used in a realistic setting. The authors provide empirical results on a number of Atari 2600 games.
Adaptive Action Duration with Contextual Bandits for Deep Reinforcement Learning in Dynamic Environments
Verma, Abhishek, V, Nallarasan, Ravindran, Balaraman
Deep Reinforcement Learning (DRL) has achieved remarkable success in complex sequential decision-making tasks, such as playing Atari 2600 games and mastering board games. A critical yet underexplored aspect of DRL is the temporal scale of action execution. We propose a novel paradigm that integrates contextual bandits with DRL to adaptively select action durations, enhancing policy flexibility and computational efficiency. Our approach augments a Deep Q-Network (DQN) with a contextual bandit module that learns to choose optimal action repetition rates based on state contexts. Experiments on Atari 2600 games demonstrate significant performance improvements over static duration baselines, highlighting the efficacy of adaptive temporal abstractions in DRL. This paradigm offers a scalable solution for real-time applications like gaming and robotics, where dynamic action durations are critical.
ChatGPT gets 'wrecked' by a simple 1977 Atari chess program
Despite ever-growing interest in AI tools and assistants, it's worth remembering that they're still quite limited with numerous shortcomings. They are not as smart as they might seem on the surface. Case in point, ChatGPT is pretty useless when it comes to playing chess. As reported by Futurism, ChatGPT lost a chess game against the classic Atari 2600 gaming console. Robert Caruso, an engineer at Citrix, organised the game between the AI and a simple 1977 chess program released for the Atari 2600.