atari
LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios
Building agents based on tree-search planning capabilities with learned models has achieved remarkable success in classic decision-making problems, such as Go and Atari.However, it has been deemed challenging or even infeasible to extend Monte Carlo Tree Search (MCTS) based algorithms to diverse real-world applications, especially when these environments involve complex action spaces and significant simulation costs, or inherent stochasticity.In this work, we introduce LightZero, the first unified benchmark for deploying MCTS/MuZero in general sequential decision scenarios.
Unsupervised State Representation Learning in Atari
State representation learning, or the ability to capture latent generative factors of an environment is crucial for building intelligent agents that can perform a wide variety of tasks. Learning such representations in an unsupervised manner without supervision from rewards is an open problem. We introduce a method that tries to learn better state representations by maximizing mutual information across spatially and temporally distinct features of a neural encoder of the observations. We also introduce a new benchmark based on Atari 2600 games where we evaluate representations based on how well they capture the ground truth state. We believe this new framework for evaluating representation learning models will be crucial for future representation learning research. Finally, we compare our technique with other state-of-the-art generative and contrastive representation learning methods.
Reward learning from human preferences and demonstrations in Atari
To solve complex real-world problems with reinforcement learning, we cannot rely on manually specified reward functions. Instead, we need humans to communicate an objective to the agent directly. In this work, we combine two approaches to this problem: learning from expert demonstrations and learning from trajectory preferences. We use both to train a deep neural network to model the reward function and use its predicted reward to train an DQN-based deep reinforcement learning agent on 9 Atari games. Our approach beats the imitation learning baseline in 7 games and achieves strictly superhuman performance on 2 games. Additionally, we investigate the fit of the reward model, present some reward hacking problems, and study the effects of noise in the human labels.
- Leisure & Entertainment > Games (1.00)
- Leisure & Entertainment > Sports (0.68)
The Full Nerd: PCIe 6.0 inbound, ChatGPT rekt by Atari, & Alienware Lego-fied
Welcome to The Full Nerd newsletter--your weekly dose of hardcore hardware talk from the enthusiasts at PCWorld. In it, we dive into the hottest topics from our YouTube show, plus interesting news from across the web. Attending the Nintendo Switch 2 launch at our local Nintendo Store felled both Adam and Will, delaying our usual Tuesday episode. But don't worry: I still have plenty of juicy news bits to share with you below. Also our Micro Center tour videos are live!
- Information Technology > Communications > Social Media (0.68)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.40)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.40)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.40)