 agent57


What Happened in Reinforcement Learning in 2022

#artificialintelligence

Just as we learn from our environment, with our actions determining whether we are rewarded or punished, reinforcement learning agents learn with the ultimate aim of maximising their rewards. This article presents the top 8 reinforcement learning innovations that shaped AI across several industries in 2022. Alphabet's DeepMind collaborated with the University of Venice, the University of Oxford and the Athens University of Economics and Business to build a deep neural network called 'Ithaca', which can restore missing text in ancient inscriptions. In a paper published in Nature, DeepMind stated that Ithaca was trained using natural language processing (NLP) not only to recover ancient text that has been damaged over time but also to identify the original location of the text and establish the date when it was made. With DeepMind's latest release, AlphaTensor, an AI system (based on a 3D board game), researchers shed light on a 50-year-old fundamental mathematics question: finding the fastest way to multiply two matrices.


Human-level Atari 200x faster

arXiv.org Artificial Intelligence

The task of building general agents that perform well over a wide range of tasks has been an important goal in reinforcement learning since its inception. The problem has been the subject of a large body of work, with performance frequently measured by observing scores over the wide range of environments contained in the Atari 57 benchmark. Agent57 was the first agent to surpass the human benchmark on all 57 games, but this came at the cost of poor data-efficiency, requiring nearly 80 billion frames of experience to achieve. Taking Agent57 as a starting point, we employ a diverse set of strategies to achieve a 200-fold reduction of the experience needed to outperform the human baseline. We investigate a range of instabilities and bottlenecks we encountered while reducing the data regime, and propose effective solutions to build a more robust and efficient agent. We also demonstrate competitive performance with high-performing methods such as Muesli and MuZero. The four key components of our approach are (1) an approximate trust region method which enables stable bootstrapping from the online network, (2) a normalisation scheme for the loss and priorities which improves robustness when learning a set of value functions with a wide range of scales, (3) an improved architecture employing techniques from NFNets in order to leverage deeper networks without the need for normalization layers, and (4) a policy distillation method which serves to smooth out the instantaneous greedy policy over time.


A Review for Deep Reinforcement Learning in Atari: Benchmarks, Challenges, and Solutions

arXiv.org Artificial Intelligence

The Arcade Learning Environment (ALE) is proposed as an evaluation platform for empirically assessing the generality of agents across dozens of Atari 2600 games. ALE offers various challenging problems and has drawn significant attention from the deep reinforcement learning (RL) community. From Deep Q-Networks (DQN) to Agent57, RL agents seem to achieve superhuman performance in ALE. However, is this the case? In this paper, to explore this problem, we first review the current evaluation metrics in the Atari benchmarks and then reveal that the current criteria for achieving superhuman performance are inappropriate, because they underestimate human performance relative to what is possible. To handle those problems and promote the development of RL research, we propose a novel Atari benchmark based on human world records (HWR), which puts forward higher requirements for RL agents on both final performance and learning efficiency. Furthermore, we summarize the state-of-the-art (SOTA) methods in Atari benchmarks and provide benchmark results over new evaluation metrics based on human world records. From those new benchmark results, we conclude that at least four open challenges hinder RL agents from achieving superhuman performance. Finally, we discuss some promising ways to handle those problems.
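
To make the difference between the two baselines concrete, the sketch below rescales a raw game score so that random play maps to 0 and a chosen reference maps to 1. Using an average human score as the reference gives the usual human-normalized score; using a human world record as the reference is one plausible reading of the HWR-based metric described above. The function name and all numbers are hypothetical and are not taken from the paper.

```python
def normalized_score(agent: float, random_play: float, reference: float) -> float:
    """Rescale a raw score so random play maps to 0.0 and the reference to 1.0."""
    if reference == random_play:
        raise ValueError("reference and random-play scores must differ")
    return (agent - random_play) / (reference - random_play)


# Hypothetical raw scores for a single game (illustration only).
random_play = 100.0
average_human = 1_000.0
world_record = 10_000.0
agent = 2_000.0

# Against the average human, the agent looks clearly superhuman (> 1.0).
human_normalized = normalized_score(agent, random_play, average_human)

# Against the world record, the same agent falls well short (< 1.0).
record_normalized = normalized_score(agent, random_play, world_record)

print(f"human-normalized:  {human_normalized:.3f}")
print(f"record-normalized: {record_normalized:.3f}")
```

The same raw score can therefore count as superhuman under one baseline and far from it under the other, which is the gap the review highlights.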


DeepMind's AI system finds its way around simulated cities it hasn't seen before

#artificialintelligence

DeepMind says it designed a system that can leverage prior knowledge to solve tasks, while at the same time exploring to gather new knowledge and plan using this new knowledge when faced with new tasks. In a paper accepted to the Conference on Computer Vision and Pattern Recognition (CVPR) 2020, researchers at the company describe an AI "planning module" that operates over episodic memories (memories of everyday events that can be explicitly stated), which they say outperforms the nearest baseline by two to three times with respect to planning and exploring. A grand challenge in AI is architecting a model that's able to enter unfamiliar environments and get to work immediately. For example, the paragon household robot would use general knowledge about homes to find cleaning supplies and acquire information it anticipates will be useful, like the location of clothes hampers in the rooms it passes. It could then leverage the newfound knowledge (i.e., hamper locations) to plan solutions for future tasks (e.g., doing the laundry) that solve the tasks more quickly.


Alphabet's DeepMind AI is better than you at Atari games

#artificialintelligence

According to MIT's Technology Review, Pitfall and Montezuma's Revenge require the AI to experiment more than usual in order to figure out how to get a better score. Meanwhile, Solaris and Skiing are difficult for the AI because there aren't as many indications of success: the AI doesn't know if it's making the right moves for long stretches of time. DeepMind built upon its older AI agents so that Agent57 could make better decisions regarding exploration and score exploitation, as well as optimize the trade-off between short-term and long-term performance in games like Skiing. Technology Review notes that while these results are impressive, AI still has a long way to go. These systems can only figure out one game at a time, which it says is at odds with the skills of humans: "True versatility, which comes so easily to a human infant, is still far beyond AIs' reach."


Last Week in AI

#artificialintelligence

Every week, Invector Labs publishes a newsletter that covers the most recent developments in AI research and technology. You can find this week's issue, and sign up for the newsletter, below. Games are often seen as a great benchmark for evaluating the ability of artificial intelligence (AI) algorithms to generalize knowledge. Of the different data environments that we can create, games come closest to resembling real-world environments.


DeepMind's Agent57 beats humans at 57 classic Atari games

#artificialintelligence

In a preprint paper published this week by DeepMind, Google parent company Alphabet's U.K.-based research division, a team of scientists describe Agent57, which they say is the first system that outperforms humans on all 57 Atari games in the Arcade Learning Environment data set. Assuming the claim holds water, Agent57 could lay the groundwork for more capable AI decision-making models than have been previously released. This could be a boon for enterprises looking to boost productivity through workplace automation; imagine AI that not only automatically completes mundane, repetitive tasks like data entry, but also reasons about its environment. "With Agent57, we have succeeded in building a more generally intelligent agent that has above-human performance on all tasks in the Atari57 benchmark," wrote the study's coauthors. "Agent57 was able to scale with increasing amounts of computation: the longer it trained, the higher its score got."


Agent57: Outperforming the Human Atari Benchmark

#artificialintelligence

Interfacing memory with behaviour is crucial for building systems that self-learn. In reinforcement learning, an agent can be an on-policy learner, which can only learn the value of its direct actions, or an off-policy learner, which can learn about optimal actions even when not performing those actions – e.g., it might be taking random actions, but can still learn what the best possible action would be. Off-policy learning is therefore a desirable property for agents, helping them learn the best course of action to take while thoroughly exploring their environment. Combining off-policy learning with memory is challenging because you need to know what you might remember when executing a different behaviour. For example, what you might choose to remember when looking for an apple (e.g., where the apple is located), is different to what you might choose to remember if looking for an orange. But if you were looking for an orange, you could still learn how to find the apple if you came across the apple by chance, in case you need to find it in the future.
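
The on-policy versus off-policy distinction described above can be illustrated with two textbook tabular updates: SARSA (on-policy) bootstraps from the action the agent actually takes next, while Q-learning (off-policy) bootstraps from the best available next action regardless of what the agent does. This is a minimal sketch with hypothetical states, actions and values; it is not the method used in Agent57.

```python
from collections import defaultdict

ACTIONS = ("left", "right")  # hypothetical action set


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9):
    """On-policy: the target uses the action the agent actually takes next."""
    target = r + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (target - q[(s, a)])


def q_learning_update(q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """Off-policy: the target uses the best next action, whatever the agent does."""
    target = r + gamma * max(q[(s_next, b)] for b in ACTIONS)
    q[(s, a)] += alpha * (target - q[(s, a)])


def fresh_table():
    """Value table in which 'right' is already known to be good in state 1."""
    q = defaultdict(float)
    q[(1, "right")] = 1.0
    return q


# One transition: in state 0 the agent moves left, gets no reward, lands in
# state 1, and then (acting randomly) picks the poor action 'left'.
q_sarsa, q_qlearn = fresh_table(), fresh_table()
sarsa_update(q_sarsa, s=0, a="left", r=0.0, s_next=1, a_next="left")
q_learning_update(q_qlearn, s=0, a="left", r=0.0, s_next=1)

print(q_sarsa[(0, "left")])   # SARSA only learns from the poor action it took
print(q_qlearn[(0, "left")])  # Q-learning still credits the better action
```

Even though the behaviour in state 1 was a random, poor choice, the off-policy update still propagates the value of the better action backwards, which is the property the passage calls desirable.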


Agent57: Outperforming the Atari Human Benchmark

arXiv.org Machine Learning

Atari games have been a long-standing benchmark in the reinforcement learning (RL) community for the past decade. This benchmark was proposed to test general competency of RL algorithms. Previous work has achieved good average performance by doing outstandingly well on many games of the set, but very poorly in several of the most challenging games. We propose Agent57, the first deep RL agent that outperforms the standard human benchmark on all 57 Atari games. To achieve this result, we train a neural network which parameterizes a family of policies ranging from very exploratory to purely exploitative. We propose an adaptive mechanism to choose which policy to prioritize throughout the training process. Additionally, we utilize a novel parameterization of the architecture that allows for more consistent and stable learning.
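
The abstract does not say how the adaptive mechanism chooses among the family of policies. One simple way to picture such a mechanism is a multi-armed bandit that treats each policy as an arm and favours those yielding higher episode returns. The sketch below uses plain UCB1 as a stand-in; the class name, the three policies and their returns are all hypothetical, and the actual Agent57 mechanism may differ.

```python
import math


class UCBPolicySelector:
    """UCB1 bandit over a fixed set of policies (illustrative stand-in only)."""

    def __init__(self, num_policies: int, exploration: float = 1.0):
        self.counts = [0] * num_policies   # times each policy was chosen
        self.means = [0.0] * num_policies  # running mean return per policy
        self.exploration = exploration

    def select(self) -> int:
        # Try every policy once before applying the confidence bound.
        for index, count in enumerate(self.counts):
            if count == 0:
                return index
        total = sum(self.counts)
        scores = [
            mean + self.exploration * math.sqrt(math.log(total) / count)
            for mean, count in zip(self.means, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, index: int, episode_return: float) -> None:
        self.counts[index] += 1
        # Incremental mean update.
        self.means[index] += (episode_return - self.means[index]) / self.counts[index]


# Hypothetical fixed returns for three policies, from most exploratory
# (index 0) to most exploitative (index 2).
hypothetical_returns = [0.2, 0.5, 0.9]

selector = UCBPolicySelector(num_policies=len(hypothetical_returns))
for _ in range(200):
    chosen = selector.select()
    selector.update(chosen, hypothetical_returns[chosen])

print(selector.counts)  # the highest-return policy is chosen most often
```

In a real training run the returns would be noisy and would shift as the agent learns, so a mechanism of this kind would need to keep revisiting the less-used policies rather than settling permanently on one.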