Agent57: Outperforming the Human Atari Benchmark

#artificialintelligence 

Interfacing memory with behaviour is crucial for building systems that self-learn. In reinforcement learning, an agent can be an on-policy learner, which can only learn the value of its direct actions, or an off-policy learner, which can learn about optimal actions even when not performing those actions – e.g., it might be taking random actions, but can still learn what the best possible action would be. Off-policy learning is therefore a desirable property for agents, helping them learn the best course of action to take while thoroughly exploring their environment. Combining off-policy learning with memory is challenging because you need to know what you might remember when executing a different behaviour. For example, what you might choose to remember when looking for an apple (e.g., where the apple is located), is different to what you might choose to remember if looking for an orange. But if you were looking for an orange, you could still learn how to find the apple if you came across the apple by chance, in case you need to find it in the future.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found