Sample-efficient AI

#artificialintelligence 

Since AlphaGo, AI researchers have recognized the promise of integrating reinforcement learning (RL) with search methods, which consider many of the next actions available to an RL agent and simulate their likely results before choosing one. This starts to mimic human deliberation more closely by explicitly introducing elements of "planning" into the RL paradigm. Yang attributes the huge performance improvements of AlphaGo, AlphaZero and MuZero to this search process.

Another important distinction in RL is between model-based systems, which construct explicit models of their environments, and model-free systems, which don't. Prior to AlphaGo, just about all leading RL work was done on model-free systems (deep Q-learning, for example; PPO, which came later, is also model-free). Model-based systems just weren't practical because learning an environment model is hard, and it adds a significant layer of complexity on top of the simpler action selection task that model-free systems could focus on exclusively.
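To make the idea of search over a model concrete, here is a minimal sketch of depth-limited lookahead planning. It is not the Monte Carlo tree search used by AlphaGo, AlphaZero or MuZero; it is a much simpler exhaustive search that illustrates the same principle of simulating candidate actions before committing to one. The environment (`toy_model`, an agent on a number line trying to reach `GOAL`), the action set and all function names are hypothetical, invented purely for illustration, and the model here is hand-written rather than learned.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical toy setup: an agent on a number line tries to reach GOAL.
GOAL = 5
ACTIONS: Sequence[int] = (-1, +1)  # step left or step right

Model = Callable[[int, int], Tuple[int, float]]


def toy_model(state: int, action: int) -> Tuple[int, float]:
    """Predict (next_state, reward) for taking `action` in `state`.

    A model-based agent would learn this function; here it is hard-coded.
    """
    next_state = state + action
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def lookahead_value(model: Model, state: int, depth: int, discount: float = 0.9) -> float:
    """Best discounted return reachable from `state` within `depth` simulated steps."""
    if depth == 0:
        return 0.0
    best = float("-inf")
    for action in ACTIONS:
        next_state, reward = model(state, action)
        value = reward + discount * lookahead_value(model, next_state, depth - 1, discount)
        best = max(best, value)
    return best


def plan(model: Model, state: int, depth: int = 3, discount: float = 0.9) -> int:
    """Simulate every action sequence up to `depth` and return the best first action."""

    def score(action: int) -> float:
        next_state, reward = model(state, action)
        return reward + discount * lookahead_value(model, next_state, depth - 1, discount)

    return max(ACTIONS, key=score)
```

For example, `plan(toy_model, 3)` simulates all three-step action sequences from position 3 and picks the step toward the goal. The cost of this exhaustive search grows exponentially with depth, which is one reason practical systems use sampling-based search guided by learned value and policy estimates instead.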
