World Models

#artificialintelligence 

This weakness could be the reason that many previous works that learn dynamics models of RL environments but don't actually use those models to fully replace the actual environments . Like in the M model proposed in, the dynamics model is a deterministic differentiable model, making the model easily exploitable by the agent if it is not perfect. Using Bayesian models, as in PILCO, helps to address this issue with the uncertainty estimates to some extent, however, they do not fully solve the problem. Recent work combines the model-based approach with traditional model-free RL training by first initializing the policy network with the learned policy, but must subsequently rely on a model-free method to fine-tune this policy in the actual environment.In Learning to Think, it is acceptable that the RNN M isn't always a reliable predictor. A (potentially evolution-based) RNN C can in principle learn to ignore a flawed M, or exploit certain useful parts of M for arbitrary computational purposes including hierarchical planning etc.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found