Offline Reinforcement Learning as One Big Sequence Modeling Problem

Neural Information Processing Systems 

Reinforcement learning (RL) is typically concerned with estimating stationary policies or single-step models, leveraging the Markov property to factorize problems in time. However, we can also view RL as a generic sequence modeling problem, with the goal being to produce a sequence of actions that leads to a sequence of high rewards.
