Offline Reinforcement Learning as One Big Sequence Modeling Problem
–Neural Information Processing Systems
Reinforcement learning (RL) is typically concerned with estimating stationary policies or single-step models, leveraging the Markov property to factorize problems in time. However, we can also view RL as a generic sequence modeling problem, with the goal being to produce a sequence of actions that leads to a sequence of high rewards.
Feb-7-2026, 10:15:10 GMT
- Technology: