A More Discussion
Neural Information Processing Systems
Why are One-step and IQL imitation-based methods? The core difference between RL-based and imitation-based methods is that RL-based methods learn a value function of the learned policy $\pi$, while imitation-based methods do not. Learning the value function of $\pi$ requires off-policy evaluation of $\pi$ (i.e., learning $Q^\pi$ or $V^\pi$). This is prone to distribution shift because $\pi$ may choose actions that are rare or absent in the dataset. Moreover, policy evaluation and policy improvement are coupled, so errors in one affect the other. Imitation-based methods do not learn $Q^\pi$ or $V^\pi$, although some of them do learn a value function (for example, one estimated only from actions in the dataset rather than from actions chosen by $\pi$).
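To make the distinction concrete, the toy sketch below contrasts two tabular evaluation updates on a fixed dataset of logged transitions $(s, a, r, s', a')$. The in-sample update bootstraps from $r + \gamma Q(s', a')$ and uses only the logged next action. This is a SARSA-style evaluation of the behavior policy that never queries actions outside the data. Evaluating a target policy $\pi$ instead bootstraps from $r + \gamma Q(s', \pi(s'))$, which can query actions the dataset never contains. Everything here is hypothetical and chosen only for illustration, including the three-state dataset, the deterministic policy `pi`, and the helpers `in_sample_evaluation`, `off_policy_evaluation`, and `out_of_dataset_queries`. Because the setting is tabular, an unlogged entry simply keeps its initial value. That frozen value stands in, loosely, for the extrapolation error a function approximator would produce.

```python
import numpy as np

# Hypothetical toy MDP (illustration only): 3 states, 3 actions.
N_STATES, N_ACTIONS, GAMMA = 3, 3, 0.9

# Hypothetical logged transitions (s, a, r, s_next, a_next); a_next is the
# action the behavior policy actually took in s_next.
DATASET = [
    (0, 0, 1.0, 1, 0),
    (1, 0, 0.0, 2, 1),
    (2, 1, 1.0, 0, 0),
]


def in_sample_evaluation(dataset, q_init=0.0, sweeps=200):
    """SARSA-style evaluation of the behavior policy.

    Every bootstrap target uses a (s_next, a_next) pair that is in the data.
    """
    q = np.full((N_STATES, N_ACTIONS), q_init)
    for _ in range(sweeps):
        for s, a, r, s_next, a_next in dataset:
            q[s, a] = r + GAMMA * q[s_next, a_next]
    return q


def off_policy_evaluation(dataset, pi, q_init=0.0, sweeps=200):
    """Evaluation of a deterministic target policy pi (pi[s] is its action).

    The bootstrap target Q(s_next, pi[s_next]) may use an action that never
    appears in the data, so no logged transition ever updates that entry.
    """
    q = np.full((N_STATES, N_ACTIONS), q_init)
    for _ in range(sweeps):
        for s, a, r, s_next, _ in dataset:
            q[s, a] = r + GAMMA * q[s_next, pi[s_next]]
    return q


def out_of_dataset_queries(dataset, pi):
    """(state, action) pairs that pi's targets query but the data never logs."""
    seen = {(s, a) for s, a, _, _, _ in dataset}
    queried = {(s_next, pi[s_next]) for _, _, _, s_next, _ in dataset}
    return queried - seen


if __name__ == "__main__":
    pi = [2, 2, 2]  # hypothetical target policy that always picks unlogged action 2
    print(sorted(out_of_dataset_queries(DATASET, pi)))  # [(0, 2), (1, 2), (2, 2)]

    # In-sample estimates at logged pairs converge to the same values
    # regardless of the (arbitrary) initialization...
    q_from_0 = in_sample_evaluation(DATASET, 0.0)
    q_from_10 = in_sample_evaluation(DATASET, 10.0)
    print(all(np.isclose(q_from_0[s, a], q_from_10[s, a]) for s, a, *_ in DATASET))  # True

    # ...but pi's estimates inherit the initial value of the unlogged pairs.
    print(off_policy_evaluation(DATASET, pi, 0.0)[0, 0],
          off_policy_evaluation(DATASET, pi, 10.0)[0, 0])
```

In this toy run, the estimate of $Q^\pi(0, 0)$ changes from 1.0 to 10.0 when only the initialization changes, because its target depends on a pair that no transition ever updates. The in-sample estimates are unaffected by initialization. This is a deliberately simplified picture of why learning $Q^\pi$ from offline data is exposed to distribution shift, while an evaluation restricted to logged actions is not.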
Apr-24-2026, 21:47:06 GMT