Online Learning in MDPs with Linear Function Approximation and Bandit Feedback
Neural Information Processing Systems
The state of the environment changes according to the transition function of the underlying MDP, which depends on the previous state and the action taken by the learner.
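The transition dynamics described here can be illustrated with a minimal sketch. It is not taken from the paper: the two-state, two-action table `P`, the helper names `step` and `rollout`, and all probabilities are hypothetical and chosen only for illustration. The sketch shows the next state being sampled from a distribution determined by the previous state and the learner's action.

```python
import random

# Hypothetical 2-state, 2-action MDP; all numbers are illustrative assumptions.
# P[state][action] is a probability distribution over next states.
P = {
    0: {0: [0.9, 0.1], 1: [0.2, 0.8]},
    1: {0: [0.5, 0.5], 1: [0.0, 1.0]},
}


def step(state, action, rng):
    """Sample the next state from P(. | state, action)."""
    probs = P[state][action]
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def rollout(policy, start_state, horizon, seed=0):
    """Run one episode: the learner picks an action, then the state transitions."""
    rng = random.Random(seed)
    state = start_state
    trajectory = [state]
    for _ in range(horizon):
        action = policy(state)             # learner's choice in the current state
        state = step(state, action, rng)   # environment moves to the next state
        trajectory.append(state)
    return trajectory
```

For example, `rollout(lambda s: 1, start_state=1, horizon=5)` always plays action 1; under the assumed table, state 1 with action 1 is absorbing, so the trajectory stays in state 1.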
Aug-14-2025, 14:36:00 GMT
- Country:
  - Europe > Spain
    - Catalonia > Barcelona Province > Barcelona (0.04)
  - Asia > Middle East
    - Jordan (0.04)
- Industry:
  - Education > Educational Setting > Online (0.42)