Online Learning in MDPs with Linear Function Approximation and Bandit Feedback

Aug-14-2025, 14:36:00 GMT–Neural Information Processing Systems

Consequently, the state of the environment changes according to the transition function of the underlying MDP, as a function of the previous state and the action taken by the learner.
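
As a rough illustration of the interaction described above, the following sketch simulates a small tabular MDP in which the next state is drawn from a transition function of the previous state and the learner's action. The names `step`, `rollout`, `P`, and `always_switch`, as well as the two-state transition table, are hypothetical and chosen only for this example; they are not taken from the paper.

```python
import numpy as np

def step(P, state, action, rng):
    """Sample the next state from the transition distribution P[state, action, :]."""
    n_states = P.shape[2]
    return int(rng.choice(n_states, p=P[state, action]))

def rollout(P, policy, start_state, horizon, seed=0):
    """Run the learner's policy for `horizon` steps and return the visited states."""
    rng = np.random.default_rng(seed)
    state = start_state
    trajectory = [state]
    for _ in range(horizon):
        action = policy(state)                # learner chooses an action
        state = step(P, state, action, rng)   # environment moves to the next state
        trajectory.append(state)
    return trajectory

# Hypothetical 2-state, 2-action MDP with deterministic transitions:
# action 0 keeps the current state, action 1 switches to the other state.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0
P[0, 1, 1] = 1.0
P[1, 0, 1] = 1.0
P[1, 1, 0] = 1.0

def always_switch(state):
    """Illustrative policy that always plays action 1."""
    return 1

trajectory = rollout(P, always_switch, start_state=0, horizon=4)
print(trajectory)
```

Because the assumed transitions are deterministic, the trajectory here depends only on the policy; with a stochastic `P`, the same code would sample the next state at random from `P[state, action]`.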

algorithm, mdp, optimization problem, (13 more...)

Country:
- Europe > Spain
  - Catalonia > Barcelona Province > Barcelona (0.04)
- Asia > Middle East
  - Jordan (0.04)

Industry:
- Education > Educational Setting > Online (0.42)

Technology:
- Information Technology
  - Enterprise Applications > Human Resources
    - Learning Management (0.42)
  - Artificial Intelligence
    - Machine Learning > Reinforcement Learning (0.47)
    - Representation & Reasoning
      - Optimization (0.69)
      - Uncertainty > Fuzzy Logic (0.41)
