Performance Bounds for Policy-Based Average Reward Reinforcement Learning Algorithms

Mar-21-2025, 16:06:32 GMT–Neural Information Processing Systems

Many policy-based reinforcement learning (RL) algorithms can be viewed as instantiations of approximate policy iteration (PI), in which both policy improvement and policy evaluation are performed approximately. In applications where the average reward objective is the meaningful performance metric, discounted reward formulations are often used with a discount factor γ close to 1, which is equivalent to making the expected horizon, 1/(1 - γ), very large.
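The link between a discount factor near 1 and the average reward can be checked numerically. The sketch below is not taken from the paper. It uses a hypothetical two-state Markov chain, assumed to be induced by some fixed policy, with made-up transition probabilities `P` and rewards `r`. It shows that the expected horizon 1/(1 - γ) grows as γ approaches 1 and that the scaled discounted value (1 - γ)V_γ approaches the chain's average reward g. The helper names `discounted_values` and `average_reward` are illustrative only.

```python
# Minimal sketch under stated assumptions: a hypothetical 2-state Markov chain
# (transition matrix P, reward vector r) induced by a fixed policy.
# The numbers are illustrative and do not come from the paper.

def discounted_values(P, r, gamma):
    """Solve the policy-evaluation equation (I - gamma * P) V = r
    for a 2-state chain using Cramer's rule."""
    a, b = 1 - gamma * P[0][0], -gamma * P[0][1]
    c, d = -gamma * P[1][0], 1 - gamma * P[1][1]
    det = a * d - b * c
    return [(r[0] * d - b * r[1]) / det, (a * r[1] - c * r[0]) / det]

def average_reward(P, r):
    """Average reward g = pi . r, where pi is the stationary distribution
    of the 2-state chain (from the balance condition pi0 * P01 = pi1 * P10)."""
    p01, p10 = P[0][1], P[1][0]
    pi0 = p10 / (p01 + p10)
    return pi0 * r[0] + (1 - pi0) * r[1]

P = [[0.9, 0.1], [0.5, 0.5]]  # hypothetical transition probabilities
r = [1.0, 0.0]                # hypothetical per-state rewards
g = average_reward(P, r)

for gamma in (0.9, 0.99, 0.999):
    V = discounted_values(P, r, gamma)
    horizon = 1 / (1 - gamma)                 # expected horizon under discounting
    scaled = [(1 - gamma) * v for v in V]     # normalized discounted values
    print(f"gamma={gamma}: horizon={horizon:.0f}, "
          f"(1-gamma)*V=[{scaled[0]:.4f}, {scaled[1]:.4f}], g={g:.4f}")
```

For this particular chain the stationary distribution is (5/6, 1/6), so g = 5/6, and the gap between (1 - γ)V_γ and g shrinks roughly in proportion to 1 - γ. The sketch uses exact evaluation. Approximate PI, as described above, would replace this step with an inexact estimate.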
