Reinforcement Learning in Factored MDPs: Oracle-Efficient Algorithms and Tighter Regret Bounds for the Non-Episodic Setting
We study reinforcement learning in non-episodic factored Markov decision processes (FMDPs). We propose two near-optimal and oracle-efficient algorithms for FMDPs. Assuming oracle access to an FMDP planner, they enjoy a Bayesian and a frequentist regret bound respectively, both of which reduce to the near-optimal bound $\tilde{O}(DS\sqrt{AT})$ for standard non-factored MDPs. We propose a tighter connectivity measure, factored span, for FMDPs and prove a lower bound that depends on the factored span rather than the diameter $D$. To narrow the gap between the lower and upper bounds, we propose an adaptation of the REGAL.C algorithm whose regret bound depends on the factored span. Our oracle-efficient algorithms outperform previously proposed near-optimal algorithms on computer network administration simulations.
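As background for readers less familiar with non-episodic regret analysis (the notation here is standard, not taken verbatim from the paper): the regret after $T$ steps in an MDP $M$ with optimal average reward $\rho^*(M)$ is $\mathrm{Reg}_T = T\,\rho^*(M) - \sum_{t=1}^{T} r_t$. For a communicating MDP with bias (relative value) function $h^*$, the span $\mathrm{sp}(h^*) = \max_s h^*(s) - \min_s h^*(s)$ satisfies $\mathrm{sp}(h^*) \le D$, where $D$ is the diameter. This is why span-style connectivity measures, such as the factored span proposed here, can only tighten diameter-dependent bounds.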
Review for NeurIPS paper: Reinforcement Learning in Factored MDPs: Oracle-Efficient Algorithms and Tighter Regret Bounds for the Non-Episodic Setting
Weaknesses: I have some concerns and questions: - In order to come up with an efficiently-implementable algorithm, for DORL the authors construct an optimistic MDP following a very simple construction. This construction only considers error bounds and completely ignores the value function. So, while the proof guarantees optimism, I believe the resulting optimistic MDP is overly optimistic: in favoring computational efficiency this way, one may sacrifice learning efficiency to a large extent. Indeed, the idea of optimizing over a finite set of MDPs (in lieu of the bounded-parameter MDP) is nice. However, I believe the current construction, which completely ignores the value function, is too naive to work in practice.
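To make the reviewer's concern concrete, here is a minimal, hypothetical sketch (not the authors' DORL pseudocode, and written for a flat rather than factored MDP) of optimism over a finite candidate set built from error bounds alone. The `planner` callable is an assumed stand-in for the planning oracle, returning an (average reward, policy) pair.

```python
# Illustrative sketch only: build finitely many candidate optimistic MDPs
# from confidence widths, ignoring the value function during construction.
import numpy as np

def candidate_mdps(p_hat, widths):
    """For each possible next state, shift up to `widths[s, a]` probability
    mass toward it for every (s, a) pair. Only error bounds are used here;
    the value function is never consulted -- the reviewer's concern."""
    S, A, _ = p_hat.shape
    for s_next in range(S):
        p = p_hat.copy()
        for s in range(S):
            for a in range(A):
                bonus = min(widths[s, a], 1.0 - p[s, a, s_next])
                p[s, a, s_next] += bonus
                # Rescale the remaining next states so the row sums to 1.
                others = [j for j in range(S) if j != s_next]
                excess = p[s, a, others].sum()
                if excess > 0:
                    p[s, a, others] *= (1.0 - p[s, a, s_next]) / excess
        yield p

def pick_optimistic(p_hat, widths, r_hat, planner):
    """Plan in every candidate model and keep the one whose optimal
    average reward is largest; the value function enters only here."""
    return max(candidate_mdps(p_hat, widths),
               key=lambda p: planner(p, r_hat)[0])
```

The point of the sketch is that only $S$ candidates are ever planned in, which buys oracle efficiency, but each candidate inflates transitions toward a fixed next state regardless of how valuable that state is.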
Meta-review for NeurIPS paper: Reinforcement Learning in Factored MDPs: Oracle-Efficient Algorithms and Tighter Regret Bounds for the Non-Episodic Setting
After discussing with the reviewers, we have decided to propose acceptance for the paper. Nonetheless, I would like to stress that the current submission has a number of critical aspects that need to be addressed by the authors to make the paper more solid. I strongly encourage the authors to read the reviewers' comments and focus on improving along the following directions:
- Clarity: Reviewers all agree that the writing could be improved to make the paper more accessible to an audience not strictly familiar with the factored MDP formalism and/or the technicalities behind UCRL proofs. In particular, it is unclear how the parameter c was chosen and why it takes significantly different values for different algorithms. It would be helpful to see the performance as c changes (see the sketch after this list).
- The way optimism is obtained is probably not very tight, and it may cause over-exploration for a long time.
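A minimal sketch of the kind of sensitivity study the meta-review asks for, assuming (as is typical for UCRL-style methods, though not stated in the paper) that c rescales the confidence widths. `run_algorithm` is a hypothetical experiment harness returning final cumulative regret.

```python
# Hedged sketch: sweep the confidence-scaling parameter c and report
# mean/std of final regret. All names here are illustrative assumptions.
import numpy as np

def confidence_width(c, n_visits, t, delta=0.05):
    """UCRL2-style width for an empirical estimate after n_visits samples;
    c rescales how aggressively the algorithm explores."""
    return c * np.sqrt(2.0 * np.log(t / delta) / max(n_visits, 1))

def sweep_c(run_algorithm, env, horizon, cs=(0.01, 0.1, 1.0, 10.0), seeds=5):
    """Average final regret of each c value over a few random seeds."""
    results = {}
    for c in cs:
        regrets = [run_algorithm(env, horizon, c=c, seed=s)
                   for s in range(seeds)]
        results[c] = (np.mean(regrets), np.std(regrets))
    return results
```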