A Reduction-based Framework for Sequential Decision Making with Delayed Feedback

Yang, Yunchang, Zhong, Han, Wu, Tianhao, Liu, Bin, Wang, Liwei, Du, Simon S.

Dec-24-2023–arXiv.org Artificial Intelligence

We study stochastic delayed feedback in general sequential decision-making problems, which include bandits, single-agent Markov decision processes (MDPs), and Markov games (MGs). We propose a novel reduction-based framework, which turns any multi-batched algorithm for sequential decision making with instantaneous feedback into a sample-efficient algorithm that can handle stochastic delays in sequential decision-making problems. By plugging different multi-batched algorithms into our framework, we provide several examples demonstrating that our framework not only matches or improves existing results for bandits, tabular MDPs, and tabular MGs, but also provides the first line of studies on delays in sequential decision making with function approximation. In summary, we provide a complete set of sharp results for single-agent and multi-agent sequential decision-making problems with delayed feedback.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

arXiv.org Artificial Intelligence

Dec-24-2023

arXiv.org PDF

Add feedback

Country:
- North America > United States > California (0.14)

Genre:
- Research Report (0.64)
- Workflow (0.46)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)