AITopics | horizon-free offline reinforcement learning

Nearly Horizon-Free Offline Reinforcement Learning

Neural Information Processing Systems, Feb-9-2026, 15:06:35 GMT

artificial intelligence, machine learning, reinforcement learning

Neural Information Processing Systems

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.51)

Nearly Horizon-Free Offline Reinforcement Learning

Neural Information Processing Systems, Dec-24-2025, 09:37:25 GMT

We revisit offline reinforcement learning on episodic time-homogeneous Markov Decision Processes (MDPs). For a tabular MDP with $S$ states and $A$ actions, or a linear MDP with anchor points and feature dimension $d$, given $K$ episodes of collected data with minimum visiting probability $d_m$ of (anchor) state-action pairs, we obtain nearly horizon-free sample complexity bounds (nearly independent of the horizon $H$) for offline reinforcement learning when the total reward is upper bounded by 1. Specifically:
• For offline policy evaluation, we obtain an $\tilde{O}\left(\sqrt{\frac{1}{Kd_m}} \right)$ error bound for the plug-in estimator, which matches the lower bound up to logarithmic factors and has no additional $\mathrm{poly}(H, S, A, d)$ dependence in the higher-order term.
• For offline policy optimization, we obtain an $\tilde{O}\left(\sqrt{\frac{1}{Kd_m}} + \frac{\min(S, d)}{Kd_m}\right)$ sub-optimality gap for the empirical optimal policy, which approaches the lower bound up to logarithmic factors and a higher-order term, improving upon the best known result by [Cui and Yang 2020], which has additional $\mathrm{poly}(H, S, d)$ factors in the main term.
To the best of our knowledge, these are the first nearly horizon-free bounds for episodic time-homogeneous offline tabular MDPs and linear MDPs with anchor points. Central to our analysis is a simple yet effective recursion-based method to bound a "total variance" term in the offline setting, which may be of independent interest.
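The plug-in estimator for offline policy evaluation can be sketched for the tabular case as follows. This is a minimal illustration under stated assumptions, not the authors' code: episodes are lists of `(s, a, s_next)` index triples, the reward `r(s, a)` is known and deterministic, and the evaluated policy is stationary. Because the MDP is time-homogeneous, transitions from every step are pooled into a single empirical model $\hat{P}$, and the policy is then evaluated on that model by backward induction. The function names `empirical_transitions` and `plug_in_evaluate` are hypothetical.

```python
import numpy as np


def empirical_transitions(episodes, n_states, n_actions):
    """Build the empirical transition model P_hat(s' | s, a).

    Time-homogeneity means P(s' | s, a) does not depend on the step h,
    so transitions from every step of every episode are pooled.
    """
    counts = np.zeros((n_states, n_actions, n_states))
    for episode in episodes:
        for s, a, s_next in episode:
            counts[s, a, s_next] += 1
    visits = counts.sum(axis=2, keepdims=True)
    # Assumption: an unvisited (s, a) pair gets an all-zero row, i.e. it is
    # treated as terminating with no future value.
    return np.divide(counts, visits, out=np.zeros_like(counts), where=visits > 0)


def plug_in_evaluate(p_hat, reward, policy, horizon, init_dist):
    """Evaluate a stationary policy on the empirical MDP by backward induction.

    reward:    (S, A) array, assumed known and deterministic.
    policy:    (S, A) array of action probabilities.
    init_dist: (S,) initial-state distribution.
    """
    v_next = np.zeros(reward.shape[0])  # V_{H+1} = 0
    for _ in range(horizon):
        # Q_h(s, a) = r(s, a) + sum_{s'} P_hat(s' | s, a) V_{h+1}(s')
        q = reward + p_hat @ v_next
        # V_h(s) = sum_a pi(a | s) Q_h(s, a)
        v_next = (policy * q).sum(axis=1)
    return float(init_dist @ v_next)


# Toy usage: 2 states, 1 action, horizon 2.
episodes = [[(0, 0, 1), (1, 0, 1)], [(0, 0, 0), (0, 0, 1)]]
p_hat = empirical_transitions(episodes, n_states=2, n_actions=1)
reward = np.array([[0.3], [0.2]])
policy = np.ones((2, 1))
value = plug_in_evaluate(p_hat, reward, policy, horizon=2, init_dist=np.array([1.0, 0.0]))
print(value)  # approximately 0.5333, i.e. 8/15
```

Pooling across steps is what time-homogeneity buys: each episode contributes up to $H$ transitions to one shared estimate of $\hat{P}$, instead of one transition to each of $H$ step-specific models.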

electronic proceedings, horizon-free offline reinforcement learning, name change

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.52)

Nearly Horizon-Free Offline Reinforcement Learning

Neural Information Processing Systems, May-26-2025, 22:54:40 GMT

We revisit offline reinforcement learning on episodic time-homogeneous Markov Decision Processes (MDPs). For a tabular MDP with $S$ states and $A$ actions, or a linear MDP with anchor points and feature dimension $d$, given $K$ episodes of collected data with minimum visiting probability $d_m$ of (anchor) state-action pairs, we obtain nearly horizon-free sample complexity bounds (nearly independent of the horizon $H$) for offline reinforcement learning when the total reward is upper bounded by 1. Specifically:
• For offline policy evaluation, we obtain an $\tilde{O}\left(\sqrt{\frac{1}{Kd_m}} \right)$ error bound for the plug-in estimator, which matches the lower bound up to logarithmic factors and has no additional $\mathrm{poly}(H, S, A, d)$ dependence in the higher-order term.
• For offline policy optimization, we obtain an $\tilde{O}\left(\sqrt{\frac{1}{Kd_m}} + \frac{\min(S, d)}{Kd_m}\right)$ sub-optimality gap for the empirical optimal policy, which approaches the lower bound up to logarithmic factors and a higher-order term, improving upon the best known result by [Cui and Yang 2020], which has additional $\mathrm{poly}(H, S, d)$ factors in the main term.
To the best of our knowledge, these are the first nearly horizon-free bounds for episodic time-homogeneous offline tabular MDPs and linear MDPs with anchor points. Central to our analysis is a simple yet effective recursion-based method to bound a "total variance" term in the offline setting, which may be of independent interest.
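The empirical optimal policy referred to above is the policy that is optimal for the estimated model. The sketch below is a minimal tabular illustration, assuming a known deterministic reward and an empirical transition array `p_hat` of shape `(S, A, S)` (for example, one built by pooling transition counts across steps). It computes the empirical optimal policy by backward induction and, when the true transitions are available as in a simulation, measures its sub-optimality gap $V^\star_1(s_1) - V^{\hat{\pi}}_1(s_1)$. A finite-horizon optimal policy is generally non-stationary even in a time-homogeneous MDP, so the sketch returns one action per step and state. The function names and toy numbers are hypothetical.

```python
import numpy as np


def empirical_optimal_policy(p_model, reward, horizon):
    """Backward induction on a transition model p_model of shape (S, A, S).

    Called with the empirical model P_hat, this returns the empirical optimal
    policy; called with the true P, it returns the true optimal policy.
    Returns actions[h, s] (non-stationary, deterministic) and V_1(s).
    """
    n_states = reward.shape[0]
    actions = np.zeros((horizon, n_states), dtype=int)
    v_next = np.zeros(n_states)  # V_{H+1} = 0
    for h in reversed(range(horizon)):
        q = reward + p_model @ v_next
        actions[h] = q.argmax(axis=1)
        v_next = q.max(axis=1)
    return actions, v_next


def evaluate_deterministic(p_model, reward, actions):
    """Exact value V_1(s) of a non-stationary deterministic policy under p_model."""
    horizon, n_states = actions.shape
    v_next = np.zeros(n_states)
    for h in reversed(range(horizon)):
        q = reward + p_model @ v_next
        v_next = q[np.arange(n_states), actions[h]]
    return v_next


# Toy example with 2 states, 2 actions, horizon 2 (total reward at most 0.8).
reward = np.array([[0.0, 0.1], [0.4, 0.0]])
p_true = np.array([
    [[0.0, 1.0], [1.0, 0.0]],  # state 0: action 0 moves to state 1, action 1 stays
    [[0.0, 1.0], [0.0, 1.0]],  # state 1: both actions stay in state 1
])
p_hat = p_true.copy()
p_hat[0, 0] = [1.0, 0.0]  # estimation error: action 0 in state 0 looks like a self-loop

actions_hat, _ = empirical_optimal_policy(p_hat, reward, horizon=2)
_, v_star = empirical_optimal_policy(p_true, reward, horizon=2)
gap = v_star[0] - evaluate_deterministic(p_true, reward, actions_hat)[0]
print(actions_hat.tolist(), gap)  # [[1, 0], [1, 0]] and a gap of about 0.2
```

Here a single misestimated row of $\hat{P}$ makes the empirical optimal policy skip the rewarding state, which is exactly the kind of error the sub-optimality bound controls as $K d_m$ grows.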

artificial intelligence, horizon-free offline reinforcement learning, machine learning

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)

Nearly Horizon-Free Offline Reinforcement Learning

Neural Information Processing Systems, Oct-11-2024, 13:35:12 GMT

We revisit offline reinforcement learning on episodic time-homogeneous Markov Decision Processes (MDPs). For a tabular MDP with $S$ states and $A$ actions, or a linear MDP with anchor points and feature dimension $d$, given $K$ episodes of collected data with minimum visiting probability $d_m$ of (anchor) state-action pairs, we obtain nearly horizon-free sample complexity bounds (nearly independent of the horizon $H$) for offline reinforcement learning when the total reward is upper bounded by 1. Specifically:
• For offline policy evaluation, we obtain an $\tilde{O}\left(\sqrt{\frac{1}{Kd_m}} \right)$ error bound for the plug-in estimator, which matches the lower bound up to logarithmic factors and has no additional $\mathrm{poly}(H, S, A, d)$ dependence in the higher-order term.
• For offline policy optimization, we obtain an $\tilde{O}\left(\sqrt{\frac{1}{Kd_m}} + \frac{\min(S, d)}{Kd_m}\right)$ sub-optimality gap for the empirical optimal policy, which approaches the lower bound up to logarithmic factors and a higher-order term, improving upon the best known result by [Cui and Yang 2020], which has additional $\mathrm{poly}(H, S, d)$ factors in the main term.
To the best of our knowledge, these are the first nearly horizon-free bounds for episodic time-homogeneous offline tabular MDPs and linear MDPs with anchor points. Central to our analysis is a simple yet effective recursion-based method to bound a "total variance" term in the offline setting, which may be of independent interest.
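The "total variance" term has a standard motivation in the known-model case. The identity below is a textbook consequence of the law of total variance, not the paper's offline recursion, and it is stated under simplifying assumptions: a deterministic policy $\pi$ with $a_h = \pi_h(s_h)$, a deterministic reward $r(s, a)$, a fixed initial state $s_1$, and $V_{H+1}^\pi \equiv 0$. Write $R = \sum_{h=1}^{H} r(s_h, a_h) \in [0, 1]$ for the total reward and $M_h = V_{h+1}^\pi(s_{h+1}) - \sum_{s'} P(s' \mid s_h, a_h) V_{h+1}^\pi(s')$ for the one-step differences, which form a martingale difference sequence. The Bellman equation $V_h^\pi(s_h) = r(s_h, a_h) + \sum_{s'} P(s' \mid s_h, a_h) V_{h+1}^\pi(s')$ makes the return telescope, and orthogonality of the $M_h$ gives the variance decomposition:

```latex
R - V_1^\pi(s_1)
  = \sum_{h=1}^{H} \bigl( r(s_h, a_h) + V_{h+1}^\pi(s_{h+1}) - V_h^\pi(s_h) \bigr)
  = \sum_{h=1}^{H} M_h,
\qquad
\operatorname{Var}(R)
  = \sum_{h=1}^{H} \mathbb{E}\Bigl[ \operatorname{Var}_{s' \sim P(\cdot \mid s_h, a_h)} V_{h+1}^\pi(s') \Bigr]
  \le \frac{1}{4}.
```

The last inequality holds because a random variable in $[0, 1]$ has variance at most $1/4$, so the summed one-step variances stay bounded no matter how large $H$ is. Bounding the analogous quantity when values are computed on the empirical model $\hat{P}$ from offline data is the part the abstract attributes to its recursion-based method.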

anchor point, horizon-free offline reinforcement learning, tabular mdp

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)

Nearly Horizon-Free Offline Reinforcement Learning

Ren, Tongzheng; Li, Jialian; Dai, Bo; Du, Simon S.; Sanghavi, Sujay

arXiv.org Machine Learning, Mar-25-2021

We revisit offline reinforcement learning on episodic time-homogeneous tabular Markov Decision Processes with $S$ states, $A$ actions and planning horizon $H$. Given $N$ episodes of collected data with minimum cumulative reaching probability $d_m$, we obtain the first set of nearly $H$-free sample complexity bounds for evaluation and planning using the empirical MDPs:
1. For offline evaluation, we obtain an $\tilde{O}\left(\sqrt{\frac{1}{Nd_m}} \right)$ error rate, which matches the lower bound and, unlike previous works [yin2020near, yin2020asymptotically], has no additional $\mathrm{poly}(S, A)$ dependence in the higher-order term.
2. For offline policy optimization, we obtain an $\tilde{O}\left(\sqrt{\frac{1}{Nd_m}} + \frac{S}{Nd_m}\right)$ error rate, improving upon the best known result by [Cui and Yang 2020], which has additional $H$ and $S$ factors in the main term. Furthermore, this bound approaches the $\Omega\left(\sqrt{\frac{1}{Nd_m}}\right)$ lower bound up to logarithmic factors and a higher-order term.
To the best of our knowledge, these are the first nearly horizon-free bounds in offline reinforcement learning.
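A one-line calculation shows when the higher-order term in the policy optimization rate stops mattering, ignoring the constants and logarithmic factors hidden in $\tilde{O}$. Since $N d_m > 0$, multiplying both sides by $N d_m$ gives:

```latex
\frac{S}{N d_m} \le \sqrt{\frac{1}{N d_m}}
\iff S \le \sqrt{N d_m}
\iff N d_m \ge S^2 .
```

So once $N d_m \ge S^2$ the rate is $\tilde{O}\bigl(\sqrt{1/(N d_m)}\bigr)$, within logarithmic factors of the $\Omega\bigl(\sqrt{1/(N d_m)}\bigr)$ lower bound; with $S = 100$, for example, this requires $N d_m \ge 10^4$. The same calculation applied to the $\min(S, d)/(K d_m)$ term in the later abstracts gives the condition $K d_m \ge \min(S, d)^2$.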

horizon-free offline reinforcement learning

arXiv.org Machine Learning

2103.14077

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.80)
