Truncated Variance Reduced Value Iteration
Neural Information Processing Systems
We provide faster randomized algorithms for computing an $\epsilon$-optimal policy in a discounted Markov decision process with $A_{\text{tot}}$ state-action pairs, bounded rewards, and discount factor $\gamma$.
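To make the problem setting concrete, here is a minimal sketch of classical (exact, non-randomized) value iteration on a toy discounted MDP. It is not the truncated variance-reduced method the paper proposes; it only illustrates what computing an $\epsilon$-optimal policy means. The function name `value_iteration`, the data layout of `P` and `R`, and the toy instance are assumptions made for illustration, and the stopping rule is the standard textbook one.

```python
def value_iteration(P, R, gamma, eps):
    """Classical value iteration (illustrative baseline, not the paper's algorithm).

    P[s][a] is a list of (next_state, probability) pairs (hypothetical layout).
    R[s][a] is the bounded reward for taking action a in state s.
    Returns (v, policy), where policy is greedy with respect to the final v.
    """
    n = len(P)

    def q_values(v):
        # One-step lookahead value of every state-action pair.
        return [
            [R[s][a] + gamma * sum(p * v[t] for t, p in P[s][a])
             for a in range(len(P[s]))]
            for s in range(n)
        ]

    v = [0.0] * n
    # Standard stopping rule: once successive iterates differ by at most
    # eps * (1 - gamma) / (2 * gamma), the greedy policy is eps-optimal.
    tol = eps * (1.0 - gamma) / (2.0 * gamma)
    while True:
        new_v = [max(row) for row in q_values(v)]
        delta = max(abs(x - y) for x, y in zip(new_v, v))
        v = new_v
        if delta <= tol:
            break

    q = q_values(v)
    policy = [max(range(len(q[s])), key=q[s].__getitem__) for s in range(n)]
    return v, policy


# Toy instance with 3 state-action pairs in total.
# State 0: action 0 stays in state 0, action 1 moves to state 1 (both reward 0).
# State 1: action 0 stays in state 1 with reward 1.
P = [[[(0, 1.0)], [(1, 1.0)]], [[(1, 1.0)]]]
R = [[0.0, 0.0], [1.0]]
v, policy = value_iteration(P, R, gamma=0.9, eps=0.01)
```

Each sweep of this baseline touches every state-action pair and every transition exactly, so its cost grows with the size of the full model; randomized, sampling-based methods such as the one described above aim to reduce that cost.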
Dec-27-2025, 08:52:45 GMT