Truncated Variance Reduced Value Iteration
Neural Information Processing Systems
We provide faster randomized algorithms for computing an $\epsilon$-optimal policy in a discounted Markov decision process with $A_{\text{tot}}$ state-action pairs, bounded rewards, and discount factor $\gamma$.
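To make the problem setting concrete, here is a minimal sketch of classical (deterministic) value iteration on a toy discounted MDP. It is not the truncated variance-reduced algorithm of the paper, only the textbook baseline for the same problem. The function names, the `mdp` data layout, and the toy instance are assumptions made for illustration. The stopping rule is the standard one: once successive iterates differ by less than $\epsilon(1-\gamma)/(2\gamma)$ in the sup norm, the greedy policy is $\epsilon$-optimal.

```python
# Textbook value iteration (illustrative baseline, NOT the paper's algorithm).
# Assumed layout: mdp[s] is a list of actions available in state s, and each
# action is a pair (reward, {next_state: probability}).

def q_values(mdp, v, gamma):
    """One-step lookahead values for every state-action pair."""
    return [
        [r + gamma * sum(p * v[t] for t, p in trans.items())
         for r, trans in actions]
        for actions in mdp
    ]

def value_iteration(mdp, gamma, eps):
    """Return (values, greedy policy); assumes 0 < gamma < 1 and eps > 0."""
    v = [0.0] * len(mdp)
    tol = eps * (1.0 - gamma) / (2.0 * gamma)
    while True:
        new_v = [max(row) for row in q_values(mdp, v, gamma)]
        change = max(abs(a - b) for a, b in zip(new_v, v))
        v = new_v
        if change < tol:
            break
    # The greedy policy is taken with respect to the final iterate.
    q = q_values(mdp, v, gamma)
    policy = [max(range(len(row)), key=row.__getitem__) for row in q]
    return v, policy

# Hypothetical toy instance: 2 states and 3 state-action pairs in total.
toy_mdp = [
    [(0.0, {0: 1.0}), (0.5, {1: 1.0})],  # state 0: stay, or move to state 1
    [(1.0, {1: 1.0})],                   # state 1: stay and collect reward 1
]
a_tot = sum(len(actions) for actions in toy_mdp)
values, policy = value_iteration(toy_mdp, gamma=0.9, eps=0.01)
```

Each sweep of this baseline touches every one of the `a_tot` state-action pairs and uses exact transition probabilities. Randomized methods for this problem instead work from sampled transitions.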
Mar-22-2026, 15:30:32 GMT