Tackling Heavy-Tailed Rewards in Reinforcement Learning with Function Approximation: Minimax Optimal and Instance-Dependent Regret Bounds
Jiayi Huang, Han Zhong, Liwei Wang
–Neural Information Processing Systems
Feb-16-2026, 14:39:29 GMT
- Country:
- Asia
- China (0.04)
- Middle East > Jordan (0.04)
- North America > United States
- California > Los Angeles County > Los Angeles (0.14)
- South America > Chile
- Asia
- Technology: