Multi-Reward Best Policy Identification
Neural Information Processing Systems
This bound guides the design of an optimal exploration policy that attains minimal sample complexity. However, the bound involves solving a hard non-convex optimization problem.
Feb-17-2026, 21:23:02 GMT
- Country:
  - Europe
    - Netherlands > North Holland
      - Amsterdam (0.04)
    - Sweden > Stockholm
      - Stockholm (0.04)
  - North America > United States
    - Texas > Travis County > Austin (0.04)
- Genre:
  - Research Report
    - Experimental Study (1.00)
    - New Finding (0.67)
- Industry:
  - Information Technology (0.92)
  - Leisure & Entertainment > Games (0.67)
- Technology: