exploration strategy with PAC guarantees (R-MAX, MBIE, etc.) can still be far from optimal in terms of exploration
–Neural Information Processing Systems
This is a clear example of exploration-then-exploitation behaviour with exactly one phase change in the process.
Neural Information Processing Systems
May-23-2025, 16:02:18 GMT
- Technology: