exploration strategy with PAC guarantees (R-MAX, MBIE, etc.) can still be far from optimal in terms of exploration

Neural Information Processing Systems 

This is a clear example of exploration-then-exploitation behaviour with exactly one phase change in the process.