Multi-armed Bandit Requiring Monotone Arm Sequences
Neural Information Processing Systems
Popular algorithms such as UCB [4,5] and Thompson sampling [3,34] typically explore the arms sufficiently and, as more evidence is gathered, converge to the optimal arm.
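
As a rough illustration of the explore-then-converge behavior described above, here is a minimal sketch of the textbook UCB1 rule on simulated Bernoulli arms. It is not the method of this paper and it ignores the monotone arm sequence requirement; the function name `ucb1`, the arm means, the horizon, and the seed are all assumptions chosen for the example.

```python
import math
import random


def ucb1(means, rounds, seed=0):
    """Run textbook UCB1 on Bernoulli arms; return pull counts per arm.

    `means` are hypothetical success probabilities used only to simulate
    rewards; the learner never reads them directly.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k   # number of pulls per arm
    sums = [0.0] * k   # total reward per arm

    for t in range(1, rounds + 1):
        if t <= k:
            # Pull each arm once so every empirical mean is defined.
            arm = t - 1
        else:
            # Pick the arm with the largest upper confidence bound:
            # empirical mean plus an exploration bonus that shrinks
            # as the arm is pulled more often.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward

    return counts


counts = ucb1([0.2, 0.5, 0.8], rounds=5000)
print(counts)
```

In this sketch every arm keeps receiving occasional pulls because the bonus term grows slowly with the round index, while most pulls concentrate on the arm with the highest simulated mean. Note that the sequence of pulled arms jumps freely between arms, which is exactly what a monotone arm sequence constraint would rule out.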
Feb-9-2026, 16:46:45 GMT