sgb
Does Stochastic Gradient really succeed for bandits?
Recent works of Mei et al. (2023, 2024) have deepened the theoretical understanding of the *Stochastic Gradient Bandit* (SGB) policy, showing that using a constant learning rate guarantees asymptotic convergence to the optimal policy, and that sufficiently *small* learning rates can yield logarithmic regret. However, whether logarithmic regret holds beyond small learning rates remains unclear. In this work, we take a step towards characterizing the regret *regimes* of SGB as a function of its learning rate. For two--armed bandits, we identify a sharp threshold, scaling with the sub-optimality gap $\Delta$, below which SGB achieves *logarithmic* regret on all instances, and above which it can incur *polynomial* regret on some instances. This result highlights the necessity of knowing (or estimating) $\Delta$ to ensure logarithmic regret with a constant learning rate. For general $K$-armed bandits, we further show the learning rate must scale inversely with $K$ to avoid polynomial regret. We introduce novel techniques to derive regret upper bounds for SGB, laying the groundwork for future advances in the theory of gradient-based bandit algorithms.
Appendix APerformanceonreal-worldbasedinstances
We further evaluate SGBS+EAS on nine real-world based instance sets from [15]. Each instance set consists of 20 instances that have similar characteristics (i.e., they have been sampled from the same underlying distribution). To account for this new evaluation setting, we always perform 10 runs in parallel for EAS and SGBS+EAS. This improves the solution quality, while leading only to a slight increase of the requiredruntime. For SGBS+EAS we set (ฮฒ, ฮณ) = (35,5), the learning rate ฮฑ = 0.005 and ฮป = 0.05.
Simulation-guidedBeamSearch forNeuralCombinatorialOptimization
Neural approaches for combinatorial optimization (CO) equip a learning mechanism to discover powerful heuristics for solving complex real-world problems. While neural approaches capable of high-quality solutions in a single shot are emerging, state-of-the-art approaches are often unable to take full advantage of the solving time available to them. In contrast, hand-crafted heuristics perform highly effective search well and exploit the computation time given to them, but contain heuristics that are difficult to adapt to a dataset being solved.
Simulation-guided Beam Search for Neural Combinatorial Optimization
Neural approaches for combinatorial optimization (CO) equip a learning mechanism to discover powerful heuristics for solving complex real-world problems. While neural approaches capable of high-quality solutions in a single shot are emerging, state-of-the-art approaches are often unable to take full advantage of the solving time available to them. In contrast, hand-crafted heuristics perform highly effective search well and exploit the computation time given to them, but contain heuristics that are difficult to adapt to a dataset being solved. With the goal of providing a powerful search procedure to neural CO approaches, we propose simulation-guided beam search (SGBS), which examines candidate solutions within a fixed-width tree search that both a neural net-learned policy and a simulation (rollout) identify as promising. We further hybridize SGBS with efficient active search (EAS), where SGBS enhances the quality of solutions backpropagated in EAS, and EAS improves the quality of the policy used in SGBS. We evaluate our methods on well-known CO benchmarks and show that SGBS significantly improves the quality of the solutions found under reasonable runtime assumptions.
Appendix A Performance on real-world based instances
We further evaluate SGBS+EAS on nine real-world based instance sets from [15]. Each instance set consists of 20 instances that have similar characteristics (i.e., they have been sampled from the same underlying distribution). The instance sets differ significantly in terms of several structural properties, for example, the number of customers n and their position (e.g., clustered vs. random positions). A more detailed description of instance sets can be found in [15]. One major advantage of neural combinatorial optimization approaches over traditional handcrafted optimization methods is their ability to quickly learn customized heuristics for new problem settings.
Simulation-guided Beam Search for Neural Combinatorial Optimization
Neural approaches for combinatorial optimization (CO) equip a learning mechanism to discover powerful heuristics for solving complex real-world problems. While neural approaches capable of high-quality solutions in a single shot are emerging, state-of-the-art approaches are often unable to take full advantage of the solving time available to them. In contrast, hand-crafted heuristics perform highly effective search well and exploit the computation time given to them, but contain heuristics that are difficult to adapt to a dataset being solved. With the goal of providing a powerful search procedure to neural CO approaches, we propose simulation-guided beam search (SGBS), which examines candidate solutions within a fixed-width tree search that both a neural net-learned policy and a simulation (rollout) identify as promising. We further hybridize SGBS with efficient active search (EAS), where SGBS enhances the quality of solutions backpropagated in EAS, and EAS improves the quality of the policy used in SGBS.