argmax
Are Stochastic Multi-objective Bandits Harder than Single-objective Bandits?
Multi-objective bandits have attracted increasing attention because of their broad applicability and mathematical elegance, where the reward of each arm is a multi-dimensional vector rather than a scalar. This naturally introduces Pareto order relations and Pareto regret. A long-standing question in this area is whether performance is fundamentally harder to optimize because of this added complexity. A recent surprising result shows that, in the adversarial setting, Pareto regret is no larger than classical regret; however, in the stochastic setting, where the regret notion is different, the picture remains unclear. In fact, existing work suggests that Pareto regret in the stochastic case increases with the dimensionality. This controversial yet subtle phenomenon motivates our central question: \emph{are multi-objective bandits actually harder than single-objective ones?} We answer this question in full by showing that, in the stochastic setting, Pareto regret is in fact governed by the maximum sub-optimality gap \(g^\dagger\), and hence by the minimum marginal regret of order \(Ω(\frac{K\log T}{g^\dagger})\). We further develop a new algorithm that achieves Pareto regret of order \(O(\frac{K\log T}{g^\dagger})\), and is therefore optimal. The algorithm leverages a nested two-layer uncertainty quantification over both arms and objectives through upper and lower confidence bound estimators. It combines a top-two racing strategy for arm selection with an uncertainty-greedy rule for dimension selection. Together, these components balance exploration and exploitation across the two layers. We also conduct comprehensive numerical experiments to validate the proposed algorithm, showing the desired regret guarantee and significant gains over benchmark methods.
e464656edca5e58850f8cec98cbb979b-Supplemental.pdf
To be consistent with accuracy definition, we denote the correctness ofstj for instance t as sim(stj,rt) = ( 2 distance(stj,rt))/ 2 where sim(stj,rt) is in the range [0,1] and distance(stj,rt) is in range [0, 2], 2 is the largest Euclidean distance in the probability simplex. Given a test dataset I, the correctness of a learner SLj on I can be denoted as 2 corrSLj = 1n Pn t=1sim(stj,rt). In this section, we define multiple metrics for consistency, accuracy, and correct-consistency in detail. Figure 1 shows the metrics computation in our experiments. We have created a git repository for this work and will be posted upon the acceptance and publicationofthiswork.
Enhancing Knowledge Transfer for Task Incremental Learning with Data-free Subnetwork Qiang Gao
DSN primarily seeks to transfer knowledge to the new coming task from the learned tasks by selecting the affiliated weights of a small set of neurons to be activated, including the reused neurons from prior tasks via neuron-wise masks. And it also transfers possibly valuable knowledge to the earlier tasks via data-free replay.
- Asia > China > Sichuan Province > Chengdu (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- Leisure & Entertainment (0.47)
- Information Technology > Security & Privacy (0.46)
- Europe > Sweden > Stockholm > Stockholm (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (2 more...)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > Finland > Uusimaa > Helsinki (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Sweden > Stockholm > Stockholm (0.04)
- Europe > Portugal > Porto > Porto (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Data Science > Data Mining (0.92)
- North America > United States > New Jersey > Hudson County > Hoboken (0.05)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- (2 more...)
- Europe > Italy > Lombardy > Milan (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Singapore > Central Region > Singapore (0.04)
LatentTemplateInductionwithGumbel-CRFs Appendix
Papandreou and Yuille[4] proposed the Perturb-and-MAP Random Field, an efficient sampling method forgeneral MarkovRandom Field. We compare the detailed structure of gradients of each estimator. All gradients are formed as a summation over the steps. The Gumbel-CRF and PM-MRF estimator can be decomposed with a pathwise term, where we take gradientoff w.r.t. Since the official test set is not publically available, we use the same training/ validation/ test split as Fu et al.[1].