tie-breaking rule
Tight Generalization Bounds for Noiseless Inverse Optimization
Fatemi, Pouria, Maskan, Hoomaan, Sra, Suvrit, Esfahani, Peyman Mohajerin
Inverse optimization (IO) seeks to infer the parameters of a decision-maker's objective from observed context--action data. We study noiseless IO, where demonstrations are generated by a ground-truth objective. We provide a high-probability ${O}(\frac{d}{T})$ generalization bound for the induced action set, where $d$ is the number of unknown parameters and $T$ is the size of the training dataset. We strengthen these guarantees under additional conditions that ensure uniqueness of the chosen action, bringing our IO guarantees in line with best-arm identification results in the bandit literature. We further show that the ${O}(\frac{d}{T})$ rate is tight over all consistent estimators considered here, and extend the result to both instantaneous and cumulative regret. Notably, the resulting regret lower bound matches the corresponding upper bounds in the adversarial setting, indicating that the stochastic IO setting is effectively adversarial for the class of estimators studied here. Finally, we propose a parameter-free algorithm with lower per-iteration complexity than generic solvers. Experiments validate the predicted rates and illustrate the tightness of our bounds.
An Adversarial Analysis of Thompson Sampling for Full-information Online Learning: from Finite to Infinite Action Spaces
Terenin, Alexander, Negrea, Jeffrey
We develop an analysis of Thompson sampling for online learning under full feedback - also known as prediction with expert advice - where the learner's prior is defined over the space of an adversary's future actions, rather than the space of experts. We show regret decomposes into regret the learner expected a priori, plus a prior-robustness-type term we call excess regret. In the classical finite-expert setting, this recovers optimal rates. As an initial step towards practical online learning in settings with a potentially-uncountably-infinite number of experts, we show that Thompson sampling with a certain Gaussian process prior widely-used in the Bayesian optimization literature has a $\mathcal{O}(\beta\sqrt{T\log(1+\lambda)})$ rate against a $\beta$-bounded $\lambda$-Lipschitz adversary.
Dueling Over Dessert, Mastering the Art of Repeated Cake Cutting
Brรขnzei, Simina, Hajiaghayi, MohammadTaghi, Phillips, Reed, Shin, Suho, Wang, Kun
We consider the setting of repeated fair division between two players, denoted Alice and Bob, with private valuations over a cake. In each round, a new cake arrives, which is identical to the ones in previous rounds. Alice cuts the cake at a point of her choice, while Bob chooses the left piece or the right piece, leaving the remainder for Alice. We consider two versions: sequential, where Bob observes Alice's cut point before choosing left/right, and simultaneous, where he only observes her cut point after making his choice. The simultaneous version was first considered by Aumann and Maschler (1995). We observe that if Bob is almost myopic and chooses his favorite piece too often, then he can be systematically exploited by Alice through a strategy akin to a binary search. This strategy allows Alice to approximate Bob's preferences with increasing precision, thereby securing a disproportionate share of the resource over time. We analyze the limits of how much a player can exploit the other one and show that fair utility profiles are in fact achievable. Specifically, the players can enforce the equitable utility profile of $(1/2, 1/2)$ in the limit on every trajectory of play, by keeping the other player's utility to approximately $1/2$ on average while guaranteeing they themselves get at least approximately $1/2$ on average. We show this theorem using a connection with Blackwell approachability. Finally, we analyze a natural dynamic known as fictitious play, where players best respond to the empirical distribution of the other player. We show that fictitious play converges to the equitable utility profile of $(1/2, 1/2)$ at a rate of $O(1/\sqrt{T})$.
Maximisation of Admissible Multi-Objective Heuristics
Haslum, Patrik | Wang, Ryan Xiao (Australian National University)
In multi-objective (MO) heuristic search, solution costs, as well as heuristic values, are sets of multi-dimensional cost vectors, representing possible non-dominated trade-offs between objectives. The maximum of two or more such vector sets, which is an important operation in creating informative admissible MO heuristics, can be defined in several ways: Geiรer et al. recently proposed two MO maximum operators, the component-wise maximum (comax) and the anti-dominance maximum (admax), which represent different trade-offs between informativeness and computational cost. We show that the anti-dominance maximum is not admissibility-preserving, and propose an alternative, the "select one" maximum (somax). We also show that the comax operator is the greatest admissibility-preserving MO maximum, and briefly investigate its efficient implementation. The conclusion of our experimental results is that somax achieves a trade-off similar to that intended with admax - cheaper to compute but less informed - also when compared to an improved comax implementation.
A Nearest Neighbor Characterization of Lebesgue Points in Metric Measure Spaces
Cesari, Tommaso, Colomboni, Roberto
The property of almost every point being a Lebesgue point has proven to be crucial for the consistency of several classification algorithms based on nearest neighbors. We characterize Lebesgue points in terms of a 1-Nearest Neighbor regression algorithm for pointwise estimation, fleshing out the role played by tie-breaking rules in the corresponding convergence problem. We then give an application of our results, proving the convergence of the risk of a large class of 1-Nearest Neighbor classification algorithms in general metric spaces where almost every point is a Lebesgue point.
Online Competitive Influence Maximization
Zuo, Jinhang, Liu, Xutong, Joe-Wong, Carlee, Lui, John C. S., Chen, Wei
Online influence maximization has attracted much attention as a way to maximize influence spread through a social network while learning the values of unknown network parameters. Most previous works focus on single-item diffusion. In this paper, we introduce a new Online Competitive Influence Maximization (OCIM) problem, where two competing items (e.g., products, news stories) propagate in the same network and influence probabilities on edges are unknown. We adapt the combinatorial multi-armed bandit (CMAB) framework for the OCIM problem, but unlike the non-competitive setting, the important monotonicity property (influence spread increases when influence probabilities on edges increase) no longer holds due to the competitive nature of propagation, which brings a significant new challenge to the problem. We prove that the Triggering Probability Modulated (TPM) condition for CMAB still holds, and then utilize the property of competitive diffusion to introduce a new offline oracle, and discuss how to implement this new oracle in various cases. We propose an OCIM-OIFU algorithm with such an oracle that achieves logarithmic regret. We also design an OCIM-ETC algorithm that has worse regret bound but requires less feedback and easier offline computation. Our experimental evaluations demonstrate the effectiveness of our algorithms.