Potfer, Marius
Instance-dependent Stochastic Lipschitz bandit
Potfer, Marius, Perchet, Vianney
We study the Lipschitz bandit problem, where a learner sequentially maximizes an unknown Lipschitz function $f$ over a domain $\mathcal{X} \subset [0,1]^d$ using noisy pointwise evaluations. Existing regret bounds are either worst-case, scaling as $\tildeΘ \left ( T^{d+1/d+2}\right )$, or adaptive via the zooming dimension $d_z$, yielding $\tildeΘ \left ( T^{d_z+1/d_z+2}\right )$. However, such zooming-based guarantees are only partially instance-dependent, as they depend solely on the asymptotic growth of near-optimal level sets and fail to capture finer structural properties of $f$. We provide an analysis and an algorithm that characterizes the regret through integrals of the suboptimality gap of $f$ over its level sets. This yields regret bounds that adapt to the local growth of level sets, rather than only their asymptotic behavior. As a corollary, when the set of maximizers has dimension $d^\star>0$, we obtain improved adaptive rates of order $\tilde{\mathcal{O}} \left ( T^{d_z+1 / \max(d_z,d^\star)+2}\right )$ strictly improving over classical zooming bounds in this regime. Finally, we extend our analysis to the full-information setting (Lipschitz experts) and show how some of the regularity assumptions can be relaxed.
Comparing Uniform Price and Discriminatory Multi-Unit Auctions through Regret Minimization
Potfer, Marius, Perchet, Vianney
Repeated multi-unit auctions, where a seller allocates multiple identical items over many rounds, are common mechanisms in electricity markets and treasury auctions. We compare the two predominant formats: uniform-price and discriminatory auctions, focusing on the perspective of a single bidder learning to bid against stochastic adversaries. We characterize the learning difficulty in each format, showing that the regret scales similarly for both auction formats under both full-information and bandit feedback, as $\tildeΘ ( \sqrt{T} )$ and $\tildeΘ ( T^{2/3} )$, respectively. However, analysis beyond worst-case regret reveals structural differences: uniform-price auctions may admit faster learning rates, with regret scaling as $\tildeΘ ( \sqrt{T} )$ in settings where discriminatory auctions remain at $\tildeΘ ( T^{2/3} )$. Finally, we provide a specific analysis for auctions in which the other participants are symmetric and have unit-demand, and show that in these instances, a similar regret rate separation appears.
Improved learning rates in multi-unit uniform price auctions
Potfer, Marius, Baudry, Dorian, Richard, Hugo, Perchet, Vianney, Wan, Cheng
Motivated by the strategic participation of electricity producers in electricity day-ahead market, we study the problem of online learning in repeated multi-unit uniform price auctions focusing on the adversarial opposing bid setting. The main contribution of this paper is the introduction of a new modeling of the bid space. Indeed, we prove that a learning algorithm leveraging the structure of this problem achieves a regret of $\tilde{O}(K^{4/3}T^{2/3})$ under bandit feedback, improving over the bound of $\tilde{O}(K^{7/4}T^{3/4})$ previously obtained in the literature. This improved regret rate is tight up to logarithmic terms. Inspired by electricity reserve markets, we further introduce a different feedback model under which all winning bids are revealed. This feedback interpolates between the full-information and bandit scenarios depending on the auctions' results. We prove that, under this feedback, the algorithm that we propose achieves regret $\tilde{O}(K^{5/2}\sqrt{T})$.