Online Learning of Optimal Bidding Strategy in Repeated Multi-Commodity Auctions

Baltaoglu, M. Sevi, Tong, Lang, Zhao, Qing

Neural Information Processing Systems 

We study the online learning problem of a bidder who participates in repeated auctions. With the goal of maximizing his T-period payoff, the bidder determines the optimal allocation of his budget among his bids for $K$ goods at each period. As a bidding strategy, we propose a polynomial-time algorithm, inspired by the dynamic programming approach to the knapsack problem. The proposed algorithm, referred to as dynamic programming on discrete set (DPDS), achieves a regret order of $O(\sqrt{T\log{T}})$. By showing that the regret is lower bounded by $\Omega(\sqrt{T})$ for any strategy, we conclude that DPDS is order optimal up to a $\sqrt{\log{T}}$ term.