Improved Analysis for Bandit Learning in Matching Markets
Neural Information Processing Systems
A rich line of work studies the bandit learning problem in two-sided matching markets, where one side of the market participants (players) is uncertain about its preferences and hopes to find a stable matching through iterative matchings with the other side (arms). The state-of-the-art analysis shows that the player-optimal stable regret is of order O(K\log T/\Delta^2), where K is the number of arms, T is the horizon, and \Delta is the players' minimum preference gap. However, this result may be far from the lower bound \Omega(\max\{N\log T/\Delta^2, K\log T/\Delta\}), since the number K of arms (workers, publisher slots) may be much larger than the number N of players (employers in labor markets, advertisers in online advertising, respectively). In this paper, we propose a new algorithm and show that the regret can be upper bounded by O(N^2\log T/\Delta^2 + K\log T/\Delta). This result removes the dependence on K in the main-order term and improves the state-of-the-art guarantee in common cases where N is much smaller than K. Such an advantage is also verified in experiments.
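To see the claimed improvement concretely, the two bounds can be set side by side (a back-of-envelope comparison based on the stated orders; the specific numbers below are illustrative, not from the paper):

```latex
\[
\underbrace{O\!\left(\frac{K\log T}{\Delta^2}\right)}_{\text{prior bound}}
\qquad\text{vs.}\qquad
\underbrace{O\!\left(\frac{N^2\log T}{\Delta^2} + \frac{K\log T}{\Delta}\right)}_{\text{new bound}}
\]
```

For small \Delta the 1/\Delta^2 term dominates, so the new bound is tighter whenever N^2 < K: for instance, with N = 10 players and K = 1000 arms, the dominant term shrinks from 1000\log T/\Delta^2 to 100\log T/\Delta^2.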