AITopics | gp-mw

We consider the problem of learning to play a repeated multi-agent game with an unknown reward function. Single-player online learning algorithms attain strong regret bounds when provided with full-information feedback, which unfortunately is unavailable in many real-world scenarios. Bandit feedback alone, i.e., observing outcomes only for the selected action, yields substantially worse performance. In this paper, we consider a natural model where, besides a noisy measurement of the obtained reward, the player can also observe the opponents' actions. This feedback model, together with a regularity assumption on the reward function, allows us to exploit the correlations among different game outcomes by means of Gaussian processes (GPs). We propose a novel confidence-bound-based bandit algorithm, GP-MW, which utilizes the GP model for the reward function and runs a multiplicative weights (MW) method. We obtain novel kernel-dependent regret bounds that are comparable to the known bounds in the full-information setting, while substantially improving upon the existing bandit results. We experimentally demonstrate the effectiveness of GP-MW in random matrix games, as well as real-world problems of traffic routing and movie recommendation. In our experiments, GP-MW consistently outperforms several baselines, while its performance is often comparable to methods that have access to full-information feedback.
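The combination described in the abstract (a GP model of the reward, a confidence bound, and a multiplicative weights update) can be illustrated with a small sketch. This is not the authors' implementation: the RBF kernel, the fixed confidence width `beta`, the step size `eta`, the uniformly random stand-in opponent, and the function names `rbf_kernel`, `gp_posterior`, and `mw_with_gp_ucb` are all assumptions made for illustration, and rewards are assumed to lie in [0, 1].

```python
import numpy as np


def rbf_kernel(X, Y, lengthscale=1.0):
    """Squared-exponential kernel between rows of X and rows of Y."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dists / lengthscale**2)


def gp_posterior(X_obs, y_obs, X_query, noise=0.1):
    """GP posterior mean and std at X_query (zero prior mean, unit prior variance)."""
    if len(X_obs) == 0:
        return np.zeros(len(X_query)), np.ones(len(X_query))
    K = rbf_kernel(X_obs, X_obs) + noise**2 * np.eye(len(X_obs))
    K_star = rbf_kernel(X_query, X_obs)
    mean = K_star @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.einsum("ij,ji->i", K_star, np.linalg.solve(K, K_star.T))
    return mean, np.sqrt(np.clip(var, 0.0, None))


def mw_with_gp_ucb(reward_fn, my_actions, opp_actions, T=200, eta=0.2, beta=2.0, seed=0):
    """Multiplicative weights driven by optimistic (upper-confidence) reward estimates."""
    rng = np.random.default_rng(seed)
    weights = np.ones(len(my_actions)) / len(my_actions)
    X_obs, y_obs = [], []
    for _ in range(T):
        # Sample our action from the current weights.
        i = rng.choice(len(my_actions), p=weights)
        # Assumption: the opponent plays uniformly at random; its action is observed.
        opp = opp_actions[rng.integers(len(opp_actions))]
        # Optimistic estimate of every action's reward against the observed opponent action,
        # computed from the data gathered in earlier rounds.
        X_query = np.array([[a, opp] for a in my_actions])
        mean, std = gp_posterior(np.array(X_obs), np.array(y_obs), X_query)
        ucb = np.clip(mean + beta * std, 0.0, 1.0)
        # Multiplicative weights step on the optimistic losses 1 - ucb.
        weights = weights * np.exp(-eta * (1.0 - ucb))
        weights /= weights.sum()
        # Noisy measurement of the reward actually obtained; added to the GP data.
        y = reward_fn(my_actions[i], opp) + 0.1 * rng.standard_normal()
        X_obs.append([my_actions[i], opp])
        y_obs.append(y)
    return weights


# Hypothetical toy game: reward is high when our action is close to the opponent's.
actions = np.linspace(0.0, 1.0, 5)
final_weights = mw_with_gp_ucb(
    lambda a, b: 1.0 - 2.0 * (a - b) ** 2, actions, np.array([0.4, 0.6])
)
print(final_weights.round(3))
```

Because the opponent's action is observed, each round yields an estimate for every action rather than only the one played, which is the intuition the abstract gives for approaching full-information performance.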

name change, no-regret learning, unknown game, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

No-Regret Learning in Unknown Games with Correlated Payoffs

Pier Giuseppe Sessa, Ilija Bogunovic, Maryam Kamgarpour, Andreas Krause

Neural Information Processing Systems, Oct-2-2025, 22:01:07 GMT

The performance of an agent in a repeated game is often measured in terms of regret.
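As a small illustration of the notion, the sketch below computes the standard external regret for a finite game: the gap between the reward of the best fixed action in hindsight and the reward actually collected. The helper name `external_regret` and the reward-matrix representation are assumptions for this example, not something the listing specifies.

```python
import numpy as np


def external_regret(reward_matrix, my_actions, opp_actions):
    """Regret of the played sequence versus the best fixed action in hindsight.

    reward_matrix[i, j] is our reward when we play i and the opponent plays j.
    """
    my_actions = np.asarray(my_actions)
    opp_actions = np.asarray(opp_actions)
    # Total reward actually obtained over all rounds.
    obtained = reward_matrix[my_actions, opp_actions].sum()
    # Total reward of each fixed action against the same opponent sequence.
    best_fixed = reward_matrix[:, opp_actions].sum(axis=1).max()
    return best_fixed - obtained


R = np.array([[1.0, 0.0], [0.0, 1.0]])
regret = external_regret(R, my_actions=[0, 0, 1], opp_actions=[1, 1, 1])
print(regret)  # 2.0: always playing action 1 would earn 3, the played sequence earned 1
```

A learning rule is called no-regret when this quantity grows sublinearly in the number of rounds.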

artificial intelligence, data mining, machine learning, (17 more...)

Country:

North America (0.46)
Europe (0.28)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.46)

concerns (C

Neural Information Processing Systems, Oct-2-2025, 22:00:52 GMT

We would like to thank all the reviewers for their constructive feedback. Citations refer to references in the paper and to the additional ones provided below. "I do agree that full information feedback is hard to expect in real scenarios,... However, the current Is there an application where this is a more realistic assumption?" The main motivation for our model is a setting that is in between the full information and bandit feedback. The proposed feedback model is also present in other practical applications.

artificial intelligence, assumption, reward function, (17 more...)

Industry: Leisure & Entertainment > Games (0.49)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.30)

Contextual Games: Multi-Agent Learning with Side Information

Neural Information Processing Systems, Aug-17-2025, 08:54:17 GMT

Motivated by these considerations, we introduce the new class of contextual games.

contextual game, data mining, machine learning, (16 more...)

Country:

North America > United States (0.14)
Europe > Switzerland > Zürich > Zürich (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.64)
Information Technology > Data Science > Data Mining > Big Data (0.47)

No-Regret Learning in Unknown Games with Correlated Payoffs

Neural Information Processing Systems, Oct-10-2024, 05:07:48 GMT

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

No-Regret Learning in Unknown Games with Correlated Payoffs

Pier Giuseppe Sessa, Ilija Bogunovic, Maryam Kamgarpour, Andreas Krause

Neural Information Processing Systems, Mar-19-2020, 02:16:14 GMT

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

No-Regret Learning in Unknown Games with Correlated Payoffs

Pier Giuseppe Sessa, Ilija Bogunovic, Maryam Kamgarpour, Andreas Krause

arXiv.org Machine Learning, Sep-18-2019

artificial intelligence, data mining, machine learning, (17 more...)

1909.0854

Country: