Nash policy
Identifying Time-varying Costs in Finite-horizon Linear Quadratic Gaussian Games
We address cost identification in a finite-horizon linear quadratic Gaussian game. We characterize the set of cost parameters that generate a given Nash equilibrium policy. We propose a backpropagation algorithm to identify the time-varying cost parameters. We derive a probabilistic error bound when the cost parameters are identified from finite trajectories. We test our method in numerical and driving simulations. Our algorithm identifies the cost parameters that can reproduce the Nash equilibrium policy and trajectory observations.
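As a rough illustration of the identification idea (a hedged sketch, not the authors' algorithm): in a single-agent, finite-horizon LQR analogue, the equilibrium policy is a sequence of feedback gains produced by a backward Riccati recursion, so time-varying state costs can be searched for by matching the observed gains. The dynamics, the finite-difference gradient (standing in for the paper's backpropagation), and all names below are illustrative assumptions.

```python
# Hedged sketch: single-agent LQR analogue of cost identification.
# All system matrices and hyperparameters are assumptions for illustration.
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])  # double-integrator-like dynamics
B = np.array([[0.0], [0.1]])
R = np.eye(1)                           # control cost held fixed
HORIZON = 5

def riccati_gains(q_diags):
    """Backward Riccati recursion -> time-varying feedback gains K_0..K_{T-1}."""
    P = np.eye(2)                       # fixed terminal cost
    gains = []
    for t in reversed(range(HORIZON)):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = np.diag(q_diags[t]) + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1]

true_q = np.array([[1.0 + 0.2 * t, 0.5] for t in range(HORIZON)])  # ground truth
K_obs = riccati_gains(true_q)           # "observed" equilibrium policy

def loss(q_diags):
    return sum(np.sum((Ka - Kb) ** 2)
               for Ka, Kb in zip(riccati_gains(q_diags), K_obs))

q = np.ones((HORIZON, 2))               # initial guess for the stage costs
eps, lr = 1e-5, 2.0
for _ in range(2000):
    base = loss(q)
    grad = np.zeros_like(q)
    for idx in np.ndindex(*q.shape):    # finite-difference gradient
        q_pert = q.copy()
        q_pert[idx] += eps
        grad[idx] = (loss(q_pert) - base) / eps
    q = np.maximum(q - lr * grad, 1e-3)  # keep identified costs positive

# The recovered q need not equal true_q: any costs reproducing K_obs lie in
# the identified set, echoing the paper's characterization of that set.
print(f"gain-matching loss: {loss(q):.2e}")
```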
Global Convergence of Multi-Agent Policy Gradient in Markov Potential Games
Stefanos Leonardos, Will Overman, Ioannis Panageas, Georgios Piliouras
Potential games are arguably one of the most important and widely studied classes of normal-form games. They define the archetypal setting of multi-agent coordination, as all agent utilities are perfectly aligned with each other via a common potential function. Can this intuitive framework be transplanted to the setting of Markov games? What are the similarities and differences between multi-agent coordination with and without state dependence? We present a novel definition of Markov Potential Games (MPGs) that generalizes prior attempts at capturing complex stateful multi-agent coordination. Counter-intuitively, insights from normal-form potential games do not carry over, as MPGs can include settings where the state-games are zero-sum. In the opposite direction, Markov games where every state-game is a potential game are not necessarily MPGs. Nevertheless, MPGs showcase standard desirable properties such as the existence of deterministic Nash policies. In our main technical result, we prove fast convergence of independent policy gradient to Nash policies by adapting recent gradient-dominance arguments developed for single-agent MDPs to multi-agent learning settings.
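A minimal sketch of the convergence phenomenon in the stateless special case (an identical-interest 2x2 game standing in for the potential; not the paper's stateful MPG setting, and the payoff matrix and step size are assumptions): each player runs projected gradient ascent on its own mixed strategy with no coordination, and the joint profile settles on a deterministic Nash.

```python
# Hedged sketch: independent projected policy gradient in a 2x2
# identical-interest (potential) normal-form game.
import numpy as np

PHI = np.array([[1.0, 0.0],
                [0.0, 2.0]])  # shared potential = both players' payoff

def project_simplex(v):
    """Euclidean projection onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css - 1)[0][-1]
    return np.maximum(v - (css[rho] - 1) / (rho + 1), 0.0)

x = np.array([0.6, 0.4])  # player 1's mixed strategy
y = np.array([0.6, 0.4])  # player 2's mixed strategy
for _ in range(300):
    gx, gy = PHI @ y, PHI.T @ x          # each player's own payoff gradient
    x = project_simplex(x + 0.1 * gx)    # independent updates, no coordination
    y = project_simplex(y + 0.1 * gy)

print("limit strategies:", x, y)  # approaches the pure Nash profile (a2, a2)
```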
Improving LLM General Preference Alignment via Optimistic Online Mirror Descent
Yuheng Zhang, Dian Yu, Tao Ge, Linfeng Song, Zhichen Zeng, Haitao Mi, Nan Jiang, Dong Yu
Reinforcement learning from human feedback (RLHF) has demonstrated remarkable effectiveness in aligning large language models (LLMs) with human preferences. Many existing alignment approaches rely on the Bradley-Terry (BT) model assumption, which assumes the existence of a ground-truth reward for each prompt-response pair. However, this assumption can be overly restrictive when modeling complex human preferences. In this paper, we drop the BT model assumption and study LLM alignment under general preferences, formulated as a two-player game. Drawing on theoretical insights from learning in games, we integrate optimistic online mirror descent into our alignment framework to approximate the Nash policy. Theoretically, we demonstrate that our approach achieves an $O(T^{-1})$ bound on the duality gap, improving upon the previous $O(T^{-1/2})$ result. More importantly, we implement our method and show through experiments that it outperforms state-of-the-art RLHF algorithms across multiple representative benchmarks.
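For intuition, here is optimistic online mirror descent with the entropy mirror map (optimistic Hedge) in a toy two-player zero-sum matrix game, using the duality gap of the averaged policies as the convergence measure. The matrix, step size, and horizon are assumptions, and the LLM-alignment setting itself is not modeled here.

```python
# Hedged sketch: optimistic Hedge (OMD with entropy regularizer) in a random
# zero-sum matrix game; optimism counts the predicted gradient once extra.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))   # row player maximizes x^T A y, column minimizes
eta, T = 0.1, 2000

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

Gx = np.zeros(3); Gy = np.zeros(3)   # cumulative payoff / loss vectors
mx = np.zeros(3); my = np.zeros(3)   # optimistic predictions = last gradient
avg_x = np.zeros(3); avg_y = np.zeros(3)
for _ in range(T):
    x = softmax(eta * (Gx + mx))     # maximizer's optimistic update
    y = softmax(-eta * (Gy + my))    # minimizer's optimistic update
    gx, gy = A @ y, A.T @ x
    Gx += gx; Gy += gy
    mx, my = gx, gy
    avg_x += x / T; avg_y += y / T

gap = (A @ avg_y).max() - (A.T @ avg_x).min()  # duality gap of the averages
print(f"duality gap after {T} rounds: {gap:.2e}")
```

With the optimistic term removed, the same loop is plain Hedge, whose averaged-policy gap shrinks only at the slower $O(T^{-1/2})$ rate the paper improves upon.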
STLGame: Signal Temporal Logic Games in Adversarial Multi-Agent Systems
Shuo Yang, Hongrui Zheng, Cristian-Ioan Vasile, George Pappas, Rahul Mangharam
We study how to synthesize a robust and safe policy for autonomous systems under signal temporal logic (STL) tasks in adversarial settings against unknown dynamic agents. To ensure worst-case STL satisfaction, we propose STLGame, a framework that models the multi-agent system as a two-player zero-sum game in which the ego agents try to maximize the STL satisfaction and the other agents minimize it. STLGame aims to find a Nash equilibrium policy profile, which is the best case in terms of robustness against unseen opponent policies, using the fictitious self-play (FSP) framework. FSP iteratively converges to a Nash profile, even in games with continuous state-action spaces. We propose a gradient-based method with differentiable STL formulas, which is crucial in continuous settings for approximating the best responses at each iteration of FSP, and we demonstrate its importance experimentally by comparing against reinforcement-learning-based best-response methods. Experiments on two standard dynamical system benchmarks, Ackermann-steering vehicles and autonomous drones, demonstrate that our converged policy is almost unexploitable and robust to a variety of unseen opponent policies. All code and additional experimental results can be found on our project website: https://sites.google.com/view/stlgame
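The differentiable-STL ingredient can be illustrated in isolation (a sketch under simplified assumptions, not the STLGame pipeline): replace the hard min in the robustness of an "always" formula with a soft-min, so gradient ascent can push a trajectory toward satisfaction. The scalar dynamics, the temperature beta, and the step size below are illustrative assumptions.

```python
# Hedged sketch: smooth robustness for "always (x_t >= c)" via a soft-min,
# with gradient ascent on a constant control u; trajectory is x_t = x0 + u*t.
import numpy as np

BETA = 10.0
ts = np.arange(1, 11)        # ten future time steps
x0, c = 0.5, 1.0             # initial state and safety threshold

def soft_always(margins, beta=BETA):
    """Smooth under-approximation of min(margins), numerically stabilized."""
    z = -beta * margins
    zmax = z.max()
    return -(zmax + np.log(np.sum(np.exp(z - zmax)))) / beta

u = 0.0
for _ in range(100):
    margins = x0 + u * ts - c
    z = -BETA * margins
    w = np.exp(z - z.max()); w /= w.sum()  # soft-min weights (a softmax)
    u += 0.05 * np.sum(w * ts)             # chain rule: d margins / du = ts

margins = x0 + u * ts - c
print(f"u = {u:.3f}, smooth robustness = {soft_always(margins):.3f}, "
      f"true robustness = {margins.min():.3f}")
```

The soft-min is a guaranteed under-approximation of the true robustness, so a positive smooth value certifies satisfaction of the hard specification.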
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.49)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)