Figure 8: An example of Agent 1 using the "clean" action while facing East; panel (a) shows the initial setup with two agents and two river tiles. The main beam extends directly in front of the agent, with two auxiliary beams alongside; a beam stops when it hits a dirty river tile, and Agent 1's action is resolved first. Sequential Social Dilemma Games, introduced in Leibo et al. [2017], are a kind of MARL environment; open-source implementations are available in Vinitsky et al. [2019].
- North America > United States (0.04)
- Europe > France > Auvergne-Rhône-Alpes > Lyon > Lyon (0.04)
Online Competitive Information Gathering for Partially Observable Trajectory Games
Krusniak, Mel, Xu, Hang, Palermo, Parker, Laine, Forrest
Game-theoretic agents must make plans that optimally gather information about their opponents. These problems are modeled by partially observable stochastic games (POSGs), but planning in fully continuous POSGs is intractable without heavy offline computation or assumptions on the order of belief maintained by each player. We formulate a finite history/horizon refinement of POSGs which admits competitive information gathering behavior in trajectory space, and through a series of approximations, we present an online method for computing rational trajectory plans in these games which leverages particle-based estimations of the joint state space and performs stochastic gradient play. We also provide the necessary adjustments required to deploy this method on individual agents. The method is tested in continuous pursuit-evasion and warehouse-pickup scenarios (alongside extensions to $N > 2$ players and to more complex environments with visual and physical obstacles), demonstrating evidence of active information gathering and outperforming passive competitors.
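A minimal sketch of the core loop described above, under toy assumptions: single-integrator dynamics, a fixed particle belief over the evader's start, and finite-difference gradients. The names `rollout` and `pursuer_cost` are illustrative, not the authors' API.

```python
# Sketch: gradient play on a particle-based expected cost for a
# two-player pursuit-evasion trajectory game (toy assumptions throughout).
import numpy as np

rng = np.random.default_rng(0)
H, N = 10, 256             # horizon, number of particles
particles = rng.normal(0.0, 1.0, size=(N, 2))  # belief over evader position
u_p = np.zeros((H, 2))     # pursuer control trajectory (decision variable)
u_e = np.zeros((H, 2))     # evader control trajectory (decision variable)

def rollout(x0, u):
    """Single-integrator rollout: x_{t+1} = x_t + u_t."""
    return x0 + np.cumsum(u, axis=0)

def pursuer_cost(u_p, u_e, particles):
    """Expected squared distance over the particle belief (pursuer minimizes)."""
    xp = rollout(np.zeros(2), u_p)                       # (H, 2)
    xe = particles[:, None, :] + np.cumsum(u_e, axis=0)  # (N, H, 2)
    return np.mean(np.sum((xe - xp) ** 2, axis=-1))

# Iterated gradient play: the pursuer descends the expected cost, the
# evader ascends it; gradients here are central finite differences.
eps, lr = 1e-4, 0.05
for _ in range(200):
    for u, sign in ((u_p, +1.0), (u_e, -1.0)):
        g = np.zeros_like(u)
        for i in np.ndindex(*u.shape):
            u[i] += eps
            c_hi = pursuer_cost(u_p, u_e, particles)
            u[i] -= 2 * eps
            c_lo = pursuer_cost(u_p, u_e, particles)
            u[i] += eps
            g[i] = (c_hi - c_lo) / (2 * eps)
        u -= sign * lr * g      # pursuer minimizes, evader maximizes

print("RMS pursuit distance:", pursuer_cost(u_p, u_e, particles) ** 0.5)
```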
- North America > United States > Texas (0.04)
- North America > United States > Tennessee > Davidson County > Nashville (0.04)
- Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
- Information Technology > Game Theory (1.00)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)
Imprecise Probabilities Meet Partial Observability: Game Semantics for Robust POMDPs
Bovy, Eline M., Suilen, Marnix, Junges, Sebastian, Jansen, Nils
Partially observable Markov decision processes (POMDPs) rely on the key assumption that probability distributions are precisely known. Robust POMDPs (RPOMDPs) alleviate this concern by defining imprecise probabilities, referred to as uncertainty sets. While robust MDPs have been studied extensively, work on RPOMDPs is limited and primarily focuses on algorithmic solution methods. We expand the theoretical understanding of RPOMDPs by showing that 1) different assumptions on the uncertainty sets affect optimal policies and values; 2) RPOMDPs have a partially observable stochastic game (POSG) semantics; and 3) the same RPOMDP with different assumptions leads to semantically different POSGs and, thus, different policies and values. These novel semantics for RPOMDPs give access to results for the widely studied POSG model; concretely, we show the existence of a Nash equilibrium. Finally, we classify the existing RPOMDP literature using our semantics, clarifying under which uncertainty assumptions these existing works operate.
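To make the uncertainty-set idea concrete, here is a small illustrative robust backup in which nature, acting as a second player, adversarially picks transition probabilities from interval uncertainty sets. The model numbers and the single-action setup are assumptions for illustration, not from the paper.

```python
# Sketch: robust backup at a single belief in a tiny RPOMDP-like model.
# Nature minimizes the agent's expected next value within interval bounds.
import numpy as np
from scipy.optimize import linprog

# Two hidden states, one action; nature chooses P(s'|s,a) within intervals.
P_lo = np.array([[0.6, 0.2], [0.1, 0.7]])   # lower transition bounds
P_hi = np.array([[0.8, 0.4], [0.3, 0.9]])   # upper transition bounds
V = np.array([1.0, 0.0])                    # value of successor states

def robust_backup(b):
    """Worst-case expected next value: nature minimizes row by row."""
    worst = np.zeros(2)
    for s in range(2):
        # Minimize V @ p subject to P_lo[s] <= p <= P_hi[s], sum(p) = 1.
        res = linprog(V, A_eq=np.ones((1, 2)), b_eq=[1.0],
                      bounds=list(zip(P_lo[s], P_hi[s])))
        worst[s] = res.fun
    return b @ worst

print(robust_backup(np.array([0.5, 0.5])))  # robust value at a uniform belief
```

Changing when nature commits to its choice (before or after observing the agent's action, per stage or once for the whole game) changes this inner optimization, which is exactly why different uncertainty assumptions yield different POSGs and different values.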
- North America > United States (0.28)
- Europe > Netherlands (0.04)
- Europe > Germany (0.04)
HSVI-based Online Minimax Strategies for Partially Observable Stochastic Games with Neural Perception Mechanisms
Yan, Rui, Santos, Gabriel, Norman, Gethin, Parker, David, Kwiatkowska, Marta
We consider a variant of continuous-state partially-observable stochastic games with neural perception mechanisms and an asymmetric information structure. One agent has partial information, with the observation function implemented as a neural network, while the other agent is assumed to have full knowledge of the state. We present, for the first time, an efficient online method to compute an $\varepsilon$-minimax strategy profile, which requires only one linear program to be solved for each agent at every stage, instead of a complex estimation of opponent counterfactual values. For the partially-informed agent, we propose a continual resolving approach which uses lower bounds, pre-computed offline with heuristic search value iteration (HSVI), instead of opponent counterfactual values. This inherits the soundness of continual resolving at the cost of pre-computing the bound. For the fully-informed agent, we propose an inferred-belief strategy, where the agent maintains an inferred belief about the belief of the partially-informed agent based on (offline) upper bounds from HSVI, guaranteeing $\varepsilon$-distance to the value of the game at the initial belief known to both agents.
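The one-LP-per-stage idea can be illustrated on a toy zero-sum stage game: the maximin strategy of a payoff matrix is computable with a single linear program. The matrix below is a stand-in for values that would come from the HSVI bounds; this is a sketch, not the authors' implementation.

```python
# Sketch: maximin stage strategy of a zero-sum matrix game via one LP.
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, -1.0],
              [-2.0, 3.0]])   # row player maximizes x^T A y

m, n = A.shape
# Variables (x_1..x_m, v). Maximize v s.t. (A^T x)_j >= v, x in simplex.
c = np.zeros(m + 1); c[-1] = -1.0            # linprog minimizes, so -v
A_ub = np.hstack([-A.T, np.ones((n, 1))])    # v - (A^T x)_j <= 0
b_ub = np.zeros(n)
A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
b_eq = [1.0]
bounds = [(0, None)] * m + [(None, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
x, v = res.x[:m], res.x[-1]
print("maximin strategy:", x, "game value:", v)   # x = (5/7, 2/7), v = 1/7
```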
Policy Optimization with Smooth Guidance Rewards Learned from Sparse-Reward Demonstrations
Wang, Guojian, Wu, Faguo, Zhang, Xiao, Chen, Tianyuan
The sparsity of reward feedback remains a challenging problem in online deep reinforcement learning (DRL). Previous approaches have utilized temporal credit assignment (CA) to achieve impressive results in multiple hard tasks. However, many CA methods rely on complex architectures or introduce sensitive hyperparameters to estimate the impact of state-action pairs. Moreover, CA methods presuppose access to trajectories that actually collect the sparse reward, which can be difficult to obtain in sparse-reward environments with large state spaces. To tackle these problems, we propose a simple and efficient algorithm called Policy Optimization with Smooth Guidance (POSG) that leverages a small set of sparse-reward demonstrations to make reliable and effective long-term credit assignments while efficiently facilitating exploration. The key idea is that the relative impact of state-action pairs can be indirectly estimated using offline demonstrations rather than directly leveraging the sparse reward trajectories generated by the agent. Specifically, we first obtain the trajectory importance by considering both the trajectory-level distance to demonstrations and the returns of the relevant trajectories. Then, the guidance reward is calculated for each state-action pair by smoothly averaging the importance of the trajectories through it, merging the demonstrations' distribution and reward information. We theoretically analyze the performance improvement bound induced by smooth guidance rewards and derive a new worst-case lower bound on the performance improvement. Extensive results demonstrate POSG's significant advantages in control performance and convergence speed compared to benchmark DRL algorithms. Notably, specific metrics and quantifiable results further demonstrate POSG's superiority.
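A minimal sketch of the guidance-reward construction as described: trajectory importance combines distance-to-demonstration with return, and each state-action pair receives a smoothed average of the importance of nearby trajectories. The Gaussian kernels and the additive combination are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch: smooth guidance rewards from sparse-reward demonstrations.
import numpy as np

rng = np.random.default_rng(0)
demos = [rng.normal(size=(20, 4)) for _ in range(5)]    # demo trajectories
trajs = [rng.normal(size=(20, 4)) for _ in range(32)]   # agent trajectories
returns = rng.uniform(0, 1, size=32)                    # sparse returns

def traj_distance(tau, demo):
    """Mean per-step Euclidean distance (same-length trajectories assumed)."""
    return np.mean(np.linalg.norm(tau - demo, axis=-1))

# Trajectory importance: closeness to the nearest demo plus the return.
sigma = 1.0
importance = np.array([
    np.exp(-min(traj_distance(tau, d) for d in demos) / sigma) + ret
    for tau, ret in zip(trajs, returns)
])

def guidance_reward(s_a, h=0.5):
    """Smooth average of the importance of trajectories passing near s_a."""
    w = np.array([np.exp(-np.min(np.linalg.norm(tau - s_a, axis=-1)) / h)
                  for tau in trajs])
    return float(w @ importance / (w.sum() + 1e-8))

print(guidance_reward(trajs[0][3]))  # guidance reward for one state-action
```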
- Asia > China > Beijing > Beijing (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
On Convex Optimal Value Functions For POSGs
Cunha, Rafael F., Castellini, Jacopo, Peralez, Johan, Dibangoye, Jilles S.
Multi-agent planning and reinforcement learning can be challenging when agents cannot see the state of the world or communicate with each other due to communication costs, latency, or noise. Partially Observable Stochastic Games (POSGs) provide a mathematical framework for modelling such scenarios. This paper aims to improve the efficiency of planning and reinforcement learning algorithms for POSGs by identifying the underlying structure of optimal state-value functions. The approach involves reformulating the original game from the perspective of a trusted third party who plans on behalf of all agents simultaneously. From this viewpoint, the original POSGs can be viewed as Markov games whose states are occupancy states, i.e., posterior probability distributions over the hidden states of the world and the streams of actions and observations that agents have experienced so far. The main result proves that the optimal state-value function is a convex function of the occupancy state, expressed on an appropriate basis, in all zero-sum, common-payoff, and Stackelberg POSGs.
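A small sketch of the occupancy-state recursion from the trusted third party's viewpoint: a posterior over (hidden state, joint history) pairs, updated under a fixed decision rule. The transition and observation matrices are toy assumptions for illustration.

```python
# Sketch: one step of the occupancy-state recursion in a tiny POSG-like model.
import numpy as np

nS, nA, nO = 2, 2, 2
T = np.array([[[0.9, 0.1], [0.2, 0.8]],   # T[a][s][s']
              [[0.5, 0.5], [0.6, 0.4]]])
O = np.array([[[0.8, 0.2], [0.3, 0.7]],   # O[a][s'][o]
              [[0.6, 0.4], [0.1, 0.9]]])

# Occupancy state: history tuple -> unnormalized distribution over states,
# so the mass of each entry is the probability of that history.
occupancy = {(): np.array([0.5, 0.5])}

def update(occupancy, decision_rule):
    """One-step recursion: history h grows to (h, a, o) with its mass."""
    nxt = {}
    for h, dist in occupancy.items():
        a = decision_rule(h)               # deterministic rule: h -> a
        for o in range(nO):
            # Pr(s', o | h) = sum_s Pr(s | h) T[a][s][s'] O[a][s'][o]
            joint = (dist @ T[a]) * O[a][:, o]
            if joint.sum() > 0:
                nxt[h + (a, o)] = joint
    return nxt

occ1 = update(occupancy, decision_rule=lambda h: 0)
for h, d in occ1.items():
    print(h, d, "mass:", d.sum())          # masses sum to 1 across histories
```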
- Europe > France (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Netherlands (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.69)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.45)
Partially Observable Multi-agent RL with (Quasi-)Efficiency: The Blessing of Information Sharing
We study provable multi-agent reinforcement learning (MARL) in the general framework of partially observable stochastic games (POSGs). To circumvent the known hardness results and the use of computationally intractable oracles, we advocate leveraging the potential \emph{information-sharing} among agents, a common practice in empirical MARL and a standard model for multi-agent control systems with communications. We first establish several computational complexity results to justify the necessity of information-sharing, as well as of the observability assumption that has enabled quasi-efficient single-agent RL with partial observations, for computational efficiency in solving POSGs. We then propose to further \emph{approximate} the shared common information to construct an approximate model of the POSG, in which planning an approximate equilibrium (in terms of solving the original POSG) can be quasi-efficient, i.e., of quasi-polynomial time, under the aforementioned assumptions. Furthermore, we develop a partially observable MARL algorithm that is both statistically and computationally quasi-efficient. We hope our study opens up possibilities for leveraging, and even designing, different \emph{information structures} to develop both sample- and computation-efficient partially observable MARL.
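A toy sketch of the information-sharing intuition: with a fully shared observation stream, agents can maintain a common-information belief, and truncating the shared history to the last k steps yields the kind of approximate model the paper makes rigorous. The model below is an assumption for illustration, not the paper's construction.

```python
# Sketch: common-information belief from shared observations, with
# finite-memory truncation as the approximation knob.
import numpy as np

nS = 3
T = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.2, 0.2, 0.6]])   # T[s][s']
O = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])   # O[s][o], the shared observation channel

def belief_from_window(window, prior=None):
    """Common-information belief from the last k shared observations."""
    b = np.full(nS, 1.0 / nS) if prior is None else prior
    for o in window:
        b = (b @ T) * O[:, o]     # predict with T, then correct with O
        b /= b.sum()
    return b

history = [0, 0, 1, 2, 1, 1]
k = 3                              # truncation length of the shared history
print("full-history belief:", belief_from_window(history))
print("k-step approx belief:", belief_from_window(history[-k:]))
```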
- North America > United States > Ohio > Cuyahoga County > Cleveland (0.04)
- North America > United States > Maryland > Prince George's County > College Park (0.04)
- Research Report (1.00)
- Workflow (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)