occupancy state
On Dynamic Programming Theory for Leader-Follower Stochastic Games
Dibangoye, Jilles Steeve, Marre, Thibaut Le, Sankur, Ocan, Schwarzentruber, François
Leader-follower general-sum stochastic games (LF-GSSGs) model sequential decision-making under asymmetric commitment, where a leader commits to a policy and a follower best responds, yielding a strong Stackelberg equilibrium (SSE) with leader-favourable tie-breaking. This paper introduces a dynamic programming (DP) framework that applies Bellman recursion over credible sets-state abstractions formally representing all rational follower best responses under partial leader commitments-to compute SSEs. We first prove that any LF-GSSG admits a lossless reduction to a Markov decision process (MDP) over credible sets. We further establish that synthesising an optimal memoryless deterministic leader policy is NP-hard, motivating the development of ε-optimal DP algorithms with provable guarantees on leader exploitability. Experiments on standard mixed-motive benchmarks-including security games, resource allocation, and adversarial planning-demonstrate empirical gains in leader value and runtime scalability over state-of-the-art methods.
- Europe > France (0.04)
- Asia > Middle East > Jordan (0.04)
Temporal Overlapping Prediction: A Self-supervised Pre-training Method for LiDAR Moving Object Segmentation
Miao, Ziliang, Chen, Runjian, Cai, Yixi, He, Buwei, Zhao, Wenquan, Shao, Wenqi, Zhang, Bo, Zhang, Fu
Moving object segmentation (MOS) on LiDAR point clouds is crucial for autonomous systems like self-driving vehicles. Previous supervised approaches rely heavily on costly manual annotations, while LiDAR sequences naturally capture temporal motion cues that can be leveraged for self-supervised learning. In this paper, we propose \textbf{T}emporal \textbf{O}verlapping \textbf{P}rediction (\textbf{TOP}), a self-supervised pre-training method that alleviate the labeling burden for MOS. \textbf{TOP} explores the temporal overlapping points that commonly observed by current and adjacent scans, and learns spatiotemporal representations by predicting the occupancy states of temporal overlapping points. Moreover, we utilize current occupancy reconstruction as an auxiliary pre-training objective, which enhances the current structural awareness of the model. We conduct extensive experiments and observe that the conventional metric Intersection-over-Union (IoU) shows strong bias to objects with more scanned points, which might neglect small or distant objects. To compensate for this bias, we introduce an additional metric called $\text{mIoU}_{\text{obj}}$ to evaluate object-level performance. Experiments on nuScenes and SemanticKITTI show that \textbf{TOP} outperforms both supervised training-from-scratch baseline and other self-supervised pre-training baselines by up to 28.77\% relative improvement, demonstrating strong transferability across LiDAR setups and generalization to other tasks. Code and pre-trained models will be publicly available upon publication.
$\epsilon$-Optimally Solving Zero-Sum POSGs
Escudie, Erwan, Sabatelli, Matthia, Dibangoye, Jilles
A recent method for solving zero-sum partially observable stochastic games (zs-POSGs) embeds the original game into a new one called the occupancy Markov game. This reformulation allows applying Bellman's principle of optimality to solve zs-POSGs. However, improving a current solution requires solving a linear program with exponentially many potential constraints, which significantly restricts the scalability of this approach. This paper exploits the optimal value function's novel uniform continuity properties to overcome this limitation. We first construct a new operator that is computationally more efficient than the state-of-the-art update rules without compromising optimality. In particular, improving a current solution now involves a linear program with an exponential drop in constraints. We then also show that point-based value iteration algorithms utilizing our findings improve the scalability of existing methods while maintaining guarantees in various domains.
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- Europe > France > Grand Est > Meurthe-et-Moselle > Nancy (0.04)
On Convex Optimal Value Functions For POSGs
Cunha, Rafael F., Castellini, Jacopo, Peralez, Johan, Dibangoye, Jilles S.
Multi-agent planning and reinforcement learning can be challenging when agents cannot see the state of the world or communicate with each other due to communication costs, latency, or noise. Partially Observable Stochastic Games (POSGs) provide a mathematical framework for modelling such scenarios. This paper aims to improve the efficiency of planning and reinforcement learning algorithms for POSGs by identifying the underlying structure of optimal state-value functions. The approach involves reformulating the original game from the perspective of a trusted third party who plans on behalf of the agents simultaneously. From this viewpoint, the original POSGs can be viewed as Markov games where states are occupancy states, i.e., posterior probability distributions over the hidden states of the world and the stream of actions and observations that agents have experienced so far. This study mainly proves that the optimal state-value function is a convex function of occupancy states expressed on an appropriate basis in all zero-sum, common-payoff, and Stackelberg POSGs.
- Europe > France (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Netherlands (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.69)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.45)
Predicting Future Spatiotemporal Occupancy Grids with Semantics for Autonomous Driving
Toyungyernsub, Maneekwan, Yel, Esen, Li, Jiachen, Kochenderfer, Mykel J.
For autonomous vehicles to proactively plan safe trajectories and make informed decisions, they must be able to predict the future occupancy states of the local environment. However, common issues with occupancy prediction include predictions where moving objects vanish or become blurred, particularly at longer time horizons. We propose an environment prediction framework that incorporates environment semantics for future occupancy prediction. Our method first semantically segments the environment and uses this information along with the occupancy information to predict the spatiotemporal evolution of the environment. We validate our approach on the real-world Waymo Open Dataset. Compared to baseline methods, our model has higher prediction accuracy and is capable of maintaining moving object appearances in the predictions for longer prediction time horizons.
- Transportation > Ground > Road (0.40)
- Information Technology > Robotics & Automation (0.40)
- Automobiles & Trucks (0.40)
Informative Path Planning of Autonomous Vehicle for Parking Occupancy Estimation
Hu, Yunze, Chen, Jiaao, Zhou, Kangjie, Gao, Han, Li, Yutong, Liu, Chang
Parking occupancy estimation holds significant potential in facilitating parking resource management and mitigating traffic congestion. Existing approaches employ robotic systems to detect the occupancy status of individual parking spaces and primarily focus on enhancing detection accuracy through perception pipelines. However, these methods often overlook the crucial aspect of robot path planning, which can hinder the accurate estimation of the entire parking area. In light of these limitations, we introduce the problem of informative path planning for parking occupancy estimation using autonomous vehicles and formulate it as a Partially Observable Markov Decision Process (POMDP) task. Then, we develop an occupancy state transition model and introduce a Bayes filter to estimate occupancy based on noisy sensor measurements. Subsequently, we propose the Monte Carlo Bayes Filter Tree, a computationally efficient algorithm that leverages progressive widening to generate informative paths. We demonstrate that the proposed approach outperforms the benchmark methods in diverse simulation environments, effectively striking a balance between optimality and computational efficiency.
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
- Asia > China > Beijing > Beijing (0.04)
- Transportation > Infrastructure & Services (1.00)
- Transportation > Ground > Road (1.00)
- Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
HSVI can solve zero-sum Partially Observable Stochastic Games
Delage, Aurélien, Buffet, Olivier, Dibangoye, Jilles S., Saffidine, Abdallah
State-of-the-art methods for solving 2-player zero-sum imperfect information games rely on linear programming or regret minimization, though not on dynamic programming (DP) or heuristic search (HS), while the latter are often at the core of state-of-the-art solvers for other sequential decision-making problems. In partially observable or collaborative settings (e.g., POMDPs and Dec- POMDPs), DP and HS require introducing an appropriate statistic that induces a fully observable problem as well as bounding (convex) approximators of the optimal value function. This approach has succeeded in some subclasses of 2-player zero-sum partially observable stochastic games (zs- POSGs) as well, but how to apply it in the general case still remains an open question. We answer it by (i) rigorously defining an equivalent game to work with, (ii) proving mathematical properties of the optimal value function that allow deriving bounds that come with solution strategies, (iii) proposing for the first time an HSVI-like solver that provably converges to an $\epsilon$-optimal solution in finite time, and (iv) empirically analyzing it. This opens the door to a novel family of promising approaches complementing those relying on linear programming or iterative methods.
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Europe > Czechia > Prague (0.04)
- Oceania > Australia > New South Wales (0.04)
Dynamics-Aware Spatiotemporal Occupancy Prediction in Urban Environments
Toyungyernsub, Maneekwan, Yel, Esen, Li, Jiachen, Kochenderfer, Mykel J.
Detection and segmentation of moving obstacles, along with prediction of the future occupancy states of the local environment, are essential for autonomous vehicles to proactively make safe and informed decisions. In this paper, we propose a framework that integrates the two capabilities together using deep neural network architectures. Our method first detects and segments moving objects in the scene, and uses this information to predict the spatiotemporal evolution of the environment around autonomous vehicles. To address the problem of direct integration of both static-dynamic object segmentation and environment prediction models, we propose using occupancy-based environment representations across the whole framework. Our method is validated on the real-world Waymo Open Dataset and demonstrates higher prediction accuracy than baseline methods.
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)
On Bellman's Optimality Principle for zs-POSGs
Buffet, Olivier, Dibangoye, Jilles, Delage, Aurélien, Saffidine, Abdallah, Thomas, Vincent
Many non-trivial sequential decision-making problems are efficiently solved by relying on Bellman's optimality principle, i.e., exploiting the fact that sub-problems are nested recursively within the original problem. Here we show how it can apply to (infinite horizon) 2-player zero-sum partially observable stochastic games (zs-POSGs) by (i) taking a central planner's viewpoint, which can only reason on a sufficient statistic called occupancy state, and (ii) turning such problems into zero-sum occupancy Markov games (zs-OMGs). Then, exploiting the Lipschitz-continuity of the value function in occupancy space, one can derive a version of the HSVI algorithm (Heuristic Search Value Iteration) that provably finds an $\epsilon$-Nash equilibrium in finite time.
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Oceania > Australia > New South Wales > Sydney (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > France > Grand Est > Meurthe-et-Moselle > Nancy (0.04)
- Information Technology > Game Theory (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.95)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.68)
Deep, spatially coherent Occupancy Maps based on Radar Measurements
Bauer, Daniel, Kuhnert, Lars, Eckstein, Lutz
One essential step to realize modern driver assistance technology is the accurate knowledge about the location of static objects in the environment. In this work, we use artificial neural networks to predict the occupation state of a whole scene in an end-to-end manner. This stands in contrast to the traditional approach of accumulating each detection's influence on the occupancy state and allows to learn spatial priors which can be used to interpolate the environment's occupancy state. We show that these priors make our method suitable to predict dense occupancy estimations from sparse, highly uncertain inputs, as given by automotive radars, even for complex urban scenarios. Furthermore, we demonstrate that these estimations can be used for large-scale mapping applications.
- Transportation > Ground > Road (0.66)
- Automobiles & Trucks > Manufacturer (0.48)
- Transportation > Passenger (0.48)