 chance constraint



Flipping-based Policy for Chance-Constrained Markov Decision Processes

Neural Information Processing Systems

Safe reinforcement learning (RL) is a promising approach for many real-world decision-making problems where ensuring safety is a critical necessity. In safe RL research, while expected cumulative safety constraints (ECSCs) are typically the first choices, chance constraints are often more pragmatic for incorporating safety under uncertainties. This paper proposes a flipping-based policy for Chance-Constrained Markov Decision Processes (CCMDPs). The flipping-based policy selects the next action by tossing a potentially distorted coin between two action candidates. The probability of the flip and the two action candidates vary depending on the state.
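The action-selection step described in this abstract can be illustrated with a minimal sketch. It assumes only what the abstract states: two state-dependent action candidates and a state-dependent flip probability. The names `flipping_policy`, `candidates`, and `flip_prob`, and the toy driving example, are hypothetical and not taken from the paper; how the candidates and the probability are learned is not covered here.

```python
import random
from typing import Callable, Hashable, Tuple

# Hypothetical type aliases for illustration only.
State = float
Action = Hashable


def flipping_policy(
    state: State,
    candidates: Callable[[State], Tuple[Action, Action]],
    flip_prob: Callable[[State], float],
    rng: random.Random,
) -> Action:
    """Pick one of two state-dependent actions with a biased coin toss."""
    first, second = candidates(state)
    p = flip_prob(state)  # probability of choosing the first candidate
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"flip probability must lie in [0, 1], got {p}")
    # rng.random() is uniform on [0, 1), so `first` is chosen with probability p.
    return first if rng.random() < p else second


# Toy usage (assumed setup): positive states always brake, others never do.
toy_candidates = lambda s: ("brake", "accelerate")
toy_flip_prob = lambda s: 1.0 if s > 0 else 0.0
rng = random.Random(0)
chosen = flipping_policy(0.5, toy_candidates, toy_flip_prob, rng)
```

With a flip probability of exactly 1 or 0 the policy reduces to a deterministic one, which makes the sketch easy to check by hand; intermediate probabilities give a genuinely randomized choice between the two candidates.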


Wonder Wins Ways: Curiosity-Driven Exploration through Multi-Agent Contextual Calibration

Pan, Yiyuan, Liu, Zhe, Wang, Hesheng

arXiv.org Artificial Intelligence

Autonomous exploration in complex multi-agent reinforcement learning (MARL) with sparse rewards critically depends on providing agents with effective intrinsic motivation. While artificial curiosity offers a powerful self-supervised signal, it often confuses environmental stochasticity with meaningful novelty. Moreover, existing curiosity mechanisms exhibit a uniform novelty bias, treating all unexpected observations equally. However, the novelty of peer behavior, which encodes latent task dynamics, is often overlooked, resulting in suboptimal exploration in decentralized, communication-free MARL settings. To this end, inspired by how human children adaptively calibrate their own exploratory behaviors by observing peers, we propose a novel approach to enhance multi-agent exploration. We introduce CERMIC, a principled framework that empowers agents to robustly filter noisy surprise signals and guide exploration by dynamically calibrating their intrinsic curiosity with inferred multi-agent context. Additionally, CERMIC generates theoretically grounded intrinsic rewards, encouraging agents to explore state transitions with high information gain. We evaluate CERMIC on benchmark suites including VMAS, Meltingpot, and SMACv2. Empirical results demonstrate that exploration with CERMIC significantly outperforms SoTA algorithms in sparse-reward environments.


A Gradient Guided Diffusion Framework for Chance Constrained Programming

Zhang, Boyang, Wang, Zhiguo, Liu, Ya-Feng

arXiv.org Artificial Intelligence

Chance constrained programming (CCP) is a powerful framework for addressing optimization problems under uncertainty. In this paper, we introduce a novel Gradient-Guided Diffusion-based Optimization framework, termed GGDOpt, which tackles CCP through three key innovations. First, GGDOpt accommodates a broad class of CCP problems without requiring knowledge of the exact distribution of uncertainty, relying solely on a set of samples. Second, to address the nonconvexity of the chance constraints, it reformulates the CCP as a sampling problem over the product of two distributions: an unknown data distribution supported on a nonconvex set and a Boltzmann distribution defined by the objective function, which fully leverages both first- and second-order gradient information. Third, GGDOpt has theoretical convergence guarantees and provides practical error bounds under mild assumptions. By progressively injecting noise during the forward diffusion process to convexify the nonconvex feasible region, GGDOpt enables guided reverse sampling to generate asymptotically optimal solutions. Experimental results on synthetic datasets and a waveform design task in wireless communications demonstrate that GGDOpt outperforms existing methods in both solution quality and stability with nearly 80% overhead reduction.



Efficient Probabilistic Planning with Maximum-Coverage Distributionally Robust Backward Reachable Trees

Rose, Alex, Aggarwal, Naman, Jewison, Christopher, How, Jonathan P.

arXiv.org Artificial Intelligence

This paper presents a new multi-query motion planning algorithm for linear Gaussian systems with the goal of reaching a Euclidean ball with high probability. We develop a new formulation for ball-shaped ambiguity sets of Gaussian distributions and leverage it to develop a distributionally robust belief roadmap construction algorithm. This algorithm synthesizes robust controllers which are certified to be safe for maximal size ball-shaped ambiguity sets of Gaussian distributions. Our algorithm achieves better coverage than the maximal coverage algorithm for planning over Gaussian distributions [1], and we identify mild conditions under which our algorithm achieves strictly better coverage. For the special case of no process noise or state constraints, we formally prove that our algorithm achieves maximal coverage. In addition, we present a second multi-query motion planning algorithm for linear Gaussian systems with the goal of reaching a region parameterized by the Minkowski sum of an ellipsoid and a Euclidean ball with high probability. This algorithm plans over ellipsoidal sets of maximal size ball-shaped ambiguity sets of Gaussian distributions, and provably achieves equal or better coverage than the best-known algorithm for planning over ellipsoidal ambiguity sets of Gaussian distributions [2]. We demonstrate the efficacy of both methods in a wide range of conditions via extensive simulation experiments.


Data-Driven Density Steering via the Gromov-Wasserstein Optimal Transport Distance

Nakashima, Haruto, Ganguly, Siddhartha, Kashima, Kenji

arXiv.org Artificial Intelligence

We tackle the data-driven chance-constrained density steering problem using the Gromov-Wasserstein metric. The underlying dynamical system is an unknown linear controlled recursion, with the assumption that sufficiently rich input-output data from pre-operational experiments are available. The initial state is modeled as a Gaussian mixture, while the terminal state is required to match a specified Gaussian distribution. We reformulate the resulting optimal control problem as a difference-of-convex program and show that it can be efficiently and tractably solved using the DC algorithm. The term data-driven has become increasingly prevalent in the modern control literature [1].


A Robust Cooperative Vehicle Coordination Framework for Intersection Crossing

Bai, Haojie, Luo, Jiping, Li, Huafu, Zhao, Xiongwei, Wang, Yang

arXiv.org Artificial Intelligence

Cooperative vehicle coordination at unsignalized intersections has garnered significant interest from both academia and industry in recent years, highlighting its notable advantages in improving traffic throughput and fuel efficiency. The oversights pose driving risks in the presence of state uncertainty and communication constraints. To address this gap, we propose a robust and comprehensive intersection coordination framework consisting of a robust cooperative trajectory planner and a context-aware status update scheduler. The trajectory planner directly controls the evolution of the trajectory distributions during frequent vehicle interactions, thereby offering probabilistic safety guarantees. To further align with coordination safety in practical bandwidth-limited conditions, we propose a context-aware status update scheduler that dynamically prioritizes the state updating order of vehicles based on their driving urgency. Simulation results validate the robustness and effectiveness of the proposed coordination framework, showing that the collision probability can be significantly reduced while maintaining comparable coordination efficiency to state-of-the-art strategies. Moreover, our proposed framework demonstrates superior effectiveness in utilizing wireless resources in practical uncertain and bandwidth-limited conditions. Recent advancements in information and control technologies have shown significant potential to enhance the performance of connected and autonomous vehicles (CAVs) [1]. Unlike standalone autonomous driving solutions, CAVs share information via vehicle-to-everything (V2X) communication links and make decisions collaboratively to achieve a common goal. This collectivism has demonstrated its superiority in driving safety and traffic efficiency [2], [3].
In recent years, vehicle coordination at critical areas, especially road intersections, has gained substantial research interest and is considered a key enabler for intelligent transportation systems (ITS) [4]. This work has been supported in part by the Science and Technology Project of Shenzhen under Grant JCYJ20200109113424990, and the Marine Economy Development Project of Guangdong Province under Grant GDNRC [2020]014.


Robust Optimal Task Planning to Maximize Battery Life

Li, Jiachen, Jian, Chu, Zhao, Feiyang, Li, Shihao, Li, Wei, Chen, Dongmei

arXiv.org Artificial Intelligence

This paper proposes a control-oriented optimization platform for autonomous mobile robots (AMRs), focusing on extending battery life while ensuring task completion. The requirement of fast AMR task planning while maintaining a minimum battery state of charge, thus maximizing the battery life, yields a bilinear optimization problem. The McCormick envelope technique is proposed to linearize the bilinear term. A novel planning algorithm with relaxed constraints is also developed to handle parameter uncertainties robustly and efficiently. Simulation results are provided to demonstrate the utility of the proposed methods in reducing battery degradation while satisfying task completion requirements.
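For readers unfamiliar with the linearization mentioned in this abstract, the standard McCormick envelope for a generic bilinear term is given below. The symbols $x$, $y$, $w$ and the bounds are generic placeholders; the abstract does not say which quantities form the bilinear term in the AMR problem, so this is the textbook relaxation rather than the paper's specific formulation.

```latex
% McCormick relaxation of w = x y, assuming known bounds
% x^L <= x <= x^U and y^L <= y <= y^U.
\begin{aligned}
w &\ge x^{L} y + x\, y^{L} - x^{L} y^{L}, \\
w &\ge x^{U} y + x\, y^{U} - x^{U} y^{U}, \\
w &\le x^{U} y + x\, y^{L} - x^{U} y^{L}, \\
w &\le x^{L} y + x\, y^{U} - x^{L} y^{U}.
\end{aligned}
```

Each inequality follows from expanding a product of two nonnegative bound gaps, for example $(x - x^{L})(y - y^{L}) \ge 0$ for the first line, and replacing $xy$ with $w$. The four constraints are linear in $(x, y, w)$, and the relaxation tightens as the bounds on $x$ and $y$ narrow.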

