Markov Models
A Systematic Literature Review on Safety of the Intended Functionality for Automated Driving Systems
Patel, Milin, Jung, Rolf, Khatun, Marzana
In the automobile industry, ensuring the safety of automated vehicles equipped with the Automated Driving System (ADS) is becoming a significant focus due to the increasing development and deployment of automated driving. Automated driving depends on sensing both the external and internal environments of a vehicle, utilizing perception sensors and algorithms, and Electrical/Electronic (E/E) systems for situational awareness and response. ISO 21448 is the standard for Safety of the Intended Functionality (SOTIF) that aims to ensure that the ADS operate safely within their intended functionality. SOTIF focuses on preventing or mitigating potential hazards that may arise from the limitations or failures of the ADS, including hazards due to insufficiencies of specification, or performance insufficiencies, as well as foreseeable misuse of the intended functionality. However, the challenge lies in ensuring the safety of vehicles despite the limited availability of extensive and systematic literature on SOTIF. To address this challenge, a Systematic Literature Review (SLR) on SOTIF for the ADS is performed following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. The objective is to methodically gather and analyze the existing literature on SOTIF. The major contributions of this paper are: (i) presenting a summary of the literature by synthesizing and organizing the collective findings, methodologies, and insights into distinct thematic groups, and (ii) summarizing and categorizing the acknowledged limitations based on data extracted from an SLR of 51 research papers published between 2018 and 2023. Furthermore, research gaps are determined, and future research directions are proposed.
Flexible Prefrontal Control over Hippocampal Episodic Memory for Goal-Directed Generalization
Zheng, Yicong, Wolf, Nora, Ranganath, Charan, O'Reilly, Randall C., McKee, Kevin L.
Many tasks require flexibly modifying perception and behavior based on current goals. Humans can retrieve episodic memories from days to years ago, using them to contextualize and generalize behaviors across novel but structurally related situations. The brain's ability to control episodic memories based on task demands is often attributed to interactions between the prefrontal cortex (PFC) and hippocampus (HPC). We propose a reinforcement learning model that incorporates a PFC-HPC interaction mechanism for goal-directed generalization. In our model, the PFC learns to generate query-key representations to encode and retrieve goal-relevant episodic memories, modulating HPC memories top-down based on current task demands. Moreover, the PFC adapts its encoding and retrieval strategies dynamically when faced with multiple goals presented in a blocked, rather than interleaved, manner. Our results show that: (1) combining working memory with selectively retrieved episodic memory allows transfer of decisions among similar environments or situations, (2) top-down control from PFC over HPC improves learning of arbitrary structural associations between events for generalization to novel environments compared to a bottom-up sensory-driven approach, and (3) the PFC encodes generalizable representations during both encoding and retrieval of goal-relevant memories, whereas the HPC exhibits event-specific representations. Together, these findings highlight the importance of goal-directed prefrontal control over hippocampal episodic memory for decision-making in novel situations and suggest a computational mechanism by which PFC-HPC interactions enable flexible behavior.
Scalable Multi-Agent Reinforcement Learning for Residential Load Scheduling under Data Governance
Qin, Zhaoming, Dong, Nanqing, Liu, Di, Wang, Zhefan, Cao, Junwei
As a data-driven approach, multi-agent reinforcement learning (MARL) has made remarkable advances in solving cooperative residential load scheduling problems. However, centralized training, the most common paradigm for MARL, limits large-scale deployment in communication-constrained cloud-edge environments. As a remedy, distributed training shows unparalleled advantages in real-world applications but still faces challenge with system scalability, e.g., the high cost of communication overhead during coordinating individual agents, and needs to comply with data governance in terms of privacy. In this work, we propose a novel MARL solution to address these two practical issues. Our proposed approach is based on actor-critic methods, where the global critic is a learned function of individual critics computed solely based on local observations of households. This scheme preserves household privacy completely and significantly reduces communication cost. Simulation experiments demonstrate that the proposed framework achieves comparable performance to the state-of-the-art actor-critic framework without data governance and communication constraints.
Decentralized Reinforcement Learning for Multi-Agent Multi-Resource Allocation via Dynamic Cluster Agreements
Marino, Antonio, Restrepo, Esteban, Pacchierotti, Claudio, Giordano, Paolo Robuffo
This paper addresses the challenge of allocating heterogeneous resources among multiple agents in a decentralized manner. Our proposed method, LGTC-IPPO, builds upon Independent Proximal Policy Optimization (IPPO) by integrating dynamic cluster consensus, a mechanism that allows agents to form and adapt local sub-teams based on resource demands. This decentralized coordination strategy reduces reliance on global information and enhances scalability. We evaluate LGTC-IPPO against standard multi-agent reinforcement learning baselines and a centralized expert solution across a range of team sizes and resource distributions. Experimental results demonstrate that LGTC-IPPO achieves more stable rewards, better coordination, and robust performance even as the number of agents or resource types increases. Additionally, we illustrate how dynamic clustering enables agents to reallocate resources efficiently also for scenarios with discharging resources.
Discrete-Time Hybrid Automata Learning: Legged Locomotion Meets Skateboarding
Liu, Hang, Teng, Sangli, Liu, Ben, Zhang, Wei, Ghaffari, Maani
The controller enables the robot to perform smooth and natural skateboarding motions, with reliable mode identification and transitions under disturbances. Abstract --This paper introduces Discrete-time Hybrid Automata Learning (DHAL), a framework using on-policy Reinforcement Learning to identify and execute mode-switching without trajectory segmentation or event function learning. Hybrid dynamical systems, which include continuous flow and discrete mode switching, can model robotics tasks like legged robot locomotion. Model-based methods usually depend on predefined gaits, while model-free approaches lack explicit mode-switching knowledge. Current methods identify discrete modes via segmentation before regressing continuous flow, but learning high-dimensional complex rigid body dynamics without trajectory labels or segmentation is a challenging open problem. Our approach incorporates a beta policy distribution and a multi-critic architecture to model contact-guided motions, exemplified by a challenging quadrupedal robot skateboard task. I. INTRODUCTION Legged robots are often regarded as the ideal embodiment of robotic systems, designed to perform a wide range of tasks and navigate diverse destinations. Many of these tasks, such as skateboarding and boxing, are inherently contact-guided, involving complex sequences of contact events [1]. Designing and executing such contact-guided control is highly non-trivial due to two major challenges: (1) the hybrid dynamics system problem arising from the abrupt transitions introduced by contact events [2], and (2) the sparsity of contact events, which poses significant difficulties for both model-based and model-free control strategies. In model-based control, Hybrid Automata has been proposed as a powerful framework to model systems with both discrete and continuous dynamics [3, 4]. This framework has been widely applied to behavior planning [5] and legged locomotion. However, due to the combinatorial nature of hybrid dynamics, finding optimal policies for hybrid systems through model-based optimization is computationally challenging, especially for tasks with high-dimensional state and action spaces. Model-free RL requires minimal assumptions and can be applied to a diverse range of tasks across different dynamic systems [6, 7]. However, RL policies, often represented by deep neural networks, lack interpretability and fail to explicitly model hybrid dynamics [8].
Stone Soup Multi-Target Tracking Feature Extraction For Autonomous Search And Track In Deep Reinforcement Learning Environment
Ewers, Jan-Hendrik, Gibbs, Joe, Anderson, David
Management of sensing resources is a non-trivial problem for future military air assets with future systems deploying heterogeneous sensors to generate information of the battlespace. Machine learning techniques including deep reinforcement learning (DRL) have been identified as promising approaches, but require high-fidelity training environments and feature extractors to generate information for the agent. This paper presents a deep reinforcement learning training approach, utilising the Stone Soup tracking framework as a feature extractor to train an agent for a sensor management task. A general framework for embedding Stone Soup tracker components within a Gymnasium environment is presented, enabling fast and configurable tracker deployments for RL training using Stable Baselines3. The approach is demonstrated in a sensor management task where an agent is trained to search and track a region of airspace utilising track lists generated from Stone Soup trackers. A sample implementation using three neural network architectures in a search-and-track scenario demonstrates the approach and shows that RL agents can outperform simple sensor search and track policies when trained within the Gymnasium and Stone Soup environment.
ATLaS: Agent Tuning via Learning Critical Steps
Chen, Zhixun, Li, Ming, Huang, Yuxuan, Du, Yali, Fang, Meng, Zhou, Tianyi
Large Language Model (LLM) agents have demonstrated remarkable generalization capabilities across multi-domain tasks. Existing agent tuning approaches typically employ supervised finetuning on entire expert trajectories. However, behavior-cloning of full trajectories can introduce expert bias and weaken generalization to states not covered by the expert data. Additionally, critical steps, such as planning, complex reasoning for intermediate subtasks, and strategic decision-making, are essential to success in agent tasks, so learning these steps is the key to improving LLM agents. For more effective and efficient agent tuning, we propose ATLaS that identifies the critical steps in expert trajectories and finetunes LLMs solely on these steps with reduced costs. By steering the training's focus to a few critical steps, our method mitigates the risk of overfitting entire trajectories and promotes generalization across different environments and tasks. In extensive experiments, an LLM finetuned on only 30% critical steps selected by ATLaS outperforms the LLM finetuned on all steps and recent open-source LLM agents. ATLaS maintains and improves base LLM skills as generalist agents interacting with diverse environments.
RPF-Search: Field-based Search for Robot Person Following in Unknown Dynamic Environments
Ye, Hanjing, Cai, Kuanqi, Zhan, Yu, Xia, Bingyi, Ajoudani, Arash, Zhang, Hong
Autonomous robot person-following (RPF) systems are crucial for personal assistance and security but suffer from target loss due to occlusions in dynamic, unknown environments. Current methods rely on pre-built maps and assume static environments, limiting their effectiveness in real-world settings. There is a critical gap in re-finding targets under topographic (e.g., walls, corners) and dynamic (e.g., moving pedestrians) occlusions. In this paper, we propose a novel heuristic-guided search framework that dynamically builds environmental maps while following the target and resolves various occlusions by prioritizing high-probability areas for locating the target. For topographic occlusions, a belief-guided search field is constructed and used to evaluate the likelihood of the target's presence, while for dynamic occlusions, a fluid-field approach allows the robot to adaptively follow or overtake moving occluders. Past motion cues and environmental observations refine the search decision over time. Our results demonstrate that the proposed method outperforms existing approaches in terms of search efficiency and success rates, both in simulations and real-world tests. Our target search method enhances the adaptability and reliability of RPF systems in unknown and dynamic environments to support their use in real-world applications. Our code, video, experimental results and appendix are available at https://medlartea.github.io/rpf-search/.
OVAMOS: A Framework for Open-Vocabulary Multi-Object Search in Unknown Environments
Wang, Qianwei, Xu, Yifan, Kamat, Vineet, Menassa, Carol
OV AMOS: A Framework for Open-V ocabulary Multi-Object Search in Unknown Environments Qianwei Wang*, Yifan Xu*, Vineet Kamat, and Carol Menassa Abstract -- Object search is a fundamental task for robots deployed in indoor building environments, yet challenges arise due to observation instability, especially for open-vocabulary models. While foundation models (LLMs/VLMs) enable reasoning about object locations even without direct visibility, the ability to recover from failures and replan remains crucial. T o address these challenges, we propose a framework integrating VLM-based reasoning, frontier-based exploration, and a Partially Observable Markov Decision Process (POMDP) framework to solve the MOS problem in novel environments. VLM enhances search efficiency by inferring object-environment relationships, frontier-based exploration guides navigation in unknown spaces, and POMDP models observation uncertainty, allowing recovery from failures in occlusion and cluttered environments. We evaluate our framework on 120 simulated scenarios across several Habitat-Matterport3D (HM3D) scenes and a real-world robot experiment in a 50-square-meter office, demonstrating significant improvements in both efficiency and success rate over baseline methods. I NTRODUCTION Multi-Object Search (MOS) is a crucial task in robotics [1]. Consider a scenario where in a workplace setting, a robot may need to retrieve multiple objects to complete a task, such as gathering necessary documents, tools, or equipment for an assembly process.
Trajectory-Class-Aware Multi-Agent Reinforcement Learning
Na, Hyungho, Lee, Kwanghyeon, Lee, Sumin, Moon, Il-Chul
In the context of multi-agent reinforcement learning, generalization is a challenge to solve various tasks that may require different joint policies or coordination without relying on policies specialized for each task. We refer to this type of problem as a multi-task, and we train agents to be versatile in this multi-task setting through a single training process. To address this challenge, we introduce TRajectory-class-Aware Multi-Agent reinforcement learning (TRAMA). In TRAMA, agents recognize a task type by identifying the class of trajectories they are experiencing through partial observations, and the agents use this trajectory awareness or prediction as additional information for action policy. To this end, we introduce three primary objectives in TRAMA: (a) constructing a quantized latent space to generate trajectory embeddings that reflect key similarities among them; (b) conducting trajectory clustering using these trajectory embeddings; and (c) building a trajectory-class-aware policy. Specifically for (c), we introduce a trajectory-class predictor that performs agent-wise predictions on the trajectory class; and we design a trajectory-class representation model for each trajectory class. Each agent takes actions based on this trajectory-class representation along with its partial observation for task-aware execution. The proposed method is evaluated on various tasks, including multi-task problems built upon StarCraft II. Empirical results show further performance improvements over state-of-the-art baselines.