Goto

Collaborating Authors

 Markov Models


Deep Reinforcement Learning-based Obstacle Avoidance for Robot Movement in Warehouse Environments

arXiv.org Artificial Intelligence

At present, in most warehouse environments, the accumulation of goods is complex, and the management personnel in the control of goods at the same time with the warehouse mobile robot trajectory interaction, the traditional mobile robot can not be very good on the goods and pedestrians to feed back the correct obstacle avoidance strategy, in order to control the mobile robot in the warehouse environment efficiently and friendly to complete the obstacle avoidance task, this paper proposes a deep reinforcement learning based on the warehouse environment, the mobile robot obstacle avoidance Algorithm. Firstly, for the insufficient learning ability of the value function network in the deep reinforcement learning algorithm, the value function network is improved based on the pedestrian interaction, the interaction information between pedestrians is extracted through the pedestrian angle grid, and the temporal features of individual pedestrians are extracted through the attention mechanism, so that we can learn to obtain the relative importance of the current state and the historical trajectory state as well as the joint impact on the robot's obstacle avoidance strategy, which provides an opportunity for the learning of multi-layer perceptual machines afterwards. Secondly, the reward function of reinforcement learning is designed based on the spatial behaviour of pedestrians, and the robot is punished for the state where the angle changes too much, so as to achieve the requirement of comfortable obstacle avoidance; Finally, the feasibility and effectiveness of the deep reinforcement learning-based mobile robot obstacle avoidance algorithm in the warehouse environment in the complex environment of the warehouse are verified through simulation experiments.


Evaluating Robot Influence on Pedestrian Behavior Models for Crowd Simulation and Benchmarking

arXiv.org Artificial Intelligence

The presence of robots amongst pedestrians affects them causing deviation to their trajectories. Existing methods suffer from the limitation of not being able to objectively measure this deviation in unseen cases. In order to solve this issue, we introduce a simulation framework that repetitively measures and benchmarks the deviation in trajectory of pedestrians due to robots driven by different navigation algorithms. We simulate the deviation behavior of the pedestrians using an enhanced Social Force Model (SFM) with a robot force component that accounts for the influence of robots on pedestrian behavior, resulting in the Social Robot Force Model (SRFM). Parameters for this model are learned using the pedestrian trajectories from the JRDB dataset [1]. Pedestrians are then simulated using the SRFM with and without the robot force component to objectively measure the deviation to their trajectory caused by the robot in 5 different scenarios. Our work in this paper is a proof of concept that shows objectively measuring the pedestrian reaction to robot is possible. We use our simulation to train two different RL policies and evaluate them against traditional navigation models.


Adaptive Compensation for Robotic Joint Failures Using Partially Observable Reinforcement Learning

arXiv.org Artificial Intelligence

Robotic manipulators are widely used in various industries for complex and repetitive tasks. However, they remain vulnerable to unexpected hardware failures. In this study, we address the challenge of enabling a robotic manipulator to complete tasks despite joint malfunctions. Specifically, we develop a reinforcement learning (RL) framework to adaptively compensate for a non-functional joint during task execution. Our experimental platform is the Franka robot with 7 degrees of freedom (DOFs). We formulate the problem as a partially observable Markov decision process (POMDP), where the robot is trained under various joint failure conditions and tested in both seen and unseen scenarios. We consider scenarios where a joint is permanently broken and where it functions intermittently. Additionally, we demonstrate the effectiveness of our approach by comparing it with traditional inverse kinematics-based control methods. The results show that the RL algorithm enables the robot to successfully complete tasks even with joint failures, achieving a high success rate with an average rate of 93.6%. This showcases its robustness and adaptability. Our findings highlight the potential of RL to enhance the resilience and reliability of robotic systems, making them better suited for unpredictable environments. All related codes and models are published online.


MEGA-PT: A Meta-Game Framework for Agile Penetration Testing

arXiv.org Artificial Intelligence

Penetration testing is an essential means of proactive defense in the face of escalating cybersecurity incidents. Traditional manual penetration testing methods are time-consuming, resource-intensive, and prone to human errors. Current trends in automated penetration testing are also impractical, facing significant challenges such as the curse of dimensionality, scalability issues, and lack of adaptability to network changes. To address these issues, we propose MEGA-PT, a meta-game penetration testing framework, featuring micro tactic games for node-level local interactions and a macro strategy process for network-wide attack chains. The micro- and macro-level modeling enables distributed, adaptive, collaborative, and fast penetration testing. MEGA-PT offers agile solutions for various security schemes, including optimal local penetration plans, purple teaming solutions, and risk assessment, providing fundamental principles to guide future automated penetration testing. Our experiments demonstrate the effectiveness and agility of our model by providing improved defense strategies and adaptability to changes at both local and network levels.


R-AIF: Solving Sparse-Reward Robotic Tasks from Pixels with Active Inference and World Models

arXiv.org Artificial Intelligence

Although research has produced promising results demonstrating the utility of active inference (AIF) in Markov decision processes (MDPs), there is relatively less work that builds AIF models in the context of environments and problems that take the form of partially observable Markov decision processes (POMDPs). In POMDP scenarios, the agent must infer the unobserved environmental state from raw sensory observations, e.g., pixels in an image. Additionally, less work exists in examining the most difficult form of POMDP-centered control: continuous action space POMDPs under sparse reward signals. In this work, we address issues facing the AIF modeling paradigm by introducing novel prior preference learning techniques and self-revision schedules to help the agent excel in sparse-reward, continuous action, goal-based robotic control POMDP environments. Empirically, we show that our agents offer improved performance over state-of-the-art models in terms of cumulative rewards, relative stability, and success rate. The code in support of this work can be found at https://github.com/NACLab/robust-active-inference.


Adaptive Task Allocation in Multi-Human Multi-Robot Teams under Team Heterogeneity and Dynamic Information Uncertainty

arXiv.org Artificial Intelligence

Task allocation in multi-human multi-robot (MH-MR) teams presents significant challenges due to the inherent heterogeneity of team members, the dynamics of task execution, and the information uncertainty of operational states. Existing approaches often fail to address these challenges simultaneously, resulting in suboptimal performance. To tackle this, we propose ATA-HRL, an adaptive task allocation framework using hierarchical reinforcement learning (HRL), which incorporates initial task allocation (ITA) that leverages team heterogeneity and conditional task reallocation in response to dynamic operational states. Additionally, we introduce an auxiliary state representation learning task to manage information uncertainty and enhance task execution. Through an extensive case study in large-scale environmental monitoring tasks, we demonstrate the benefits of our approach.


Instigating Cooperation among LLM Agents Using Adaptive Information Modulation

arXiv.org Artificial Intelligence

This paper introduces a novel framework combining LLM agents as proxies for human strategic behavior with reinforcement learning (RL) to engage these agents in evolving strategic interactions within team environments. Our approach extends traditional agent-based simulations by using strategic LLM agents (SLA) and introducing dynamic and adaptive governance through a pro-social promoting RL agent (PPA) that modulates information access across agents in a network, optimizing social welfare and promoting pro-social behavior. Through validation in iterative games, including the prisoner's dilemma, we demonstrate that SLA agents exhibit nuanced strategic adaptations. The PPA agent effectively learns to adjust information transparency, resulting in enhanced cooperation rates. This framework offers significant insights into AI-mediated social dynamics, contributing to the deployment of AI in real-world team settings.


Towards Interactive and Learnable Cooperative Driving Automation: a Large Language Model-Driven Decision-Making Framework

arXiv.org Artificial Intelligence

At present, Connected Autonomous Vehicles (CAVs) have begun to open road testing around the world, but their safety and efficiency performance in complex scenarios is still not satisfactory. Cooperative driving leverages the connectivity ability of CAVs to achieve synergies greater than the sum of their parts, making it a promising approach to improving CAV performance in complex scenarios. However, the lack of interaction and continuous learning ability limits current cooperative driving to single-scenario applications and specific Cooperative Driving Automation (CDA). To address these challenges, this paper proposes CoDrivingLLM, an interactive and learnable LLM-driven cooperative driving framework, to achieve all-scenario and all-CDA. First, since Large Language Models(LLMs) are not adept at handling mathematical calculations, an environment module is introduced to update vehicle positions based on semantic decisions, thus avoiding potential errors from direct LLM control of vehicle positions. Second, based on the four levels of CDA defined by the SAE J3216 standard, we propose a Chain-of-Thought (COT) based reasoning module that includes state perception, intent sharing, negotiation, and decision-making, enhancing the stability of LLMs in multi-step reasoning tasks. Centralized conflict resolution is then managed through a conflict coordinator in the reasoning process. Finally, by introducing a memory module and employing retrieval-augmented generation, CAVs are endowed with the ability to learn from their past experiences. We validate the proposed CoDrivingLLM through ablation experiments on the negotiation module, reasoning with different shots experience, and comparison with other cooperative driving methods.


Autonomous Driving at Unsignalized Intersections: A Review of Decision-Making Challenges and Reinforcement Learning-Based Solutions

arXiv.org Artificial Intelligence

Autonomous driving at unsignalized intersections is still considered a challenging application for machine learning due to the complications associated with handling complex multi-agent scenarios characterized by a high degree of uncertainty. Automating the decision-making process at these safety-critical environments involves comprehending multiple levels of abstractions associated with learning robust driving behaviors to enable the vehicle to navigate efficiently. In this survey, we aim at exploring the state-of-the-art techniques implemented for decision-making applications, with a focus on algorithms that combine Reinforcement Learning (RL) and deep learning for learning traversing policies at unsignalized intersections. The reviewed schemes vary in the proposed driving scenario, in the assumptions made for the used intersection model, in the tackled challenges, and in the learning algorithms that are used. We have presented comparisons for these techniques to highlight their limitations and strengths. Based on our in-depth investigation, it can be discerned that a robust decision-making scheme for navigating real-world unsignalized intersection has yet to be developed. Along with our analysis and discussion, we recommend potential research directions encouraging the interested players to tackle the highlighted challenges. By adhering to our recommendations, decision-making architectures that are both non-overcautious and safe, yet feasible, can be trained and validated in real-world unsignalized intersections environments.


Multi-Agent Vulcan: An Information-Driven Multi-Agent Path Finding Approach

arXiv.org Artificial Intelligence

Scientists often search for phenomena of interest while exploring new environments. Autonomous vehicles are deployed to explore such areas where human-operated vehicles would be costly or dangerous. Online control of autonomous vehicles for information-gathering is called adaptive sampling and can be framed as a POMDP that uses information gain as its principal objective. While prior work focuses largely on single-agent scenarios, this paper confronts challenges unique to multi-agent adaptive sampling, such as avoiding redundant observations, preventing vehicle collision, and facilitating path planning under limited communication. We start with Multi-Agent Path Finding (MAPF) methods, which address collision avoidance by decomposing the MAPF problem into a series of single-agent path planning problems. We then present information-driven MAPF which addresses multi-agent information gain under limited communication. First, we introduce an admissible heuristic that relaxes mutual information gain to an additive function that can be evaluated as a set of independent single agent path planning problems. Second, we extend our approach to a distributed system that is robust to limited communication. When all agents are in range, the group plans jointly to maximize information. When some agents move out of range, communicating subgroups are formed and the subgroups plan independently. Since redundant observations are less likely when vehicles are far apart, this approach only incurs a small loss in information gain, resulting in an approach that gracefully transitions from full to partial communication. We evaluate our method against other adaptive sampling strategies across various scenarios, including real-world robotic applications. Our method was able to locate up to 200% more unique phenomena in certain scenarios, and each agent located its first unique phenomenon faster by up to 50%.