 Markov Models


MBC: Multi-Brain Collaborative Control for Quadruped Robots

arXiv.org Artificial Intelligence

In quadruped robot locomotion, the Blind Policy and the Perceptive Policy each have their own advantages and limitations. The Blind Policy relies on preset sensor information and algorithms, which suits known, structured environments, but it lacks adaptability in complex or unknown environments. The Perceptive Policy uses visual sensors to obtain detailed environmental information, allowing it to adapt to complex terrain, but its effectiveness is limited under occlusion, especially when perception fails; under these conditions it is less robust than the Blind Policy. To address these challenges, we propose MBC, a Multi-Brain Collaborative system that incorporates concepts from Multi-Agent Reinforcement Learning and introduces collaboration between the Blind Policy and the Perceptive Policy. By applying this multi-policy collaborative model to a quadruped robot, the robot can maintain stable locomotion even when the perceptual system is impaired or observational data is incomplete. Our simulations and real-world experiments demonstrate that this system significantly improves the robot's passability and robustness against perception failures in complex environments, validating the effectiveness of multi-policy collaboration in enhancing robotic motion performance.


Rao-Blackwellized POMDP Planning

arXiv.org Artificial Intelligence

Partially Observable Markov Decision Processes (POMDPs) provide a structured framework for decision-making under uncertainty, but their application requires efficient belief updates. Sequential Importance Resampling Particle Filters (SIRPF), also known as Bootstrap Particle Filters, are commonly used as belief updaters in large approximate POMDP solvers, but they face challenges such as particle deprivation and high computational costs as the system's state dimension grows. To address these issues, this study introduces Rao-Blackwellized POMDP (RB-POMDP) approximate solvers and outlines generic methods to apply Rao-Blackwellization in both belief updates and online planning.

[Figure: POMCPOW (left) and RB-POMCPOW (right) tree structure comparison.]

Partially Observable Markov Decision Processes (POMDPs) are a powerful mathematical framework for modeling decision-making under uncertainty where an agent operates ... POMDPs have been widely applied to various domains such ... Moreover, as the system's effective dimension grows, a substantial increase in the number of particles may be required to maintain performance, resulting in high computational costs (e.g. ... Rao-Blackwellized Particle Filtering (RBPF) offers a promising solution to address some of these limitations of the SIRPF.
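For readers unfamiliar with the bootstrap (SIR) particle filter that the abstract takes as its baseline, the following is a minimal sketch of one belief update. It is not the paper's RB-POMDP method. The 1-D transition and observation models, the function name `sir_update`, and the parameters `proc_std` and `obs_std` are all assumptions made for illustration.

```python
import math
import random


def sir_update(particles, action, observation, rng, proc_std=0.5, obs_std=1.0):
    """One bootstrap (SIR) particle filter belief update on a toy 1-D model.

    Assumed toy model (hypothetical, not from the paper):
      next state  = state + action + Gaussian process noise
      observation = next state + Gaussian observation noise
    """
    # 1. Propagate every particle through the assumed transition model.
    proposed = [x + action + rng.gauss(0.0, proc_std) for x in particles]

    # 2. Weight each particle by the (unnormalised) observation likelihood.
    weights = [
        math.exp(-0.5 * ((observation - x) / obs_std) ** 2) for x in proposed
    ]

    # If every weight underflows to zero, no particle explains the
    # observation (an extreme form of particle deprivation); fall back to
    # the unweighted proposal instead of failing in the resampling step.
    if sum(weights) == 0.0:
        return proposed

    # 3. Resample with replacement in proportion to the weights.
    return rng.choices(proposed, weights=weights, k=len(particles))


rng = random.Random(0)
prior = [rng.uniform(-5.0, 5.0) for _ in range(500)]
posterior = sir_update(prior, action=1.0, observation=3.0, rng=rng)
posterior_mean = sum(posterior) / len(posterior)
```

The cost of each update is linear in the number of particles, which is why needing many more particles in higher dimensions becomes expensive.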


A fast and sound tagging method for discontinuous named-entity recognition

arXiv.org Artificial Intelligence

We introduce a novel tagging scheme for discontinuous named entity recognition based on an explicit description of the inner structure of discontinuous mentions. We rely on a weighted finite state automaton for both marginal and maximum a posteriori inference. As such, our method is sound in the sense that (1) well-formedness of predicted tag sequences is ensured via the automaton structure and (2) there is an unambiguous mapping between well-formed sequences of tags and (discontinuous) mentions. We evaluate our approach on three English datasets in the biomedical domain, and report results comparable to the state of the art with a much simpler and faster model.


Efficiently Learning Probabilistic Logical Models by Cheaply Ranking Mined Rules

arXiv.org Artificial Intelligence

Probabilistic logical models are a core component of neurosymbolic AI and are important models in their own right for tasks that require high explainability. Unlike neural networks, logical models are often handcrafted using domain expertise, making their development costly and prone to errors. While there are algorithms that learn logical models from data, they are generally prohibitively expensive, limiting their applicability in real-world settings. In this work, we introduce precision and recall for logical rules and define their composition as rule utility -- a cost-effective measure to evaluate the predictive power of logical models. Further, we introduce SPECTRUM, a scalable framework for learning logical models from relational data. Its scalability derives from a linear-time algorithm that mines recurrent structures in the data along with a second algorithm that, using the cheap utility measure, efficiently ranks rules built from these structures. Moreover, we derive theoretical guarantees on the utility of the learnt logical model. As a result, SPECTRUM learns more accurate logical models orders of magnitude faster than previous methods on real-world datasets.


Grounded Computation & Consciousness: A Framework for Exploring Consciousness in Machines & Other Organisms

arXiv.org Artificial Intelligence

Computational modeling is a critical tool for understanding consciousness, but is it enough on its own? This paper discusses the necessity for an ontological basis of consciousness, and introduces a formal framework for grounding computational descriptions into an ontological substrate. Utilizing this technique, a method is demonstrated for estimating the difference in qualitative experience between two systems. This framework has wide applicability to computational theories of consciousness.


Development and Validation of Heparin Dosing Policies Using an Offline Reinforcement Learning Algorithm

arXiv.org Artificial Intelligence

Appropriate medication dosages in the intensive care unit (ICU) are critical for patient survival. Heparin, used to treat thrombosis and inhibit blood clotting in the ICU, requires careful administration due to its complexity and sensitivity to various factors, including patient clinical characteristics, underlying medical conditions, and potential drug interactions. Incorrect dosing can lead to severe complications such as strokes or excessive bleeding. To address these challenges, this study proposes a reinforcement learning (RL)-based personalized optimal heparin dosing policy that guides dosing decisions reliably within the therapeutic range based on individual patient conditions. A batch-constrained policy was implemented to minimize out-of-distribution errors in an offline RL environment and effectively integrate RL with existing clinician policies. The policy's effectiveness was evaluated using weighted importance sampling, an off-policy evaluation method, and the relationship between state representations and Q-values was explored using t-SNE. Both quantitative and qualitative analyses were conducted using the Medical Information Mart for Intensive Care III (MIMIC-III) database, demonstrating the efficacy of the proposed RL-based medication policy. Leveraging advanced machine learning techniques and extensive clinical data, this research enhances heparin administration practices and establishes a precedent for the development of sophisticated decision-support tools in medicine.
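The weighted importance sampling estimator mentioned above can be sketched as follows. This is a generic textbook form of trajectory-level weighted importance sampling, not the study's evaluation code. The trajectory layout (tuples of evaluation-policy probability, behaviour-policy probability, and reward) and the function name are assumptions for illustration.

```python
def weighted_importance_sampling(trajectories, gamma=0.99):
    """Estimate the value of an evaluation policy from behaviour-policy data.

    Each trajectory is a list of (p_eval, p_behaviour, reward) tuples, where
    p_eval and p_behaviour are the probabilities each policy assigns to the
    logged action (hypothetical layout chosen for this sketch).
    """
    ratios, returns = [], []
    for trajectory in trajectories:
        ratio, discounted_return = 1.0, 0.0
        for t, (p_eval, p_behaviour, reward) in enumerate(trajectory):
            ratio *= p_eval / p_behaviour          # cumulative importance ratio
            discounted_return += (gamma ** t) * reward
        ratios.append(ratio)
        returns.append(discounted_return)

    total = sum(ratios)
    if total == 0.0:
        raise ValueError("evaluation policy has no support on the logged data")

    # Normalising by the sum of ratios (instead of the trajectory count)
    # is what distinguishes weighted from ordinary importance sampling.
    return sum(w * g for w, g in zip(ratios, returns)) / total


logged = [
    [(0.5, 0.5, 1.0)],   # ratio 1, return 1
    [(0.9, 0.3, 0.0)],   # ratio 3, return 0
]
estimate = weighted_importance_sampling(logged)  # (1*1 + 3*0) / (1 + 3)
```

The normalisation trades a small bias for lower variance, which is a common reason to prefer it when the logged data set is limited.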


Intent Prediction-Driven Model Predictive Control for UAV Planning and Navigation in Dynamic Environments

arXiv.org Artificial Intelligence

The emergence of indoor aerial robots holds significant potential for enhancing construction site workers' productivity by autonomously performing inspection and mapping tasks. The key challenge to this application is ensuring navigation safety with human workers. While navigation in static environments has been extensively studied, navigating dynamic environments remains open due to challenges in perception and planning. Payload limitations of unmanned aerial vehicles limit them to using cameras with limited fields of view, resulting in unreliable perception and tracking during collision avoidance. Moreover, the unpredictable nature of the dynamic environments can quickly make the generated optimal trajectory outdated. To address these challenges, this paper presents a comprehensive navigation framework that incorporates both perception and planning, introducing the concept of dynamic obstacle intent prediction. Our perception module detects and tracks dynamic obstacles efficiently and handles tracking loss and occlusion during collision avoidance. The proposed intent prediction module employs a Markov Decision Process (MDP) to forecast potential actions of dynamic obstacles with the possible future trajectories. Finally, a novel intent-based planning algorithm, leveraging model predictive control (MPC), is applied to generate safe navigation trajectories. Simulation and physical experiments demonstrate that our method enables safe navigation in dynamic environments and achieves the fewest collisions compared to benchmarks.


Agent-state based policies in POMDPs: Beyond belief-state MDPs

arXiv.org Artificial Intelligence

The traditional approach to POMDPs is to convert them into fully observed MDPs by considering a belief state as an information state. However, a belief-state based approach requires perfect knowledge of the system dynamics and is therefore not applicable in the learning setting where the system model is unknown. Various approaches to circumvent this limitation have been proposed in the literature. We present a unified treatment of some of these approaches by viewing them as models where the agent maintains a local recursively updateable agent state and chooses actions based on the agent state. We highlight the different classes of agent-state based policies and the various approaches that have been proposed in the literature to find good policies within each class. These include the designer's approach to find optimal non-stationary agent-state based policies, policy search approaches to find locally optimal stationary agent-state based policies, and the approximate information state approach to find approximately optimal stationary agent-state based policies. We then present how ideas from the approximate information state approach have been used to improve Q-learning and actor-critic algorithms for learning in POMDPs.


Dynamic Game-Theoretical Decision-Making Framework for Vehicle-Pedestrian Interaction with Human Bounded Rationality

arXiv.org Artificial Intelligence

Human-involved interactive environments pose significant challenges for autonomous vehicle decision-making processes due to the complexity and uncertainty of human behavior. It is crucial to develop an explainable and trustworthy decision-making system for autonomous vehicles interacting with pedestrians. Previous studies often used traditional game theory to describe interactions for its interpretability. However, it assumes complete human rationality and unlimited reasoning abilities, which is unrealistic. To address this limitation and improve model accuracy, this paper proposes a novel framework that integrates the partially observable Markov decision process with behavioral game theory to dynamically model AV-pedestrian interactions at unsignalized intersections. Both the AV and the pedestrian are modeled as dynamic-belief-induced quantal cognitive hierarchy (DB-QCH) models, considering human reasoning limitations and bounded rationality in the decision-making process. In addition, a dynamic belief updating mechanism allows the AV to update its understanding of the opponent's rationality degree in real time based on observed behaviors and adapt its strategies accordingly. The analysis results indicate that our models effectively simulate vehicle-pedestrian interactions and our proposed AV decision-making approach performs well in safety, efficiency, and smoothness. It closely resembles real-world driving behavior and even achieves more comfortable driving navigation compared to our previous virtual reality experimental data.


The Number of Trials Matters in Infinite-Horizon General-Utility Markov Decision Processes

arXiv.org Artificial Intelligence

The general-utility Markov decision processes (GUMDPs) framework generalizes the MDPs framework by considering objective functions that depend on the frequency of visitation of state-action pairs induced by a given policy. In this work, we contribute the first analysis of the impact of the number of trials, i.e., the number of randomly sampled trajectories, in infinite-horizon GUMDPs. We show that, as opposed to standard MDPs, the number of trials plays a key role in infinite-horizon GUMDPs and the expected performance of a given policy depends, in general, on the number of trials. We consider both discounted and average GUMDPs, where the objective function depends, respectively, on discounted and average frequencies of visitation of state-action pairs. First, we study policy evaluation under discounted GUMDPs, proving lower and upper bounds on the mismatch between the finite and infinite trials formulations for GUMDPs. Second, we address average GUMDPs, studying how different classes of GUMDPs impact the mismatch between the finite and infinite trials formulations. Third, we provide a set of empirical results to support our claims, highlighting how the number of trajectories and the structure of the underlying GUMDP influence policy evaluation.
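To illustrate why the number of trials can matter, here is a small exact calculation on a toy chain. It assumes the finite-trials objective is the expectation of the utility applied to the empirical visitation frequencies averaged over K trajectories, and that the infinite-trials objective applies the utility to the expected frequencies. That reading, the two-state chain, and the entropy utility are assumptions for illustration and are not taken from the paper.

```python
import math


def entropy(frequencies):
    """Shannon entropy of a visitation-frequency vector (a non-linear utility)."""
    return -sum(p * math.log(p) for p in frequencies if p > 0.0)


def finite_trials_objective(utility, k, p=0.5):
    """Exact E[utility(empirical frequencies)] over k independent trajectories.

    Toy chain (hypothetical): each trajectory starts in absorbing state A with
    probability p and in absorbing state B otherwise, so a single trajectory's
    normalised visitation frequency is (1, 0) or (0, 1).
    """
    total = 0.0
    for n in range(k + 1):  # n = number of trajectories absorbed in A
        prob = math.comb(k, n) * p ** n * (1.0 - p) ** (k - n)
        total += prob * utility((n / k, 1.0 - n / k))
    return total


def infinite_trials_objective(utility, p=0.5):
    """Utility applied to the expected visitation frequencies."""
    return utility((p, 1.0 - p))


def linear(frequencies):
    """A linear utility, as in a standard MDP with reward 1 in A and 0 in B."""
    return 1.0 * frequencies[0] + 0.0 * frequencies[1]


one_trial = finite_trials_objective(entropy, k=1)        # single trajectory
many_trials = finite_trials_objective(entropy, k=100)    # approaches the limit
limit = infinite_trials_objective(entropy)               # log(2)
```

With the entropy utility, a single trajectory always yields zero while the infinite-trials value is log 2, and the gap shrinks as K grows. With the linear utility the two formulations coincide for every K, which matches the abstract's contrast with standard MDPs.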