Goto

Collaborating Authors

 Markov Models


Combining LLMs with Logic-Based Framework to Explain MCTS

arXiv.org Artificial Intelligence

In response to the lack of trust in Artificial Intelligence (AI) for sequential planning, we design a Computational Tree Logic-guided large language model (LLM)-based natural language explanation framework designed for the Monte Carlo Tree Search (MCTS) algorithm. MCTS is often considered challenging to interpret due to the complexity of its search trees, but our framework is flexible enough to handle a wide range of free-form post-hoc queries and knowledge-based inquiries centered around MCTS and the Markov Decision Process (MDP) of the application domain. By transforming user queries into logic and variable statements, our framework ensures that the evidence obtained from the search tree remains factually consistent with the underlying environmental dynamics and any constraints in the actual stochastic control process. We evaluate the framework rigorously through quantitative assessments, where it demonstrates strong performance in terms of accuracy and factual consistency.


A Finite-State Controller Based Offline Solver for Deterministic POMDPs

arXiv.org Artificial Intelligence

Deterministic partially observable Markov decision processes (DetPOMDPs) often arise in planning problems where the agent is uncertain about its environmental state but can act and observe de-terministically. In this paper, we propose DetM-CVI, an adaptation of the Monte Carlo V alue Iteration (MCVI) algorithm for DetPOMDPs, which builds policies in the form of finite-state controllers (FSCs). DetMCVI solves large problems with a high success rate, outperforming existing baselines for DetPOMDPs. We also verify the performance of the algorithm in a real-world mobile robot forest mapping scenario.


Graph Privacy: A Heterogeneous Federated GNN for Trans-Border Financial Data Circulation

arXiv.org Artificial Intelligence

The sharing of external data has become a strong demand of financial institutions, but the privacy issue has led to the difficulty of interconnecting different platforms and the low degree of data openness. To effectively solve the privacy problem of financial data in trans-border flow and sharing, to ensure that the data is'available but not visible', to realize the joint portrait of all kinds of heterogeneous data of business organizations in different industries, we propose a Heterogeneous Federated Graph Neural Network (HFGNN) approach. In this method, the distribution of heterogeneous business data of trans-border organizations is taken as subgraphs, and the sharing and circulation process among subgraphs is constructed as a statistically heterogeneous global graph through a central server. Each subgraph learns the corresponding personalized service model through local training to select and update the relevant subset of sub-graphs with aggregated parameters, and effectively separates and combines topological and feature information among subgraphs. Finally, our simulation experimental results show that the proposed method has higher accuracy performance and faster convergence speed than existing methods. 1 Introduction With the continuous advancement of financial technology's profound empowerment of business, the shared application of external data (such as Internet companies, insurance companies, and other third-party data providers) has become a strong demand for financial institutions. Financial risk control and customer acquisition based on privacy computing have become the most important privacy computing implementation scenario at present [ Oyewole et al., 2024; Farayola et al., 2024 ] . However, there are three main risks in the current cooperation process between financial institutions and external data sources: first, it involves a large amount of personal user information and is subject to strict regulatory requirements; second, the data assets and trade secrets Corresponding author accumulated by the institution's own business are easy to be leaked; Third, because the data itself can be copied and easily spread, and once shared cannot be traced, the confirmation of data assets is difficult, and commercialization is seriously restricted[ Christian, 2024 ] .


Generative QoE Modeling: A Lightweight Approach for Telecom Networks

arXiv.org Artificial Intelligence

Quality of Experience (QoE) prediction plays a crucial role in optimizing resource management and enhancing user satisfaction across both telecommunication and OTT services. While recent advances predominantly rely on deep learning models, this study introduces a lightweight generative modeling framework that balances computational efficiency, interpretability, and predictive accuracy. By validating the use of Vector Quantization (VQ) as a preprocessing technique, continuous network features are effectively transformed into discrete categorical symbols, enabling integration with a Hidden Markov Model (HMM) for temporal sequence modeling. This VQ-HMM pipeline enhances the model's capacity to capture dynamic QoE patterns while supporting probabilistic inference on new and unseen data. Experimental results on publicly available time-series datasets incorporating both objective indicators and subjective QoE scores demonstrate the viability of this approach in real-time and resource-constrained environments, where inference latency is also critical. The framework offers a scalable alternative to complex deep learning methods, particularly in scenarios with limited computational resources or where latency constraints are critical.


Multi-Agent Reinforcement Learning for Resources Allocation Optimization: A Survey

arXiv.org Artificial Intelligence

Multi-Agent Reinforcement Learning (MARL) has become a powerful framework for numerous real-world applications, modeling distributed decision-making and learning from interactions with complex environments. Resource Allocation Optimization (RAO) benefits significantly from MARL's ability to tackle dynamic and decentralized contexts. MARL-based approaches are increasingly applied to RAO challenges across sectors playing pivotal roles to Industry 4.0 developments. This survey provides a comprehensive review of recent MARL algorithms for RAO, encompassing core concepts, classifications, and a structured taxonomy. By outlining the current research landscape and identifying primary challenges and future directions, this survey aims to support researchers and practitioners in leveraging MARL's potential to advance resource allocation solutions.


Toward Efficient Exploration by Large Language Model Agents

arXiv.org Artificial Intelligence

A burgeoning area within reinforcement learning (RL) is the design of sequential decision-making agents centered around large language models (LLMs). While autonomous decision-making agents powered by modern LLMs could facilitate numerous real-world applications, such successes demand agents that are capable of data-efficient RL. One key obstacle to achieving data efficiency in RL is exploration, a challenge that we demonstrate many recent proposals for LLM agent designs struggle to contend with. Meanwhile, classic algorithms from the RL literature known to gracefully address exploration require technical machinery that can be challenging to operationalize in purely natural language settings. In this work, rather than relying on finetuning or in-context learning to coax LLMs into implicitly imitating a RL algorithm, we illustrate how LLMs can be used to explicitly implement an existing RL algorithm (Posterior Sampling for Reinforcement Learning) whose capacity for statistically-efficient exploration is already well-studied. We offer empirical results demonstrating how our LLM-based implementation of a known, data-efficient RL algorithm can be considerably more effective in natural language tasks that demand prudent exploration.


Universal language model with the intervention of quantum theory

arXiv.org Artificial Intelligence

We assume that natural language has a property analogous to quantum mechanics, and the rationale for adopting this analogy comes not only from noticing that the meanings of natural language can be analogous described by in the form of superposition of states, but also from the fact that we notice that the representation of natural language symbols has a duality in which the symbols-meanings correspond to each other. At the same time, natural language processing (NLP) methods based on statistical implementations have been popular for decades and continue to evolve.It makes us wonder whether we can go further than statistical theory and consider introducing the relevant theories of quantum mechanics into the modeling of natural language. Meanwhile, in the last decade or so, NLP has generally adopted the approach of converting natural language symbols into some kind of mathematical representation for processing, and this approach, known as word embedding, has achieved surprisingly good performance in universal language processing tasks.We see that the technical idea of completely converting the symbols that make up natural language into a numerical representation for processing to build a universal language model(ULM) is highly feasible and possible. The experimental progress fed back from these two real-world applications motivates us to explore building natural language models based on quantum theory. Within the past century, quantum theory has become more and more perfect as mankind has studied the physical world in depth.


Cognitive maps are generative programs

arXiv.org Artificial Intelligence

Making sense of the world and acting in it relies on building simplified mental representations that abstract away aspects of reality. This principle of cognitive mapping is universal to agents with limited resources. Living organisms, people, and algorithms all face the problem of forming functional representations of their world under various computing constraints. In this work, we explore the hypothesis that human resource-efficient planning may arise from representing the world as predictably structured. Building on the metaphor of concepts as programs, we propose that cognitive maps can take the form of generative programs that exploit predictability and redundancy, in contrast to directly encoding spatial layouts. We use a behavioral experiment to show that people who navigate in structured spaces rely on modular planning strategies that align with programmatic map representations. We describe a computational model that predicts human behavior in a variety of structured scenarios. This model infers a small distribution over possible programmatic cognitive maps conditioned on human prior knowledge of the world, and uses this distribution to generate resource-efficient plans. Our models leverages a Large Language Model as an embedding of human priors, implicitly learned through training on a vast corpus of human data. Our model demonstrates improved computational efficiency, requires drastically less memory, and outperforms unstructured planning algorithms with cognitive constraints at predicting human behavior, suggesting that human planning strategies rely on programmatic cognitive maps.


Learned Perceptive Forward Dynamics Model for Safe and Platform-aware Robotic Navigation

arXiv.org Artificial Intelligence

Ensuring safe navigation in complex environments requires accurate real-time traversability assessment and understanding of environmental interactions relative to the robot`s capabilities. Traditional methods, which assume simplified dynamics, often require designing and tuning cost functions to safely guide paths or actions toward the goal. This process is tedious, environment-dependent, and not generalizable. To overcome these issues, we propose a novel learned perceptive Forward Dynamics Model (FDM) that predicts the robot`s future state conditioned on the surrounding geometry and history of proprioceptive measurements, proposing a more scalable, safer, and heuristic-free solution. The FDM is trained on multiple years of simulated navigation experience, including high-risk maneuvers, and real-world interactions to incorporate the full system dynamics beyond rigid body simulation. We integrate our perceptive FDM into a zero-shot Model Predictive Path Integral (MPPI) planning framework, leveraging the learned mapping between actions, future states, and failure probability. This allows for optimizing a simplified cost function, eliminating the need for extensive cost-tuning to ensure safety. On the legged robot ANYmal, the proposed perceptive FDM improves the position estimation by on average 41% over competitive baselines, which translates into a 27% higher navigation success rate in rough simulation environments. Moreover, we demonstrate effective sim-to-real transfer and showcase the benefit of training on synthetic and real data. Code and models are made publicly available under https://github.com/leggedrobotics/fdm.


An Automated Reinforcement Learning Reward Design Framework with Large Language Model for Cooperative Platoon Coordination

arXiv.org Artificial Intelligence

Reinforcement Learning (RL) has demonstrated excellent decision-making potential in platoon coordination problems. However, due to the variability of coordination goals, the complexity of the decision problem, and the time-consumption of trial-and-error in manual design, finding a well performance reward function to guide RL training to solve complex platoon coordination problems remains challenging. In this paper, we formally define the Platoon Coordination Reward Design Problem (PCRDP), extending the RL-based cooperative platoon coordination problem to incorporate automated reward function generation. To address PCRDP, we propose a Large Language Model (LLM)-based Platoon coordination Reward Design (PCRD) framework, which systematically automates reward function discovery through LLM-driven initialization and iterative optimization. In this method, LLM first initializes reward functions based on environment code and task requirements with an Analysis and Initial Reward (AIR) module, and then iteratively optimizes them based on training feedback with an evolutionary module. The AIR module guides LLM to deepen their understanding of code and tasks through a chain of thought, effectively mitigating hallucination risks in code generation. The evolutionary module fine-tunes and reconstructs the reward function, achieving a balance between exploration diversity and convergence stability for training. To validate our approach, we establish six challenging coordination scenarios with varying complexity levels within the Yangtze River Delta transportation network simulation. Comparative experimental results demonstrate that RL agents utilizing PCRD-generated reward functions consistently outperform human-engineered reward functions, achieving an average of 10\% higher performance metrics in all scenarios.