Goto

Collaborating Authors

 Markov Models


Episodic Future Thinking Mechanism for Multi-agent Reinforcement Learning

arXiv.org Artificial Intelligence

Understanding cognitive processes in multi-agent interactions is a primary goal in cognitive science. It can guide the direction of artificial intelligence (AI) research toward social decision-making in multi-agent systems, which includes uncertainty from character heterogeneity. In this paper, we introduce an episodic future thinking (EFT) mechanism for a reinforcement learning (RL) agent, inspired by cognitive processes observed in animals. To enable future thinking functionality, we first develop a multi-character policy that captures diverse characters with an ensemble of heterogeneous policies. Here, the character of an agent is defined as a different weight combination on reward components, representing distinct behavioral preferences. The future thinking agent collects observation-action trajectories of the target agents and uses the pre-trained multi-character policy to infer their characters. Once the character is inferred, the agent predicts the upcoming actions of target agents and simulates the potential future scenario. This capability allows the agent to adaptively select the optimal action, considering the predicted future scenario in multi-agent interactions. To evaluate the proposed mechanism, we consider the multi-agent autonomous driving scenario with diverse driving traits and multiple particle environments. Simulation results demonstrate that the EFT mechanism with accurate character inference leads to a higher reward than existing multi-agent solutions. We also confirm that the effect of reward improvement remains valid across societies with different levels of character diversity.


Procedural Content Generation in Games: A Survey with Insights on Emerging LLM Integration

arXiv.org Artificial Intelligence

Procedural Content Generation (PCG) is defined as the automatic creation of game content using algorithms. PCG has a long history in both the game industry and the academic world. It can increase player engagement and ease the work of game designers. While recent advances in deep learning approaches in PCG have enabled researchers and practitioners to create more sophisticated content, it is the arrival of Large Language Models (LLMs) that truly disrupted the trajectory of PCG advancement. This survey explores the differences between various algorithms used for PCG, including search-based methods, machine learning-based methods, other frequently used methods (e.g., noise functions), and the newcomer, LLMs. We also provide a detailed discussion on combined methods. Furthermore, we compare these methods based on the type of content they generate and the publication dates of their respective papers. Finally, we identify gaps in the existing academic work and suggest possible directions for future research.


Modelling Structured Data Learning with Restricted Boltzmann Machines in the Teacher-Student Setting

arXiv.org Artificial Intelligence

Restricted Boltzmann machines (RBM) are generative models capable to learn data with a rich underlying structure. We study the teacher-student setting where a student RBM learns structured data generated by a teacher RBM. The amount of structure in the data is controlled by adjusting the number of hidden units of the teacher and the correlations in the rows of the weights, a.k.a. patterns. In the absence of correlations, we validate the conjecture that the performance is independent of the number of teacher patters and hidden units of the student RBMs, and we argue that the teacher-student setting can be used as a toy model for studying the lottery ticket hypothesis. Beyond this regime, we find that the critical amount of data required to learn the teacher patterns decreases with both their number and correlations. In both regimes, we find that, even with an relatively large dataset, it becomes impossible to learn the teacher patterns if the inference temperature used for regularization is kept too low. In our framework, the student can learn teacher patterns one-to-one or many-to-one, generalizing previous findings about the teacher-student setting with two hidden units to any arbitrary finite number of hidden units.


Disease Outbreak Detection and Forecasting: A Review of Methods and Data Sources

arXiv.org Artificial Intelligence

Infectious diseases occur when pathogens from other individuals or animals infect a person, resulting in harm to both individuals and society as a whole. The outbreak of such diseases can pose a significant threat to human health. However, early detection and tracking of these outbreaks have the potential to reduce the mortality impact. To address these threats, public health authorities have endeavored to establish comprehensive mechanisms for collecting disease data. Many countries have implemented infectious disease surveillance systems, with the detection of epidemics being a primary objective. The clinical healthcare system, local/state health agencies, federal agencies, academic/professional groups, and collaborating governmental entities all play pivotal roles within this system. Moreover, nowadays, search engines and social media platforms can serve as valuable tools for monitoring disease trends. The Internet and social media have become significant platforms where users share information about their preferences and relationships. This real-time information can be harnessed to gauge the influence of ideas and societal opinions, making it highly useful across various domains and research areas, such as marketing campaigns, financial predictions, and public health, among others. This article provides a review of the existing standard methods developed by researchers for detecting outbreaks using time series data. These methods leverage various data sources, including conventional data sources and social media data or Internet data sources. The review particularly concentrates on works published within the timeframe of 2015 to 2022.


Efficient Non-Myopic Layered Bayesian Optimization For Large-Scale Bathymetric Informative Path Planning

arXiv.org Artificial Intelligence

Informative path planning (IPP) applied to bathymetric mapping allows AUVs to focus on feature-rich areas to quickly reduce uncertainty and increase mapping efficiency. Existing methods based on Bayesian optimization (BO) over Gaussian Process (GP) maps work well on small scenarios but they are short-sighted and computationally heavy when mapping larger areas, hindering deployment in real applications. To overcome this, we present a 2-layered BO IPP method that performs non-myopic, real-time planning in a tree search fashion over large Stochastic Variational GP maps, while respecting the AUV motion constraints and accounting for localization uncertainty. Our framework outperforms the standard industrial lawn-mowing pattern and a myopic baseline in a set of hardware in the loop (HIL) experiments in an embedded platform over real bathymetry.


Adversarial Inception for Bounded Backdoor Poisoning in Deep Reinforcement Learning

arXiv.org Artificial Intelligence

Recent works have demonstrated the vulnerability of Deep Reinforcement Learning (DRL) algorithms against training-time, backdoor poisoning attacks. These attacks induce pre-determined, adversarial behavior in the agent upon observing a fixed trigger during deployment while allowing the agent to solve its intended task during training. Prior attacks rely on arbitrarily large perturbations to the agent's rewards to achieve both of these objectives - leaving them open to detection. Thus, in this work, we propose a new class of backdoor attacks against DRL which achieve state of the art performance while minimally altering the agent's rewards. These "inception" attacks train the agent to associate the targeted adversarial behavior with high returns by inducing a disjunction between the agent's chosen action and the true action executed in the environment during training. We formally define these attacks and prove they can achieve both adversarial objectives. We then devise an online inception attack which significantly out-performs prior attacks under bounded reward constraints.


Statistical Inference for Temporal Difference Learning with Linear Function Approximation

arXiv.org Machine Learning

Statistical inference tasks, such as constructing confidence regions or simultaneous confidence intervals, are often addressed by deriving distributional theory such as central limit theorems (CLTs) for the estimator in use. Due to the high dimensionality of modern science and engineering applications, there has been a surge of interests in deriving convergence results that are valid in a finite-sample manner. In Reinforcement Learning (RL), a discipline that underpins many recent machine learning breakthroughs (Agarwal et al. (2019); Sutton and Barto (2018)) one central question is to evaluate with confidence the quality of a given policy, measured by its value function. As RL is often modeled as decision making in Markov decision processes (MDPs), the task of statistical inference needs to accommodate the online nature of the sampling mechanism. Temporal Difference (TD) learning (Sutton (1988)) is arguably the most widely used algorithm designed for this purpose. TD learning, which is an instance of stochastic approximation (SA) method (Robbins and Monro (1951)), approximates the value function of a given policy in an iterative manner. Towards understanding the non-asymptotic convergence rate of TD to the target value function, extensive recent efforts have been made (see, e.g.


Wireless Human-Machine Collaboration in Industry 5.0

arXiv.org Artificial Intelligence

--Wireless Human-Machine Collaboration (WHMC) represents a critical advancement for Industry 5.0, enabling seamless interaction between humans and machines across geographically distributed systems. As the WHMC systems become increasingly important for achieving complex collaborative control tasks, ensuring their stability is essential for practical deployment and long-term operation. Stability analysis certifies how the closed-loop system will behave under model randomness, which is essential for systems operating with wireless communications. However, the fundamental stability analysis of the WHMC systems remains an unexplored challenge due to the intricate interplay between the stochastic nature of wireless communications, dynamic human operations, and the inherent complexities of control system dynamics. This paper establishes a fundamental WHMC model incorporating dual wireless loops for machine and human control. Our framework accounts for practical factors such as short-packet transmissions, fading channels, and advanced HARQ schemes. We model human control lag as a Markov process, which is crucial for capturing the stochastic nature of human interactions. Building on this model, we propose a stochastic cycle-cost-based approach to derive a stability condition for the WHMC system, expressed in terms of wireless channel statistics, human dynamics, and control parameters. Our findings are validated through extensive numerical simulations and a proof-of-concept experiment, where we developed and tested a novel wireless collaborative cart-pole control system. The results confirm the effectiveness of our approach and provide a robust framework for future research on WHMC systems in more complex environments. HE Fourth Industrial Revolution, known as Industry 4.0, envisions significantly increased automation and mechanization in manufacturing, driven by rapidly advancing cyber-physical systems (CPS) with minimal human intervention on the factory floor [1]. However, many dynamically changing and unforeseen control tasks in manufacturing, such as recon-figuring the production line, are challenging for autonomous machines to handle alone [2]. Therefore, humans are rein-troduced to the manufacturing process to collaborate with machines in the fifth industrial revolution, Industry 5.0 [3]. In the Industry 5.0 era, human-machine collaboration (HMC) emerges as a key enabling technology to boost productivity, efficiency, and sustainability by combining human's creativity, The work of W . Liu was supported by the Australian Research Council's Discovery Early Career Researcher A ward (DECRA) Project DE230100016.


Acoustic Model Optimization over Multiple Data Sources: Merging and Valuation

arXiv.org Artificial Intelligence

Due to the rising awareness of privacy protection and the voluminous scale of speech data, it is becoming infeasible for Automatic Speech Recognition (ASR) system developers to train the acoustic model with complete data as before. For example, the data may be owned by different curators, and it is not allowed to share with others. In this paper, we propose a novel paradigm to solve salient problems plaguing the ASR field. In the first stage, multiple acoustic models are trained based upon different subsets of the complete speech data, while in the second phase, two novel algorithms are utilized to generate a high-quality acoustic model based upon those trained on data subsets. We first propose the Genetic Merge Algorithm (GMA), which is a highly specialized algorithm for optimizing acoustic models but suffers from low efficiency. We further propose the SGD-Based Optimizational Merge Algorithm (SOMA), which effectively alleviates the efficiency bottleneck of GMA and maintains superior model accuracy. Extensive experiments on public data show that the proposed methods can significantly outperform the state-of-the-art. Furthermore, we introduce Shapley Value to estimate the contribution score of the trained models, which is useful for evaluating the effectiveness of the data and providing fair incentives to their curators.


How to Find the Exact Pareto Front for Multi-Objective MDPs?

arXiv.org Machine Learning

Multi-objective Markov Decision Processes (MDPs) are receiving increasing attention, as real-world decision-making problems often involve conflicting objectives that cannot be addressed by a single-objective MDP. The Pareto front identifies the set of policies that cannot be dominated, providing a foundation for finding optimal solutions that can efficiently adapt to various preferences. However, finding the Pareto front is a highly challenging problem. Most existing methods either (i) rely on traversing the continuous preference space, which is impractical and results in approximations that are difficult to evaluate against the true Pareto front, or (ii) focus solely on deterministic Pareto optimal policies, from which there are no known techniques to characterize the full Pareto front. Moreover, finding the structure of the Pareto front itself remains unclear even in the context of dynamic programming. This work addresses the challenge of efficiently discovering the Pareto front. By investigating the geometric structure of the Pareto front in MO-MDP, we uncover a key property: the Pareto front is on the boundary of a convex polytope whose vertices all correspond to deterministic policies, and neighboring vertices of the Pareto front differ by only one state-action pair of the deterministic policy, almost surely. This insight transforms the global comparison across all policies into a localized search among deterministic policies that differ by only one state-action pair, drastically reducing the complexity of searching for the exact Pareto front. We develop an efficient algorithm that identifies the vertices of the Pareto front by solving a single-objective MDP only once and then traversing the edges of the Pareto front, making it more efficient than existing methods. Our empirical studies demonstrate the effectiveness of our theoretical strategy in discovering the Pareto front.