Markov Models
Multi-Action Restless Bandits with Weakly Coupled Constraints: Simultaneous Learning and Control
Fu, Jing, Moran, Bill, Niño-Mora, José
We study a system with finitely many groups of multi-action bandit processes, each of which is a Markov decision process (MDP) with finite state and action spaces and potentially different transition matrices when taking different actions. The bandit processes of the same group share the same state and action spaces and, given the same action that is taken, the same transition matrix. All the bandit processes across various groups are subject to multiple weakly coupled constraints over their state and action variables. Unlike the past studies that focused on the offline case, we consider the online case without assuming full knowledge of transition matrices and reward functions a priori and propose an effective scheme that enables simultaneous learning and control. We prove the convergence of the relevant processes in both the timeline and the number of the bandit processes, referred to as the convergence in the time and the magnitude dimensions. Moreover, we prove that the relevant processes converge exponentially fast in the magnitude dimension, leading to exponentially diminishing performance deviation between the proposed online algorithms and offline optimality. Jing Fu is with Department of Electrical and Electronic Engineering, School of Engineering, STEM College, RMIT University, Australia (e-mail: jing.fu@rmit.edu.au). Bill Moran is with Department of Electrical and Electronic Engineering, the University of Melbourne, VIC 3010, Australia (e-mail:wmoran@unimelb.edu.au).
Projection Abstractions in Planning Under the Lenses of Abstractions for MDPs
Canonaco, Giuseppe, Pozanco, Alberto, Borrajo, Daniel
The concept of abstraction has been independently developed both in the context of AI Planning and discounted Markov Decision Processes (MDPs). However, the way abstractions are built and used in the context of Planning and MDPs is different even though lots of commonalities can be highlighted. To this day there is no work trying to relate and unify the two fields on the matter of abstractions unraveling all the different assumptions and their effect on the way they can be used. Therefore, in this paper we aim to do so by looking at projection abstractions in Planning through the lenses of discounted MDPs. Starting from a projection abstraction built according to Classical or Probabilistic Planning techniques, we will show how the same abstraction can be obtained under the abstraction frameworks available for discounted MDPs. Along the way, we will focus on computational as well as representational advantages and disadvantages of both worlds pointing out new research directions that are of interest for both fields.
The effect of priors on Learning with Restricted Boltzmann Machines
Manzan, Gianluca, Tantari, Daniele
Restricted Boltzmann Machines (RBMs) are generative models designed to learn from data with a rich underlying structure. In this work, we explore a teacher-student setting where a student RBM learns from examples generated by a teacher RBM, with a focus on the effect of the unit priors on learning efficiency. We consider a parametric class of priors that interpolate between continuous (Gaussian) and binary variables. This approach models various possible choices of visible units, hidden units, and weights for both the teacher and student RBMs. By analyzing the phase diagram of the posterior distribution in both the Bayes optimal and mismatched regimes, we demonstrate the existence of a triple point that defines the critical dataset size necessary for learning through generalization. The critical size is strongly influenced by the properties of the teacher, and thus the data, but is unaffected by the properties of the student RBM. Nevertheless, a prudent choice of student priors can facilitate training by expanding the so-called signal retrieval region, where the machine generalizes effectively.
Reinforcement learning to learn quantum states for Heisenberg scaling accuracy
Jae, Jeongwoo, Hong, Jeonghoon, Choo, Jinho, Kwon, Yeong-Dae
Learning quantum states is a crucial task for realizing the potential of quantum information technology. Recently, neural approaches have emerged as promising methods for learning quantum states. We propose a meta-learning model that employs reinforcement learning (RL) to optimize the process of learning quantum states. For learning quantum states, our scheme trains a Hardware efficient ansatz with a blackbox optimization algorithm, called evolution strategy (ES). To enhance the efficiency of ES, a RL agent dynamically adjusts the hyperparameters of ES. To facilitate the RL training, we introduce an action repetition strategy inspired by curriculum learning. The RL agent significantly improves the sample efficiency of learning random quantum states, and achieves infidelity scaling close to the Heisenberg limit. We showcase that the RL agent trained using 3-qubit states can be generalized to learning up to 5-qubit states. These results highlight the utility of RL-driven meta-learning to enhance the efficiency and generalizability of learning quantum states. Our approach can be applicable to improve quantum control, quantum optimization, and quantum machine learning.
TAB-Fields: A Maximum Entropy Framework for Mission-Aware Adversarial Planning
Puthumanaillam, Gokul, Song, Jae Hyuk, Yesmagambet, Nurzhan, Park, Shinkyu, Ornik, Melkior
Autonomous agents operating in adversarial scenarios face a fundamental challenge: while they may know their adversaries' high-level objectives, such as reaching specific destinations within time constraints, the exact policies these adversaries will employ remain unknown. Traditional approaches address this challenge by treating the adversary's state as a partially observable element, leading to a formulation as a Partially Observable Markov Decision Process (POMDP). However, the induced belief-space dynamics in a POMDP require knowledge of the system's transition dynamics, which, in this case, depend on the adversary's unknown policy. Our key observation is that while an adversary's exact policy is unknown, their behavior is necessarily constrained by their mission objectives and the physical environment, allowing us to characterize the space of possible behaviors without assuming specific policies. In this paper, we develop Task-Aware Behavior Fields (TAB-Fields), a representation that captures adversary state distributions over time by computing the most unbiased probability distribution consistent with known constraints. We construct TAB-Fields by solving a constrained optimization problem that minimizes additional assumptions about adversary behavior beyond mission and environmental requirements. We integrate TAB-Fields with standard planning algorithms by introducing TAB-conditioned POMCP, an adaptation of Partially Observable Monte Carlo Planning. Through experiments in simulation with underwater robots and hardware implementations with ground robots, we demonstrate that our approach achieves superior performance compared to baselines that either assume specific adversary policies or neglect mission constraints altogether. Evaluation videos and code are available at https://tab-fields.github.io.
Step-by-Step Guidance to Differential Anemia Diagnosis with Real-World Data and Deep Reinforcement Learning
Muyama, Lillian, Lu, Estelle, Cheminet, Geoffrey, Pouchot, Jacques, Rance, Bastien, Tropeano, Anne-Isabelle, Neuraz, Antoine, Coulet, Adrien
Clinical diagnostic guidelines outline the key questions to answer to reach a diagnosis. Inspired by guidelines, we aim to develop a model that learns from electronic health records to determine the optimal sequence of actions for accurate diagnosis. Focusing on anemia and its sub-types, we employ deep reinforcement learning (DRL) algorithms and evaluate their performance on both a synthetic dataset, which is based on expert-defined diagnostic pathways, and a real-world dataset. We investigate the performance of these algorithms across various scenarios. Our experimental results demonstrate that DRL algorithms perform competitively with state-of-the-art methods while offering the significant advantage of progressively generating pathways to the suggested diagnosis, providing a transparent decision-making process that can guide and explain diagnostic reasoning.
A Combinatorial Approach to Neural Emergent Communication
Substantial research on deep learning-based emergent communication uses the referential game framework, specifically the Lewis signaling game, however we argue that successful communication in this game typically only need one or two symbols for target image classification because of a sampling pitfall in the training data. To address this issue, we provide a theoretical analysis and introduce a combinatorial algorithm SolveMinSym (SMS) to solve the symbolic complexity for classification, which is the minimum number of symbols in the message for successful communication. We use the SMS algorithm to create datasets with different symbolic complexity to empirically show that data with higher symbolic complexity increases the number of effective symbols in the emergent language.
A Survey on Deep Neural Networks in Collaborative Filtering Recommendation Systems
Li, Pang, Noah, Shahrul Azman Mohd, Sarim, Hafiz Mohd
This survey provides an examination of the use of Deep Neural Networks (DNN) in Collaborative Filtering (CF) recommendation systems. As the digital world increasingly relies on data-driven approaches, traditional CF techniques face limitations in scalability and flexibility. DNNs can address these challenges by effectively modeling complex, non-linear relationships within the data. We begin by exploring the fundamental principles of both collaborative filtering and deep neural networks, laying the groundwork for understanding their integration. Subsequently, we review key advancements in the field, categorizing various deep learning models that enhance CF systems, including Multilayer Perceptrons (MLP), Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Graph Neural Networks (GNN), autoencoders, Generative Adversarial Networks (GAN), and Restricted Boltzmann Machines (RBM). The paper also discusses evaluation protocols, various publicly available auxiliary information, and data features. Furthermore, the survey concludes with a discussion of the challenges and future research opportunities in enhancing collaborative filtering systems with deep learning.
Real-Time Multilingual Sign Language Processing
Sign Language Processing (SLP) is an interdisciplinary field comprised of Natural Language Processing (NLP) and Computer Vision. It is focused on the computational understanding, translation, and production of signed languages. Traditional approaches have often been constrained by the use of gloss-based systems that are both language-specific and inadequate for capturing the multidimensional nature of sign language. These limitations have hindered the development of technology capable of processing signed languages effectively. This thesis aims to revolutionize the field of SLP by proposing a simple paradigm that can bridge this existing technological gap. We propose the use of SignWiring, a universal sign language transcription notation system, to serve as an intermediary link between the visual-gestural modality of signed languages and text-based linguistic representations. We contribute foundational libraries and resources to the SLP community, thereby setting the stage for a more in-depth exploration of the tasks of sign language translation and production. These tasks encompass the translation of sign language from video to spoken language text and vice versa. Through empirical evaluations, we establish the efficacy of our transcription method as a pivot for enabling faster, more targeted research, that can lead to more natural and accurate translations across a range of languages. The universal nature of our transcription-based paradigm also paves the way for real-time, multilingual applications in SLP, thereby offering a more inclusive and accessible approach to language technology. This is a significant step toward universal accessibility, enabling a wider reach of AI-driven language technologies to include the deaf and hard-of-hearing community.
The Evolution and Future Perspectives of Artificial Intelligence Generated Content
Zhu, Chengzhang, Cui, Luobin, Tang, Ying, Wang, Jiacun
Artificial intelligence generated content (AIGC), a rapidly advancing technology, is transforming content creation across domains, such as text, images, audio, and video. Its growing potential has attracted more and more researchers and investors to explore and expand its possibilities. This review traces AIGC's evolution through four developmental milestones-ranging from early rule-based systems to modern transfer learning models-within a unified framework that highlights how each milestone contributes uniquely to content generation. In particular, the paper employs a common example across all milestones to illustrate the capabilities and limitations of methods within each phase, providing a consistent evaluation of AIGC methodologies and their development. Furthermore, this paper addresses critical challenges associated with AIGC and proposes actionable strategies to mitigate them. This study aims to guide researchers and practitioners in selecting and optimizing AIGC models to enhance the quality and efficiency of content creation across diverse domains.