Goto

Collaborating Authors

 Markov Models


Essentially Sharp Estimates on the Entropy Regularization Error in Discrete Discounted Markov Decision Processes

arXiv.org Artificial Intelligence

We study the error introduced by entropy regularization in infinite-horizon discrete discounted Markov decision processes. We show that this error decreases exponentially in the inverse regularization strength both in a weighted KL-divergence and in value with a problemspecific exponent. We provide a lower bound matching our upper bound up to a polynomial term. Our proof relies on the correspondence of the solutions of entropy-regularized Markov decision processes with gradient flows of the unregularized reward with respect to a Riemannian metric common in natural policy gradient methods. This correspondence allows us to identify the limit of the gradient flow as the generalized maximum entropy optimal policy, thereby characterizing the implicit bias of this particular gradient flow which corresponds to a time-continuous version of the natural policy gradient method. We use our improved error estimates to show that for entropyregularized natural policy gradient methods the overall error decays exponentially in the square root of the number of iterations improving existing sublinear guarantees.


Tree-based variational inference for Poisson log-normal models

arXiv.org Machine Learning

When studying ecosystems, hierarchical trees are often used to organize entities based on proximity criteria, such as the taxonomy in microbiology, social classes in geography, or product types in retail businesses, offering valuable insights into entity relationships. Despite their significance, current count-data models do not leverage this structured information. In particular, the widely used Poisson log-normal (PLN) model, known for its ability to model interactions between entities from count data, lacks the possibility to incorporate such hierarchical tree structures, limiting its applicability in domains characterized by such complexities. To address this matter, we introduce the PLN-Tree model as an extension of the PLN model, specifically designed for modeling hierarchical count data. By integrating structured variational inference techniques, we propose an adapted training procedure and establish identifiability results, enhancisng both theoretical foundations and practical interpretability. Additionally, we extend our framework to classification tasks as a preprocessing pipeline, showcasing its versatility. Experimental evaluations on synthetic datasets as well as real-world microbiome data demonstrate the superior performance of the PLN-Tree model in capturing hierarchical dependencies and providing valuable insights into complex data structures, showing the practical interest of knowledge graphs like the taxonomy in ecosystems modeling.


Identifying Nonstationary Causal Structures with High-Order Markov Switching Models

arXiv.org Machine Learning

Causal discovery in time series is a rapidly evolving field with a wide variety of applications in other areas such as climate science and neuroscience. Traditional approaches assume a stationary causal graph, which can be adapted to nonstationary time series with time-dependent effects or heterogeneous noise. In this work we address nonstationarity via regime-dependent causal structures. We first establish identifiability for high-order Markov Switching Models, which provide the foundations for identifiable regime-dependent causal discovery. Our empirical studies demonstrate the scalability of our proposed approach for high-order regime-dependent structure estimation, and we illustrate its applicability on brain activity data.


Reinforcement Learning via Auxiliary Task Distillation

arXiv.org Artificial Intelligence

We present Reinforcement Learning via Auxiliary Task Distillation (AuxDistill), a new method that enables reinforcement learning (RL) to perform long-horizon robot control problems by distilling behaviors from auxiliary RL tasks. AuxDistill achieves this by concurrently carrying out multi-task RL with auxiliary tasks, which are easier to learn and relevant to the main task. A weighted distillation loss transfers behaviors from these auxiliary tasks to solve the main task. We demonstrate that AuxDistill can learn a pixels-to-actions policy for a challenging multi-stage embodied object rearrangement task from the environment reward without demonstrations, a learning curriculum, or pre-trained skills. AuxDistill achieves $2.3 \times$ higher success than the previous state-of-the-art baseline in the Habitat Object Rearrangement benchmark and outperforms methods that use pre-trained skills and expert demonstrations.


The MRI Scanner as a Diagnostic: Image-less Active Sampling

arXiv.org Artificial Intelligence

Despite the high diagnostic accuracy of Magnetic Resonance Imaging (MRI), using MRI as a Point-of-Care (POC) disease identification tool poses significant accessibility challenges due to the use of high magnetic field strength and lengthy acquisition times. We ask a simple question: Can we dynamically optimise acquired samples, at the patient level, according to an (automated) downstream decision task, while discounting image reconstruction? We propose an ML-based framework that learns an active sampling strategy, via reinforcement learning, at a patient-level to directly infer disease from undersampled k-space. We validate our approach by inferring Meniscus Tear in undersampled knee MRI data, where we achieve diagnostic performance comparable with ML-based diagnosis, using fully sampled k-space data. We analyse task-specific sampling policies, showcasing the adaptability of our active sampling approach. The introduced frugal sampling strategies have the potential to reduce high field strength requirements that in turn strengthen the viability of MRI-based POC disease identification and associated preliminary screening tools.


Hierarchical Framework for Optimizing Wildfire Surveillance and Suppression using Human-Autonomous Teaming

arXiv.org Artificial Intelligence

The integration of manned and unmanned aircraft can help improve wildfire response. Wildfire containment failures occur when resources available to first responders, who execute the initial stages of wildfire management referred to as the initial attack, are ineffective or insufficient. Initial attack surveillance and suppression models have linked action spaces and objectives, making their optimization computationally challenging. The initial attack may be formulated as a multi-agent partially observable Markov decision process (MPOMDP). We divide the initial attack MPOMDP into surveillance and suppression processes with their respective planners operating on different, but constant, time scales. A hierarchical framework iterates between surveillance and suppression planners while also providing collision avoidance. This framework is exemplified by a set of multi-rotor unmanned aircraft surveying an initial attack fire while a manned helicopter suppresses the fire with a water bucket. Wildfire-specific solver extensions are formulated to reduce the otherwise vast action spaces. The hierarchical framework outperforms firefighting techniques and a myopic baseline by up to 242% for moderate wildfires and 60% for rapid wildfires when simulated in abstracted and actual case studies. We also validate the early dispatching of additional suppression assets using regression models to ensure wildfire containment to thresholds established by wildfire agencies.


Addressing Polarization and Unfairness in Performative Prediction

arXiv.org Artificial Intelligence

When machine learning (ML) models are used in applications that involve humans (e.g., online recommendation, school admission, hiring, lending), the model itself may trigger changes in the distribution of targeted data it aims to predict. Performative prediction (PP) is a framework that explicitly considers such model-dependent distribution shifts when learning ML models. While significant efforts have been devoted to finding performative stable (PS) solutions in PP for system robustness, their societal implications are less explored and it is unclear whether PS solutions are aligned with social norms such as fairness. In this paper, we set out to examine the fairness property of PS solutions in performative prediction. We first show that PS solutions can incur severe polarization effects and group-wise loss disparity. Although existing fairness mechanisms commonly used in literature can help mitigate unfairness, they may fail and disrupt the stability under model-dependent distribution shifts. We thus propose novel fairness intervention mechanisms that can simultaneously achieve both stability and fairness in PP settings. Both theoretical analysis and experiments are provided to validate the proposed method.


Bayesian temporal biclustering with applications to multi-subject neuroscience studies

arXiv.org Artificial Intelligence

We consider the problem of analyzing multivariate time series collected on multiple subjects, with the goal of identifying groups of subjects exhibiting similar trends in their recorded measurements over time as well as time-varying groups of associated measurements. To this end, we propose a Bayesian model for temporal biclustering featuring nested partitions, where a time-invariant partition of subjects induces a time-varying partition of measurements. Our approach allows for data-driven determination of the number of subject and measurement clusters as well as estimation of the number and location of changepoints in measurement partitions. To efficiently perform model fitting and posterior estimation with Markov Chain Monte Carlo, we derive a blocked update of measurements' cluster-assignment sequences. We illustrate the performance of our model in two applications to functional magnetic resonance imaging data and to an electroencephalogram dataset. The results indicate that the proposed model can combine information from potentially many subjects to discover a set of interpretable, dynamic patterns. Experiments on simulated data compare the estimation performance of the proposed model against ground-truth values and other statistical methods, showing that it performs well at identifying ground-truth subject and measurement clusters even when no subject or time dependence is present.


Monte Carlo Planning for Stochastic Control on Constrained Markov Decision Processes

arXiv.org Artificial Intelligence

In the world of stochastic control, especially in economics and engineering, Markov Decision Processes (MDPs) can effectively model various stochastic decision processes, from asset management to transportation optimization. These underlying MDPs, upon closer examination, often reveal a specifically constrained causal structure concerning the transition and reward dynamics. By exploiting this structure, we can obtain a reduction in the causal representation of the problem setting, allowing us to solve of the optimal value function more efficiently. This work defines an MDP framework, the \texttt{SD-MDP}, where we disentangle the causal structure of MDPs' transition and reward dynamics, providing distinct partitions on the temporal causal graph. With this stochastic reduction, the \texttt{SD-MDP} reflects a general class of resource allocation problems. This disentanglement further enables us to derive theoretical guarantees on the estimation error of the value function under an optimal policy by allowing independent value estimation from Monte Carlo sampling. Subsequently, by integrating this estimator into well-known Monte Carlo planning algorithms, such as Monte Carlo Tree Search (MCTS), we derive bounds on the simple regret of the algorithm. Finally, we quantify the policy improvement of MCTS under the \texttt{SD-MDP} framework by demonstrating that the MCTS planning algorithm achieves higher expected reward (lower costs) under a constant simulation budget, on a tangible economic example based on maritime refuelling.


LangSuitE: Planning, Controlling and Interacting with Large Language Models in Embodied Text Environments

arXiv.org Artificial Intelligence

Recent advances in Large Language Models (LLMs) have shown inspiring achievements in constructing autonomous agents that rely on language descriptions as inputs. However, it remains unclear how well LLMs can function as few-shot or zero-shot embodied agents in dynamic interactive environments. To address this gap, we introduce LangSuitE, a versatile and simulation-free testbed featuring 6 representative embodied tasks in textual embodied worlds. Compared with previous LLM-based testbeds, LangSuitE (i) offers adaptability to diverse environments without multiple simulation engines, (ii) evaluates agents' capacity to develop ``internalized world knowledge'' with embodied observations, and (iii) allows easy customization of communication and action strategies. To address the embodiment challenge, we devise a novel chain-of-thought (CoT) schema, EmMem, which summarizes embodied states w.r.t. history information. Comprehensive benchmark results illustrate challenges and insights of embodied planning. LangSuitE represents a significant step toward building embodied generalists in the context of language models.