AITopics | Markov Models


Markov Models


ComaDICE: Offline Cooperative Multi-Agent Reinforcement Learning with Stationary Distribution Shift Regularization

Bui, The Viet, Nguyen, Thanh Hong, Mai, Tien

arXiv.org Artificial Intelligence, Oct-2-2024

Offline reinforcement learning (RL) has garnered significant attention for its ability to learn effective policies from pre-collected datasets without the need for further environmental interactions. While promising results have been demonstrated in single-agent settings, offline multi-agent reinforcement learning (MARL) presents additional challenges due to the large joint state-action space and the complexity of multi-agent behaviors. A key issue in offline RL is the distributional shift, which arises when the target policy being optimized deviates from the behavior policy that generated the data. This problem is exacerbated in MARL due to the interdependence between agents' local policies and the expansive joint state-action space. Prior approaches have primarily addressed this challenge by incorporating regularization in the space of either Q-functions or policies. In this work, we introduce a regularizer in the space of stationary distributions to better handle distributional shift. Our algorithm, ComaDICE, offers a principled framework for offline cooperative MARL by incorporating stationary distribution regularization for the global learning policy, complemented by a carefully structured multi-agent value decomposition strategy to facilitate multi-agent training. Through extensive experiments on the multi-agent MuJoCo and StarCraft II benchmarks, we demonstrate that ComaDICE achieves superior performance compared to state-of-the-art offline MARL methods across nearly all tasks. Over the years, deep RL has achieved remarkable success in various decision-making tasks (Levine et al., 2016; Silver et al., 2017; Kalashnikov et al., 2018; Haydari & Yılmaz, 2020). However, a significant limitation of deep RL is its need for millions of interactions with the environment to gather experiences for policy improvement.

algorithm, comadice and baseline, reinforcement learning, (9 more...)

arXiv.org Artificial Intelligence

2410.01954

Country:

North America > United States > Oregon > Lane County > Eugene (0.14)
Asia > Singapore (0.04)
North America > United States > Ohio > Lucas County > Oregon (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.45)


Social coordination perpetuates stereotypic expectations and behaviors across generations in deep multi-agent reinforcement learning

Gelpí, Rebekah A., Tang, Yikai, Jackson, Ethan C., Cunningham, William A.

arXiv.org Artificial Intelligence, Oct-2-2024

Despite often being perceived as morally objectionable, stereotypes are a common feature of social groups, a phenomenon that has often been attributed to biased motivations or limits on the ability to process information. We argue that one reason for this continued prevalence is that pre-existing expectations about how others will behave, in the context of social coordination, can change the behaviors of one's social partners, creating the very stereotype one expected to see, even in the absence of other potential sources of stereotyping. We use a computational model of dynamic social coordination to illustrate how this "feedback loop" can emerge, engendering and entrenching stereotypic behavior, and then show that human behavior on the task generates a comparable feedback loop. Notably, people's choices on the task are not related to social dominance or system justification, suggesting biased motivations are not necessary to maintain these stereotypes.

agent, market decider, participant, (14 more...)

arXiv.org Artificial Intelligence

2410.01763

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Oceania > New Zealand (0.04)
(10 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)


Discrete Diffusion Schrödinger Bridge Matching for Graph Transformation

Kim, Jun Hyeong, Kim, Seonghwan, Moon, Seokhyun, Kim, Hyeongwoo, Woo, Jeheon, Kim, Woo Youn

arXiv.org Artificial Intelligence, Oct-2-2024

Transporting between arbitrary distributions is a fundamental goal in generative modeling. Recently proposed diffusion bridge models provide a potential solution, but they rely on a joint distribution that is difficult to obtain in practice. Furthermore, formulations based on continuous domains limit their applicability to discrete domains such as graphs. To overcome these limitations, we propose Discrete Diffusion Schrödinger Bridge Matching (DDSBM), a novel framework that utilizes continuous-time Markov chains to solve the Schrödinger bridge (SB) problem in a high-dimensional discrete state space. Our approach extends Iterative Markovian Fitting to discrete domains, and we have proved its convergence to the SB. Furthermore, we adapt our framework to graph transformation and show that our design choice of underlying dynamics characterized by independent modifications of nodes and edges can be interpreted as the entropy-regularized version of optimal transport with a cost function described by the graph edit distance. To demonstrate the effectiveness of our framework, we have applied DDSBM to molecular optimization in the field of chemistry. Experimental results demonstrate that DDSBM effectively optimizes molecules' property-of-interest with minimal graph transformation, successfully retaining other features. Transporting an initial distribution to a target distribution is a foundational concept in modern generative modeling. Denoising diffusion models (DDMs) have been highly influential in this area, with a primary focus on generating data distributions from a simple prior (Sohl-Dickstein et al., 2015; Song & Ermon, 2019; Ho et al., 2020; Song et al., 2020; Kim et al., 2024b). Despite their promising results, setting the initial distribution as a simple prior makes DDMs hard to apply to tasks where the initial distribution is itself a data distribution, such as image-to-image translation. To tackle this, diffusion bridge models (DBMs) extend DDMs to transport data between arbitrary distributions (Liu & Wu, 2023; Liu et al., 2023; Zhou et al., 2023).

generator, graph, molecule, (16 more...)

arXiv.org Artificial Intelligence

2410.015

Country:

Asia > South Korea > Daejeon > Daejeon (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.45)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.35)


Abstract Reward Processes: Leveraging State Abstraction for Consistent Off-Policy Evaluation

Chaudhari, Shreyas, Deshpande, Ameet, da Silva, Bruno Castro, Thomas, Philip S.

arXiv.org Machine Learning, Oct-2-2024

Evaluating policies using off-policy data is crucial for applying reinforcement learning to real-world problems such as healthcare and autonomous driving. Previous methods for off-policy evaluation (OPE) generally suffer from high variance or irreducible bias, leading to unacceptably high prediction errors. In this work, we introduce STAR, a framework for OPE that encompasses a broad range of estimators -- which include existing OPE methods as special cases -- that achieve lower mean squared prediction errors. STAR leverages state abstraction to distill complex, potentially continuous problems into compact, discrete models which we call abstract reward processes (ARPs). Predictions from ARPs estimated from off-policy data are provably consistent (asymptotically correct). Rather than proposing a specific estimator, we present a new framework for OPE and empirically demonstrate that estimators within STAR outperform existing methods. The best STAR estimator outperforms baselines in all twelve cases studied, and even the median STAR estimator surpasses the baselines in seven out of the twelve cases.

abstract state, arp, evaluation, (15 more...)

arXiv.org Machine Learning

2410.02172

Country:

North America > United States > New Jersey (0.04)
North America > United States > Michigan (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)


Inferring Kernel $\epsilon$-Machines: Discovering Structure in Complex Systems

Jurgens, Alexandra M., Brodu, Nicolas

arXiv.org Artificial Intelligence, Oct-1-2024

Previously, we showed that computational mechanics' causal states -- predictively equivalent trajectory classes for a stochastic dynamical system -- can be cast into a reproducing kernel Hilbert space. The result is a widely applicable method that infers causal structure directly from very different kinds of observations and systems. Here, we expand this method to explicitly introduce the causal diffusion components it produces. These encode the kernel causal-state estimates as a set of coordinates in a reduced-dimension space. We show how each component extracts predictive features from data and demonstrate their application on four examples: first, a simple pendulum -- an exactly solvable system; second, a molecular-dynamic trajectory of $n$-butane -- a high-dimensional system with a well-studied energy landscape; third, the monthly sunspot sequence -- the longest-running available time series of direct observations; and fourth, multi-year observations of an active crop field -- a set of heterogeneous observations of the same ecosystem taken for over a decade. In this way, we demonstrate that the empirical kernel causal-states algorithm robustly discovers predictive structures for systems with widely varying dimensionality and stochasticity.

artificial intelligence, causal state, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2410.01076

Country:

North America > United States > Michigan (0.14)
North America > United States > California (0.14)
Europe > United Kingdom > England (0.14)

Genre: Research Report (0.50)

Industry: Energy > Oil & Gas > Downstream (0.35)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.45)


Recovering Time-Varying Networks From Single-Cell Data

Hasanaj, Euxhen, Póczos, Barnabás, Bar-Joseph, Ziv

arXiv.org Artificial Intelligence, Oct-1-2024

Gene regulation is a dynamic process that underlies all aspects of human development, disease response, and other key biological processes. The reconstruction of temporal gene regulatory networks has conventionally relied on regression analysis, graphical models, or other types of relevance networks. With the large increase in time series single-cell data, new approaches are needed to address the unique scale and nature of this data for reconstructing such networks. Here, we develop a deep neural network, Marlene, to infer dynamic graphs from time series single-cell gene expression data. Marlene constructs directed gene networks using a self-attention mechanism where the weights evolve over time using recurrent units. By employing meta learning, the model is able to recover accurate temporal networks even for rare cell types. In addition, Marlene can identify gene interactions relevant to specific biological responses, including COVID-19 immune response, fibrosis, and aging. Biological systems are dynamic, changing over time in response to various stimuli and events. To construct accurate models of biological activity during development, disease progression, treatment response, and other biological processes, it is essential to track their evolution over time (Bar-Joseph et al., 2012). Studying the regulation of these dynamic processes is key for understanding the underlying mechanisms that drive the response and for identifying potential interventions that can serve as cures for diseases (Silverman et al., 2020). Much of the research in this area is focused on the reconstruction of regulatory networks (Karlebach & Shamir, 2008; Badia-I-Mompel et al., 2023).

cell type, marlene, regulatory network, (16 more...)

arXiv.org Artificial Intelligence

2410.01853

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Genre:

Research Report > Experimental Study (0.66)
Research Report > New Finding (0.66)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Epidemiology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)


Find Everything: A General Vision Language Model Approach to Multi-Object Search

Choi, Daniel, Fung, Angus, Wang, Haitong, Tan, Aaron Hao

arXiv.org Artificial Intelligence, Oct-1-2024

In various real-world robot applications, multi-object search (MOS) describes the problem of locating multiple objects efficiently [1], in domains such as warehouse management [2, 3], construction inspection [4], hospitality [5, 6, 7], and retail assistance [8, 9]. Existing MOS methods can be categorized into: 1) probabilistic planning (PP) [1, 10, 11, 12], and 2) deep reinforcement learning (DRL) methods [13, 14, 15, 16, 17, 18, 19, 20]. PP methods utilize Partially Observable Markov Decision Processes (POMDPs) to estimate belief states and plan actions under uncertainty in object locations, while DRL methods optimize action selection using a reward function [21]. However, both approaches face challenges such as inefficient exploration due to limited semantic modeling between objects and scenes [18], and poor generalization caused by the sim-to-real gap [19]. Recently, Large Foundation Models (LFMs) such as vision-language models (VLMs) and large language models (LLMs) have been applied to single object search (SOS) tasks by using either: 1) VLMs (e.g., CLIP, BLIP, etc.) to generate scene-level embeddings that capture the semantic correlations between the robot's environment and the target object to guide the robot towards regions with high target object likelihood [19, 22, 23, 24, 25]; or, 2) VLMs/LLMs to generate scene captions that describe both the spatial layout and semantic details of the robot's environment, which are then used to plan the robot's actions [26, 27, 28, 29, 30, 31, 32]. However, these SOS methods have limitations: 1) they cannot be directly applied to MOS, as they lack explicit mechanisms to track and reason about multiple objects simultaneously, and 2) scene-level embeddings are often noisy and coarse [33], and so cannot be applied effectively in object-dense environments. In such cases, fine-grained, object-level embeddings are needed. In this paper, we introduce Finder, the first MOS approach that leverages VLMs to locate multiple target objects in various unknown environments.

finder, navigation, score map, (11 more...)

arXiv.org Artificial Intelligence

2410.00388

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Switzerland (0.04)

Genre: Research Report (0.65)

Industry: Leisure & Entertainment > Games > Computer Games (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)


Entropy contraction of the Gibbs sampler under log-concavity

Ascolani, Filippo, Lavenant, Hugo, Zanella, Giacomo

arXiv.org Machine Learning, Oct-1-2024

The Gibbs sampler (a.k.a. Glauber dynamics and heat-bath algorithm) is a popular Markov Chain Monte Carlo algorithm which iteratively samples from the conditional distributions of a probability measure $\pi$ of interest. Under the assumption that $\pi$ is strongly log-concave, we show that the random scan Gibbs sampler contracts in relative entropy and provide a sharp characterization of the associated contraction rate. Assuming that evaluating conditionals is cheap compared to evaluating the joint density, our results imply that the number of full evaluations of $\pi$ needed for the Gibbs sampler to mix grows linearly with the condition number and is independent of the dimension. If $\pi$ is non-strongly log-concave, the convergence rate in entropy degrades from exponential to polynomial. Our techniques are versatile and extend to Metropolis-within-Gibbs schemes and the Hit-and-Run algorithm. A comparison with gradient-based schemes and the connection with the optimization literature are also discussed.
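As a rough illustration of the random scan Gibbs sampler described in the abstract above, the following minimal sketch picks one coordinate uniformly at random at each step and resamples it from its exact conditional distribution. The target (a zero-mean bivariate Gaussian with unit variances and correlation `rho`, which is strongly log-concave), the function name `random_scan_gibbs`, and all parameter values are assumptions chosen for illustration. They are not taken from the paper, and the sketch does not reproduce its entropy-contraction analysis.

```python
import random


def random_scan_gibbs(rho, n_steps, seed=0):
    """Random-scan Gibbs sampler for a zero-mean bivariate Gaussian
    with unit variances and correlation rho (hypothetical toy target)."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    # Standard deviation of one coordinate given the other.
    cond_std = (1.0 - rho * rho) ** 0.5
    samples = []
    for _ in range(n_steps):
        i = rng.randrange(2)  # choose the coordinate to update at random
        j = 1 - i
        # Exact conditional: x_i | x_j ~ N(rho * x_j, 1 - rho^2)
        x[i] = rng.gauss(rho * x[j], cond_std)
        samples.append((x[0], x[1]))
    return samples


samples = random_scan_gibbs(rho=0.5, n_steps=50_000)
# Empirical mean of the first coordinate (target value: 0).
mean_x = sum(s[0] for s in samples) / len(samples)
# Empirical E[x0 * x1], which equals rho for this target.
corr_est = sum(s[0] * s[1] for s in samples) / len(samples)
print(f"mean of x0: {mean_x:.3f}, estimate of rho: {corr_est:.3f}")
```

Each step touches only one conditional distribution, which is the cheap operation the abstract contrasts with evaluating the full joint density.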

assumption, pxq, theorem 3, (15 more...)

arXiv.org Machine Learning

2410.00858

Country:

Europe > Italy > Lombardy > Milan (0.04)
North America > United States > North Carolina > Durham County > Durham (0.04)
North America > United States > Michigan (0.04)
(2 more...)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)


Investigating the Impact of Model Complexity in Large Language Models

Luo, Jing, Wang, Huiyuan, Huang, Weiran

arXiv.org Machine Learning, Oct-1-2024

Large Language Models (LLMs) based on the pre-training and fine-tuning paradigm have become pivotal in solving natural language processing tasks, consistently achieving state-of-the-art performance. Nevertheless, the theoretical understanding of how model complexity influences fine-tuning performance remains challenging and has not yet been well explored. In this paper, we focus on autoregressive LLMs and propose to employ Hidden Markov Models (HMMs) to model them. Based on the HMM modeling, we investigate the relationship between model complexity and the generalization capability in downstream tasks. Specifically, we consider a popular tuning paradigm for downstream tasks, head tuning, where all pre-trained parameters are frozen and only individual heads are trained atop pre-trained LLMs. Our theoretical analysis reveals that the risk initially increases and then decreases with rising model complexity, showcasing a "double descent" phenomenon. In this case, the initial "descent" is degenerate, signifying that the "sweet spot" where bias and variance are balanced occurs when the model size is zero. Reaching the conclusion presented in this study involves several challenges, primarily revolving around effectively modeling autoregressive LLMs and downstream tasks, as well as conducting a comprehensive risk analysis for multivariate regression. Our research is substantiated by experiments conducted on data generated from HMMs, which provide empirical support for, and align with, our theoretical insights.

prediction risk, preprint, regression, (16 more...)

arXiv.org Machine Learning

2410.00699

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Pennsylvania (0.04)
North America > United States > Maryland > Prince George's County > College Park (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.66)


Demonstrating the Continual Learning Capabilities and Practical Application of Discrete-Time Active Inference

Prakki, Rithvik

arXiv.org Artificial Intelligence, Sep-30-2024

Active inference is a mathematical framework for understanding how agents (biological or artificial) interact with their environments, enabling continual adaptation and decision-making. It combines Bayesian inference and free energy minimization to model perception, action, and learning in uncertain and dynamic contexts. Unlike reinforcement learning, active inference integrates exploration and exploitation seamlessly by minimizing expected free energy. In this paper, we present a continual learning framework for agents operating in discrete time environments, using active inference as the foundation. We derive the mathematical formulations of variational and expected free energy and apply them to the design of a self-learning research agent. This agent updates its beliefs and adapts its actions based on new data without manual intervention. Through experiments in changing environments, we demonstrate the agent's ability to relearn and refine its models efficiently, making it suitable for complex domains like finance and healthcare. The paper concludes by discussing how the proposed framework generalizes to other systems, positioning active inference as a flexible approach for adaptive AI.

artificial intelligence, bayesian inference, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2410.0024

Country: North America > United States (0.14)

Genre: Research Report (0.64)

Industry:

Health & Medicine (0.89)
Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.35)
