AITopics

2409.05873

Country: North America > United States (0.28)

Genre:

Workflow (0.93)
Research Report > New Finding (0.67)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Energy > Oil & Gas (1.00)
Materials > Chemicals (0.67)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.86)

Côté, Benjamin, Cossette, Hélène, Marceau, Etienne

Tree-structured Markov random fields with Poisson marginal distributions

arXiv.org Machine LearningAug-24-2024

Having graphs underlie multivariate distributions performs a translation of their vast range of topologies to a rich variety of dependence schemes; this is the premise of probabilistic graphical models. The graph, through its edges and its vertices, serves as a representation of the dependence relations knitting the random variables to one another. Among others, [Koller and Friedman, 2009] and [Maathuis et al., 2018] dive deeply into probabilistic graphical models, with much emphasis on Bayesian networks and Markov random fields (MRFs), also called Markov networks or undirected graphical models. In [Besag, 1974], a seminal paper on MRFs, the author defines various families through their conditional distributions: the distribution of a random variable is specified given the value taken by its neighbours according to the underlying graph. A family that has drawn particular attention to represent a vector of count random variables is Besag's auto-Poisson MRFs, also called Poisson graphical models, where each vertex's conditional distribution is Poisson, the values taken by the neighbouring vertices' random variables influencing its mean parameter.

mrf, random variable, vertex, (16 more...)

arXiv.org Machine Learning

2408.13649

Country:

North America > Canada > Quebec (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

arXiv.org Artificial IntelligenceAug-24-2024

Rethinking State Disentanglement in Causal Reinforcement Learning

Cao, Haiyao, Zhang, Zhen, Cai, Panpan, Liu, Yuhang, Zou, Jinan, Abbasnejad, Ehsan, Huang, Biwei, Gong, Mingming, Hengel, Anton van den, Shi, Javen Qinfeng

One of the significant challenges in reinforcement learning (RL) when dealing with noise is estimating latent states from observations. Causality provides rigorous theoretical support for ensuring that the underlying states can be uniquely recovered through identifiability. Consequently, some existing work focuses on establishing identifiability from a causal perspective to aid in the design of algorithms. However, these results are often derived from a purely causal viewpoint, which may overlook the specific RL context. We revisit this research line and find that incorporating RL-specific context can reduce unnecessary assumptions in previous identifiability analyses for latent states. More importantly, removing these assumptions allows algorithm design to go beyond the earlier boundaries constrained by them. Leveraging these insights, we propose a novel approach for general partially observable Markov Decision Processes (POMDPs) by replacing the complicated structural constraints in previous methods with two simple constraints for transition and reward preservation. With the two constraints, the proposed algorithm is guaranteed to disentangle state and noise that is faithful to the underlying dynamics. Empirical evidence from extensive benchmark control tasks demonstrates the superiority of our approach over existing counterparts in effectively disentangling state belief from noise.

latent state, noise, state belief, (17 more...)

2408.13498

Country:

Oceania > Australia (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Mastering the Digital Art of War: Developing Intelligent Combat Simulation Agents for Wargaming Using Hierarchical Reinforcement Learning

Black, Scotty

general video game, hierarchical reinforcement learning training framework, relevant information, (14 more...)

In today's rapidly evolving military landscape, advancing artificial intelligence (AI) in support of wargaming becomes essential. Despite reinforcement learning (RL) showing promise for developing intelligent agents, conventional RL faces limitations in handling the complexity inherent in combat simulations. This dissertation proposes a comprehensive approach, including targeted observation abstractions, multi-model integration, a hybrid AI framework, and an overarching hierarchical reinforcement learning (HRL) framework. Our localized observation abstraction using piecewise linear spatial decay simplifies the RL problem, enhancing computational efficiency and demonstrating superior efficacy over traditional global observation methods. Our multi-model framework combines various AI methodologies, optimizing performance while still enabling the use of diverse, specialized individual behavior models. Our hybrid AI framework synergizes RL with scripted agents, leveraging RL for high-level decisions and scripted agents for lower-level tasks, enhancing adaptability, reliability, and performance. Our HRL architecture and training framework decomposes complex problems into manageable subproblems, aligning with military decision-making structures. Although initial tests did not show improved performance, insights were gained to improve future iterations. This study underscores AI's potential to revolutionize wargaming, emphasizing the need for continued research in this domain.

2408.13333

Genre:

Summary/Review (1.00)
Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)

Bakirtzis, Georgios, Savvas, Michail, Zhao, Ruihan, Chinchali, Sandeep, Topcu, Ufuk

Reduce, Reuse, Recycle: Categories for Compositional Reinforcement Learning

In reinforcement learning, conducting task composition by forming cohesive, executable sequences from multiple tasks remains challenging. However, the ability to (de)compose tasks is a linchpin in developing robotic systems capable of learning complex behaviors. Yet, compositional reinforcement learning is beset with difficulties, including the high dimensionality of the problem space, scarcity of rewards, and absence of system robustness after task composition. To surmount these challenges, we view task composition through the prism of category theory -- a mathematical discipline exploring structures and their compositional relationships. The categorical properties of Markov decision processes untangle complex tasks into manageable sub-tasks, allowing for strategical reduction of dimensionality, facilitating more tractable reward structures, and bolstering system robustness. Experimental results support the categorical theory of reinforcement learning by enabling skill reduction, reuse, and recycling when learning complex robotic arm tasks.

diagram, mdp, subprocess, (15 more...)

2408.13376

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Iowa (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Zeng, Sihan, Doan, Thinh T., Romberg, Justin

A Two-Time-Scale Stochastic Optimization Framework with Applications in Control and Reinforcement Learning

We study a new two-time-scale stochastic gradient method for solving optimization problems, where the gradients are computed with the aid of an auxiliary variable under samples generated by time-varying MDPs controlled by the underlying optimization variable. These time-varying samples make gradient directions in our update biased and dependent, which can potentially lead to the divergence of the iterates. In our two-time-scale approach, one scale is to estimate the true gradient from these samples, which is then used to update the estimate of the optimal solution. While these two iterates are implemented simultaneously, the former is updated "faster" than the latter. Our first contribution is to characterize the finite-time complexity of the proposed two-time-scale stochastic gradient method. In particular, we provide explicit formulas for the convergence rates of this method under different structural assumptions, namely, strong convexity, PL condition, and general non-convexity. We apply our framework to various policy optimization problems. First, we look at the infinite-horizon average-reward MDP with finite state and action spaces and derive a convergence rate of $O(k^{-2/5})$ for the online actor-critic algorithm under function approximation, which recovers the best known rate derived specifically for this problem. Second, we study the linear-quadratic regulator and show that an online actor-critic method converges with rate $O(k^{-2/3})$. Third, we use the actor-critic algorithm to solve the policy optimization problem in an entropy regularized Markov decision process, where we also establish a convergence of $O(k^{-2/3})$. The results we derive for both the second and third problem are novel and previously unknown in the literature. Finally, we briefly present the application of our framework to gradient-based policy evaluation algorithms in reinforcement learning.

actor-critic algorithm, algorithm, convergence rate, (15 more...)

2109.14756

Country:

North America > United States > Virginia > Montgomery County > Blacksburg (0.04)
North America > United States > New York (0.04)
North America > United States > Massachusetts (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)

arXiv.org Machine LearningAug-23-2024

Accelerated Markov Chain Monte Carlo Using Adaptive Weighting Scheme

Wang, Yanbo, Chen, Wenyu, Shan, Shimin

Gibbs sampling is one of the most commonly used Markov Chain Monte Carlo (MCMC) algorithms due to its simplicity and efficiency. It cycles through the latent variables, sampling each one from its distribution conditional on the current values of all the other variables. Conventional Gibbs sampling is based on the systematic scan (with a deterministic order of variables). In contrast, in recent years, Gibbs sampling with random scan has shown its advantage in some scenarios. However, almost all the analyses of Gibbs sampling with the random scan are based on uniform selection of variables. In this paper, we focus on a random scan Gibbs sampling method that selects each latent variable non-uniformly. Firstly, we show that this non-uniform scan Gibbs sampling leaves the target posterior distribution invariant. Then we explore how to determine the selection probability for latent variables. In particular, we construct an objective as a function of the selection probability and solve the constrained optimization problem. We further derive an analytic solution of the selection probability, which can be estimated easily. Our algorithm relies on the simple intuition that choosing the variable updates according to their marginal probabilities enhances the mixing time of the Markov chain. Finally, we validate the effectiveness of the proposed Gibbs sampler by conducting a set of experiments on real-world applications.

gibbs, gibbs sampler, iteration, (13 more...)

arXiv.org Machine Learning

2408.12888

Country:

Asia > Middle East > Jordan (0.04)
Asia > China (0.04)
North America > United States > Virginia (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Diffusion-based Episodes Augmentation for Offline Multi-Agent Reinforcement Learning

Oh, Jihwan, Kim, Sungnyun, Kim, Gahee, Kim, Sunghwan, Yun, Se-Young

Offline multi-agent reinforcement learning (MARL) is increasingly recognized as crucial for effectively deploying RL algorithms in environments where real-time interaction is impractical, risky, or costly. In the offline setting, learning from a static dataset of past interactions allows for the development of robust and safe policies without the need for live data collection, which can be fraught with challenges. Building on this foundational importance, we present EAQ, Episodes Augmentation guided by Q-total loss, a novel approach for offline MARL framework utilizing diffusion models. EAQ integrates the Q-total function directly into the diffusion model as a guidance to maximize the global returns in an episode, eliminating the need for separate training. Our focus primarily lies on cooperative scenarios, where agents are required to act collectively towards achieving a shared goal-essentially, maximizing global returns. Consequently, we demonstrate that our episodes augmentation in a collaborative manner significantly boosts offline MARL algorithm compared to the original dataset, improving the normalized return by +17.3% and +12.9% for medium and poor behavioral policies in SMAC simulator, respectively.

agent, dataset, diffusion model, (13 more...)

2408.13092

Country:

Europe > Austria > Vienna (0.14)
Asia > South Korea > Seoul > Seoul (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Focused Discriminative Training For Streaming CTC-Trained Automatic Speech Recognition Models

Haider, Adnan, Na, Xingyu, McDermott, Erik, Ng, Tim, Huang, Zhen, Zhuang, Xiaodan

ctc-trained automatic speech recognition model, focused discriminative training

This paper introduces a novel training framework called Focused Discriminative Training (FDT) to further improve streaming word-piece end-to-end (E2E) automatic speech recognition (ASR) models trained using either CTC or an interpolation of CTC and attention-based encoder-decoder (AED) loss. The proposed approach presents a novel framework to identify and improve a model's recognition on challenging segments of an audio. Notably, this training framework is independent of hidden Markov models (HMMs) and lattices, eliminating the need for substantial decision-making regarding HMM topology, lexicon, and graph generation, as typically required in standard discriminative training approaches. Compared to additional fine-tuning with MMI or MWER loss on the encoder, FDT is shown to be more effective in achieving greater reductions in Word Error Rate (WER) on streaming models trained on LibriSpeech. Additionally, this method is shown to be effective in further improving a converged word-piece streaming E2E model trained on 600k hours of assistant and dictation dataset.

2408.13008

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.87)

Peralez, Johan, Delage, Aurélien, Castellini, Jacopo, Cunha, Rafael F., Dibangoye, Jilles S.

Optimally Solving Simultaneous-Move Dec-POMDPs: The Sequential Central Planning Approach

Centralized training for decentralized execution paradigm emerged as the state-of-the-art approach to epsilon-optimally solving decentralized partially observable Markov decision processes. However, scalability remains a significant issue. This paper presents a novel and more scalable alternative, namely sequential-move centralized training for decentralized execution. This paradigm further pushes the applicability of Bellman's principle of optimality, raising three new properties. First, it allows a central planner to reason upon sufficient sequential-move statistics instead of prior simultaneous-move ones. Next, it proves that epsilon-optimal value functions are piecewise linear and convex in sufficient sequential-move statistics. Finally, it drops the complexity of the backup operators from double exponential to polynomial at the expense of longer planning horizons. Besides, it makes it easy to use single-agent methods, e.g., SARSA algorithm enhanced with these findings applies while still preserving convergence guarantees. Experiments on two- as well as many-agent domains from the literature against epsilon-optimal simultaneous-move solvers confirm the superiority of the novel approach. This paradigm opens the door for efficient planning and reinforcement learning methods for multi-agent systems.

agent, algorithm, decision rule, (14 more...)

2408.13139

Country:

Europe > France (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New Jersey > Hudson County > Hoboken (0.04)
(2 more...)

Genre:

Research Report > Promising Solution (0.54)
Overview > Innovation (0.54)

Industry:

Government (0.50)
Banking & Finance (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)