AITopics | Markov Models

Markov Models

Reducing Variance Caused by Communication in Decentralized Multi-agent Deep Reinforcement Learning

Zhu, Changxi, Dastani, Mehdi, Wang, Shihan

arXiv.org Artificial Intelligence, Feb-10-2025

In decentralized multi-agent deep reinforcement learning (MADRL), communication can help agents to gain a better understanding of the environment to better coordinate their behaviors. Nevertheless, communication may involve uncertainty, which potentially introduces variance to the learning of decentralized agents. In this paper, we focus on a specific decentralized MADRL setting with communication and conduct a theoretical analysis to study the variance that is caused by communication in policy gradients. We propose modular techniques to reduce the variance in policy gradients during training. We adopt our modular techniques into two existing algorithms for decentralized MADRL with communication and evaluate them on multiple tasks in the StarCraft Multi-Agent Challenge and Traffic Junction domains. The results show that decentralized MADRL communication methods extended with our proposed techniques not only achieve high-performing agents but also reduce variance in policy gradients during training.
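The abstract does not spell out the proposed techniques, so what follows is only a toy illustration of why extra randomness hurts policy-gradient training. It is not the authors' method. It assumes a single agent with a Bernoulli policy pi(a=1) = sigmoid(theta), models message uncertainty (hypothetically) as Gaussian noise msg_noise added to the sampled return, and compares the empirical variance of the REINFORCE gradient estimate with and without a constant baseline, a standard control variate. The function name and the toy setup are illustrative assumptions.

```python
import math
import random
import statistics


def score_function_grads(theta, offset, msg_noise, baseline, n=5000, seed=0):
    """Per-sample REINFORCE gradient estimates for a Bernoulli policy.

    Hypothetical toy setup: return = a + offset + Gaussian noise, where the
    noise (std msg_noise) stands in for uncertainty introduced by messages.
    """
    rng = random.Random(seed)
    p = 1.0 / (1.0 + math.exp(-theta))  # pi(a=1) = sigmoid(theta)
    grads = []
    for _ in range(n):
        a = 1 if rng.random() < p else 0
        ret = a + offset + rng.gauss(0.0, msg_noise)
        # d/dtheta log pi(a) = a - p for a sigmoid-parameterised Bernoulli policy
        grads.append((a - p) * (ret - baseline))
    return grads


for noise in (0.0, 1.0):
    plain = statistics.pvariance(score_function_grads(0.0, 5.0, noise, baseline=0.0))
    based = statistics.pvariance(score_function_grads(0.0, 5.0, noise, baseline=5.5))
    print(f"msg_noise={noise}: var without baseline={plain:.3f}, with baseline={based:.3f}")
```

With theta = 0 the expected return is offset + 0.5 = 5.5, so baseline=5.5 cancels the constant part of the return; the variance that remains then comes only from the injected noise and is exactly zero when msg_noise = 0.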

communication, machine learning, reinforcement learning

2502.06261

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
North America > United States > California > Los Angeles County > Long Beach (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Leisure & Entertainment > Games > Computer Games (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)

Logarithmic Regret of Exploration in Average Reward Markov Decision Processes

Boone, Victor, Gaujal, Bruno

arXiv.org Machine Learning, Feb-10-2025

In average reward Markov decision processes, state-of-the-art algorithms for regret minimization follow a well-established framework: They are model-based, optimistic and episodic. First, they maintain a confidence region from which optimistic policies are computed using a well-known subroutine called Extended Value Iteration (EVI). Second, these policies are used over time windows called episodes, each ended by the Doubling Trick (DT) rule or a variant thereof. In this work, without modifying EVI, we show that there is a significant advantage in replacing (DT) by another simple rule, that we call the Vanishing Multiplicative (VM) rule. When managing episodes with (VM), the algorithm's regret is, both in theory and in practice, as good if not better than with (DT), while the one-shot behavior is greatly improved. More specifically, the management of bad episodes (when sub-optimal policies are being used) is much better under (VM) than (DT) by making the regret of exploration logarithmic rather than linear. These results are made possible by a new in-depth understanding of the contrasting behaviors of confidence regions during good and bad episodes.
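For reference, the Doubling Trick can be sketched as follows, assuming the UCRL2-style form of the rule: an episode ends once the within-episode visit count of the state-action pair just played reaches max(1, N), where N is that pair's visit count from before the episode began. The function name is hypothetical, the recomputation of an optimistic policy with EVI at each episode start is omitted, and the VM rule is not reproduced because the abstract does not define it.

```python
from collections import Counter


def doubling_trick_episodes(visits):
    """Return the time indices at which new episodes start under the DT rule.

    visits is the sequence of (state, action) pairs played. The current
    episode ends as soon as the within-episode count nu(s, a) of the pair
    just played reaches max(1, N(s, a)), where N(s, a) is the count
    accumulated before the episode began.
    """
    starts = [0]
    before = Counter()  # N(s, a): visits before the current episode
    within = Counter()  # nu(s, a): visits during the current episode
    for t, sa in enumerate(visits):
        within[sa] += 1
        if within[sa] >= max(1, before[sa]):
            before.update(within)  # fold this episode's counts into N
            within.clear()
            if t + 1 < len(visits):
                starts.append(t + 1)
    return starts


print(doubling_trick_episodes([("s0", "a0")] * 16))  # [0, 1, 2, 4, 8]
```

Visiting a single pair repeatedly yields episodes of length 1, 1, 2, 4, 8, so under DT the number of episodes grows only logarithmically with the number of visits.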

algorithm, artificial intelligence, machine learning

2502.0648

Country:

Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
North America > United States > Virginia > Arlington County > Arlington (0.04)

Genre:

Research Report (0.49)
Workflow (0.47)

Industry: Leisure & Entertainment (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.62)

Learning an Optimal Assortment Policy under Observational Data

Han, Yuxuan, Zhong, Han, Lu, Miao, Blanchet, Jose, Zhou, Zhengyuan

arXiv.org Machine Learning, Feb-10-2025

We study the fundamental problem of offline assortment optimization under the Multinomial Logit (MNL) model, where sellers must determine the optimal subset of the products to offer based solely on historical customer choice data. While most existing approaches to learning-based assortment optimization focus on the online learning of the optimal assortment through repeated interactions with customers, such exploration can be costly or even impractical in many real-world settings. In this paper, we consider the offline learning paradigm and investigate the minimal data requirements for efficient offline assortment optimization. To this end, we introduce Pessimistic Rank-Breaking (PRB), an algorithm that combines rank-breaking with pessimistic estimation. We prove that PRB is nearly minimax optimal by establishing the tight suboptimality upper bound and a nearly matching lower bound. This further shows that "optimal item coverage" - where each item in the optimal assortment appears sufficiently often in the historical data - is both sufficient and necessary for efficient offline learning. This significantly relaxes the previous requirement of observing the complete optimal assortment in the data. Our results provide fundamental insights into the data requirements for offline assortment optimization under the MNL model.
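The MNL quantities behind this problem can be made concrete with a small sketch. It assumes known attraction weights v_i for each product, an outside (no-purchase) option with weight 1, and per-item revenues r_i, so that P(i | S) = v_i / (1 + sum of v_j over j in S). It computes choice probabilities and expected revenue, finds the best assortment by brute force, and shows the rank-breaking step that turns one observed choice into pairwise comparisons. Estimating v from offline data, and the pessimism that defines PRB, are not shown; all names are illustrative.

```python
from itertools import combinations

OUTSIDE = "no-purchase"  # label for the outside (no-purchase) option


def choice_probs(assortment, v):
    """MNL choice probabilities: P(i | S) = v_i / (1 + sum_{j in S} v_j)."""
    denom = 1.0 + sum(v[j] for j in assortment)
    probs = {i: v[i] / denom for i in assortment}
    probs[OUTSIDE] = 1.0 / denom
    return probs


def expected_revenue(assortment, v, r):
    probs = choice_probs(assortment, v)
    return sum(r[i] * probs[i] for i in assortment)


def best_assortment(v, r):
    """Brute force over all non-empty subsets (only sensible for a few items)."""
    best, best_rev = (), 0.0
    for k in range(1, len(v) + 1):
        for s in combinations(range(len(v)), k):
            rev = expected_revenue(s, v, r)
            if rev > best_rev:
                best, best_rev = s, rev
    return best, best_rev


def rank_break(chosen, assortment):
    """Turn one observed choice into pairwise 'chosen beats j' comparisons."""
    options = list(assortment) + [OUTSIDE]
    return [(chosen, j) for j in options if j != chosen]


print(best_assortment(v=[1.0, 1.0], r=[10.0, 1.0]))  # ((0,), 5.0)
print(rank_break(0, (0, 1)))  # [(0, 1), (0, 'no-purchase')]
```

In the example, offering the cheap item alongside the expensive one diverts purchases, so the single-item assortment (0,) earns more (5.0) than offering both (11/3).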

artificial intelligence, assortment, machine learning

2502.06777

Country:

South America > Paraguay > Asunción > Asunción (0.04)
North America > United States > New York (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Pre-Trained Video Generative Models as World Simulators

He, Haoran, Zhang, Yang, Lin, Liang, Xu, Zhongwen, Pan, Ling

arXiv.org Artificial Intelligence, Feb-10-2025

Video generative models pre-trained on large-scale internet datasets have achieved remarkable success, excelling at producing realistic synthetic videos. However, they often generate clips based on static prompts (e.g., text or images), limiting their ability to model interactive and dynamic scenarios. In this paper, we propose Dynamic World Simulation (DWS), a novel approach to transform pre-trained video generative models into controllable world simulators capable of executing specified action trajectories. To achieve precise alignment between conditioned actions and generated visual changes, we introduce a lightweight, universal action-conditioned module that seamlessly integrates into any existing model. Instead of focusing on complex visual details, we demonstrate that consistent dynamic transition modeling is the key to building powerful world simulators. Building upon this insight, we further introduce a motion-reinforced loss that enhances action controllability by compelling the model to capture dynamic changes more effectively. Experiments demonstrate that DWS can be applied to both diffusion and autoregressive transformer models, achieving significant improvements in generating action-controllable, dynamically consistent videos across games and robotics domains. Moreover, to facilitate the use of the learned world simulator in downstream tasks such as model-based reinforcement learning, we propose prioritized imagination to improve sample efficiency, demonstrating competitive performance compared with state-of-the-art methods.

machine learning, natural language, reinforcement learning

2502.07825

Genre:

Research Report > Promising Solution (0.68)
Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Do Deep Nets Really Need to be Deep?

Jimmy Ba, Rich Caruana

Neural Information Processing Systems, Feb-9-2025, 23:39:29 GMT

Currently, deep neural networks are the state of the art on problems such as speech recognition and computer vision. In this paper we empirically demonstrate that shallow feed-forward nets can learn the complex functions previously learned by deep nets and achieve accuracies previously only achievable with deep models. Moreover, in some cases the shallow nets can learn these deep functions using the same number of parameters as the original deep models. On the TIMIT phoneme recognition and CIFAR-10 image recognition tasks, shallow nets can be trained that perform similarly to complex, well-engineered, deeper convolutional models.

accuracy, artificial intelligence, machine learning

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Wisconsin > Dane County > Madison (0.04)
Europe > France > Auvergne-Rhône-Alpes > Lyon > Lyon (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Simple MAP Inference via Low-Rank Relaxations

Roy Frostig, Sida Wang, Percy S. Liang, Christopher D. Manning

Neural Information Processing Systems, Feb-9-2025, 22:17:38 GMT

We focus on the problem of maximum a posteriori (MAP) inference in Markov random fields with binary variables and pairwise interactions. For this common subclass of inference tasks, we consider low-rank relaxations that interpolate between the discrete problem and its full-rank semidefinite relaxation. We develop new theoretical bounds studying the effect of rank, showing that as the rank grows, the relaxed objective increases but saturates, and that the fraction of the objective value retained by the rounded discrete solution decreases. In practice, we present two algorithms for optimizing the low-rank objectives which are simple to implement, enjoy ties to the underlying theory, and outperform existing approaches on benchmark MAP inference tasks.
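A minimal sketch of a rank-k relaxation, assuming the MAP problem has already been written as maximising the sum over i ≠ j of A_ij x_i x_j over x in {-1, +1}^n with a symmetric coupling matrix A (unary terms can be folded in through one extra variable, a standard reduction). Each x_i becomes a unit vector in R^k, so k = 1 is the discrete problem and k = n matches the semidefinite relaxation. The block-coordinate ascent and random-hyperplane rounding used here are generic choices, not necessarily the two algorithms of the paper; the function name is hypothetical.

```python
import math
import random


def low_rank_map(A, k=2, sweeps=50, seed=0):
    """Rank-k relaxation of max_{x in {-1,+1}^n} sum_{i != j} A[i][j] x_i x_j.

    A must be symmetric. Each x_i is relaxed to a unit vector v_i in R^k and
    the objective becomes sum_{i != j} A[i][j] <v_i, v_j>. Block-coordinate
    ascent: with the other vectors fixed, the best unit v_i is the normalised
    vector sum_j A[i][j] v_j. Rounding uses a random hyperplane h:
    x_i = sign(<v_i, h>).
    """
    rng = random.Random(seed)
    n = len(A)

    def dot(u, w):
        return sum(a * b for a, b in zip(u, w))

    V = []
    for _ in range(n):
        u = [rng.gauss(0.0, 1.0) for _ in range(k)]
        norm = math.sqrt(dot(u, u))
        V.append([c / norm for c in u])

    for _ in range(sweeps):
        for i in range(n):
            g = [sum(A[i][j] * V[j][d] for j in range(n) if j != i) for d in range(k)]
            norm = math.sqrt(dot(g, g))
            if norm > 0:  # keep v_i unchanged if its neighbours cancel out
                V[i] = [c / norm for c in g]

    relaxed = sum(A[i][j] * dot(V[i], V[j]) for i in range(n) for j in range(n) if i != j)
    h = [rng.gauss(0.0, 1.0) for _ in range(k)]
    x = [1 if dot(v, h) >= 0 else -1 for v in V]
    rounded = sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n) if i != j)
    return x, relaxed, rounded


ferro = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
print(low_rank_map(ferro))  # expected: all four spins equal, rounded == 12, relaxed close to 12
```

On this easy ferromagnetic example (all couplings +1) the vectors align, so rounding loses nothing; each coordinate step can only increase the relaxed objective.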

artificial intelligence, machine learning, relaxation

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > California > Santa Clara County > Stanford (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Neurons as Monte Carlo Samplers: Bayesian Inference and Learning in Spiking Networks

Yanping Huang, Rajesh PN Rao

Neural Information Processing Systems, Feb-9-2025, 22:16:51 GMT

We propose a spiking network model capable of performing both approximate inference and learning for any hidden Markov model. The lower layer sensory neurons detect noisy measurements of hidden world states. The higher layer neurons with recurrent connections infer a posterior distribution over world states from spike trains generated by sensory neurons. We show how such a neuronal network with synaptic plasticity can implement a form of Bayesian inference similar to Monte Carlo methods such as particle filtering. Each spike in the population of inference neurons represents a sample of a particular hidden world state.
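The particle-filtering analogy can be made concrete with a standard bootstrap particle filter for a discrete hidden Markov model, where each particle plays the role the abstract assigns to a spike: a sample of a hidden world state. This is a conventional (non-spiking) sketch for comparison, not the authors' network; the function name and the matrix conventions are assumptions.

```python
import random


def particle_filter(obs, T, E, prior, n_particles=500, seed=0):
    """Bootstrap particle filter for a discrete hidden Markov model.

    T[s][s2]: transition probability s -> s2; E[s][y]: probability of
    emitting y in state s; prior: distribution over the hidden state before
    obs[0]. Returns one estimated posterior P(state | obs[:t+1]) per step.
    """
    rng = random.Random(seed)
    states = list(range(len(prior)))
    particles = rng.choices(states, weights=prior, k=n_particles)
    posteriors = []
    for y in obs:
        # 1. Propagate each particle (sample) through the transition model.
        particles = [rng.choices(states, weights=T[s])[0] for s in particles]
        # 2. Weight each particle by the likelihood of the observation.
        weights = [E[s][y] for s in particles]
        total = sum(weights)
        if total == 0:
            raise ValueError("all particles have zero likelihood")
        post = [0.0] * len(states)
        for s, w in zip(particles, weights):
            post[s] += w / total
        posteriors.append(post)
        # 3. Resample in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return posteriors


sticky = [[0.9, 0.1], [0.1, 0.9]]
noisy_sensor = [[0.8, 0.2], [0.2, 0.8]]
print(particle_filter([0, 0, 1, 1, 1], sticky, noisy_sensor, [0.5, 0.5])[-1])
```

With a sticky two-state chain and a noisy sensor, the final estimate should favour state 1 after three consecutive observations of 1; a larger n_particles gives a closer approximation to the exact forward-algorithm posterior.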

inference neuron, neuron, posterior distribution

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > France (0.04)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.85)

Automatic Discovery of Cognitive Skills to Improve the Prediction of Student Learning

Robert V. Lindsey, Mohammad Khajah, Michael C. Mozer

Neural Information Processing Systems, Feb-9-2025, 21:30:24 GMT

To master a discipline such as algebra or physics, students must acquire a set of cognitive skills. Traditionally, educators and domain experts use intuition to determine what these skills are and then select practice exercises to hone a particular skill. We propose a technique that uses student performance data to automatically discover the skills needed in a discipline. The technique assigns a latent skill to each exercise such that a student's expected accuracy on a sequence of same-skill exercises improves monotonically with practice. Rather than discarding the skills identified by experts, our technique incorporates a nonparametric prior over the exercise-skill assignments that is based on the expert-provided skills and a weighted Chinese restaurant process.
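The Chinese restaurant process underlying this prior can be sketched briefly: with exercises as customers and skills as tables, exercise n+1 joins an existing skill k with probability n_k / (n + alpha) and opens a new skill with probability alpha / (n + alpha), where n_k is the number of exercises already assigned to skill k. This sketch shows only the plain CRP with concentration alpha; the expert-informed weighting the paper adds is not reproduced, and the function name is illustrative.

```python
import random


def sample_crp(n_customers, alpha, seed=0):
    """Sample table (skill) assignments from a plain Chinese restaurant process."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    rng = random.Random(seed)
    tables = []       # tables[k] = number of customers (exercises) at table k
    assignments = []  # assignments[i] = table chosen by customer i
    for _ in range(n_customers):
        # Weights n_k for existing tables and alpha for a new one; dividing by
        # their sum (n + alpha) gives the CRP probabilities.
        weights = tables + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(tables):
            tables.append(1)  # open a new table
        else:
            tables[k] += 1
        assignments.append(k)
    return assignments, tables


assignments, tables = sample_crp(20, alpha=1.5)
print(assignments)
print(tables)
```

Larger alpha makes new skills more likely, while the n_k terms give a rich-get-richer effect toward skills that already cover many exercises.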

artificial intelligence, machine learning, student

Country:

North America > United States > Colorado > Boulder County > Boulder (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Instructional Material (0.46)

Industry:

Education > Educational Technology > Educational Software > Computer Based Training (0.69)
Education > Educational Setting > Online (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

A Complete Variational Tracker

Ryan D. Turner, Steven Bottone, Bhargav Avasarala

Neural Information Processing Systems, Feb-9-2025, 20:03:41 GMT

We introduce a novel probabilistic tracking algorithm that incorporates combinatorial data association constraints and model-based track management using variational Bayes. We use a Bethe entropy approximation to incorporate data association constraints that are often ignored in previous probabilistic tracking algorithms. Noteworthy aspects of our method include a model-based mechanism to replace heuristic logic typically used to initiate and destroy tracks, and an assignment posterior with linear computation cost in window length as opposed to the exponential scaling of previous MAP-based approaches. We demonstrate the applicability of our method on radar tracking and computer vision problems. The field of tracking is broad and possesses many applications, particularly in radar/sonar [1], robotics [14], and computer vision [3].

artificial intelligence, machine learning, tracker

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Instructional Material (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Optimistic Planning in Markov Decision Processes Using a Generative Model

Balázs Szörényi, Gunnar Kedenburg, Remi Munos

Neural Information Processing Systems, Feb-9-2025, 18:12:31 GMT

We consider the problem of online planning in a Markov decision process with discounted rewards for any given initial state. We consider the PAC sample complexity problem of computing, with probability 1 - δ, an ε-optimal action using the smallest possible number of calls to the generative model (which provides reward and next-state samples). We design an algorithm, called StOP (for Stochastic-Optimistic Planning), based on the "optimism in the face of uncertainty" principle. StOP can be used in the general setting, requires only a generative model, and enjoys a complexity bound that only depends on the local structure of the MDP.
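The generative-model interface and the optimism principle can be illustrated at depth one: the planner only calls a function that returns (reward, next_state) for a queried state and action, and it spends each call on the action with the highest optimistic (UCB1-style) index. This is not StOP, which builds optimistic bounds over whole action sequences with discounting and carries the PAC guarantee described above; the function names and the toy model are hypothetical.

```python
import math
import random


def optimistic_root_action(generative_model, state, actions, budget, c=1.0):
    """Spend budget calls to a generative model on the root actions, UCB1-style.

    generative_model(state, action) -> (reward, next_state) is the only access
    to the MDP. Each call goes to the action with the highest optimistic index
    (empirical mean + exploration bonus); the recommendation is the action with
    the best empirical mean. Depth one only: next states are ignored.
    """
    counts = {a: 0 for a in actions}
    sums = {a: 0.0 for a in actions}
    for t in range(1, budget + 1):
        untried = [a for a in actions if counts[a] == 0]
        if untried:
            a = untried[0]  # sample every action once first
        else:
            a = max(actions, key=lambda b: sums[b] / counts[b]
                    + c * math.sqrt(2.0 * math.log(t) / counts[b]))
        reward, _next_state = generative_model(state, a)
        counts[a] += 1
        sums[a] += reward
    return max((a for a in actions if counts[a]), key=lambda b: sums[b] / counts[b])


rng = random.Random(0)


def toy_model(state, action):
    """Hypothetical generative model: action 1 pays 1 w.p. 0.8, action 0 w.p. 0.2."""
    p = 0.8 if action == 1 else 0.2
    return (1.0 if rng.random() < p else 0.0), state


print(optimistic_root_action(toy_model, state=0, actions=[0, 1], budget=1000))
```

The exploration bonus shrinks as an action is sampled more often, so most of the budget goes to the action that still looks promising, which is the same intuition StOP applies to deeper plans.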

artificial intelligence, machine learning, natural language

Country:

Europe > France (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Generation (0.82)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.61)
