AITopics

We present a novel method for Deep Reinforcement Learning (DRL), incorporating the convex property of the value function over the belief space in Partially Observable Markov Decision Processes (POMDPs). We introduce hard- and soft-enforced convexity as two different approaches, and compare their performance against standard DRL on two well-known POMDP environments, namely the Tiger and FieldVisionRockSample problems. Our findings show that including the convexity feature can substantially increase performance of the agents, as well as increase robustness over the hyperparameter space, especially when testing on out-of-distribution domains. The source code for this work can be found at https://github.com/Dakout/Convex_DRL.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

2502.09298

Genre: Research Report > New Finding (0.53)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Dolan, Sydney, Nayak, Siddharth, Aloor, Jasmine Jerry, Balakrishnan, Hamsa

Asynchronous Cooperative Multi-Agent Reinforcement Learning with Limited Communication

Communication is crucial in cooperative multi-agent systems with partial observability, as it enables a better understanding of the environment and improves coordination. In extreme environments such as those underwater or in space, the frequency of communication between agents is often limited [1, 2]. For example, a satellite may not be able to reliably receive and react to messages from other satellites synchronously due to limited onboard power and communication delays. In these scenarios, agents aim to establish a communication protocol that allows them to operate independently while still receiving sufficient information to effectively coordinate with nearby agents. Multi-agent reinforcement learning (MARL) has emerged as a popular approach for addressing cooperative navigation challenges involving multiple agents.

agent, artificial intelligence, machine learning, (12 more...)

2502.00558

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre: Research Report (0.83)

Industry: Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Robust Event-Triggered Integrated Communication and Control with Graph Information Bottleneck Optimization

Wang, Ziqiong, Yu, Xiaoxue, Li, Rongpeng, Zhao, Zhifeng

Integrated communication and control serves as a critical ingredient in Multi-Agent Reinforcement Learning. However, partial observability limitations will impair collaboration effectiveness, and a potential solution is to establish consensus through well-calibrated latent variables obtained from neighboring agents. Nevertheless, the rigid transmission of less informative content can still result in redundant information exchanges. Therefore, we propose a Consensus-Driven Event-Based Graph Information Bottleneck (CDE-GIB) method, which integrates the communication graph and information flow through a GIB regularizer to extract more concise message representations while avoiding the high computational complexity of inner-loop operations. To further minimize the communication volume required for establishing consensus during interactions, we also develop a variable-threshold event-triggering mechanism. By simultaneously considering historical data and current observations, this mechanism capably evaluates the importance of information to determine whether an event should be triggered. Experimental results demonstrate that our proposed method outperforms existing state-of-the-art methods in terms of both efficiency and adaptability.

communication, machine learning, reinforcement learning, (16 more...)

2502.09846

Country:

North America > United States > California > Los Angeles County > Long Beach (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > Canada > Quebec > Montreal (0.04)
(3 more...)

Genre: Research Report > Promising Solution (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Kumar, Navdeep, Gupta, Adarsh, Elfatihi, Maxence Mohamed, Ramponi, Giorgia, Levy, Kfir Yehuda, Mannor, Shie

Dual Formulation for Non-Rectangular Lp Robust Markov Decision Processes

We study robust Markov decision processes (RMDPs) with non-rectangular uncertainty sets, which capture interdependencies across states unlike traditional rectangular models. While non-rectangular robust policy evaluation is generally NP-hard, even in approximation, we identify a powerful class of $L_p$-bounded uncertainty sets that avoid these complexity barriers due to their structural simplicity. We further show that this class can be decomposed into infinitely many \texttt{sa}-rectangular $L_p$-bounded sets and leverage its structural properties to derive a novel dual formulation for $L_p$ RMDPs. This formulation provides key insights into the adversary's strategy and enables the development of the first robust policy evaluation algorithms for non-rectangular RMDPs. Empirical results demonstrate that our approach significantly outperforms brute-force methods, establishing a promising foundation for future investigation into non-rectangular robust MDPs.

artificial intelligence, machine learning, optimization problem, (16 more...)

2502.09432

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.65)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.45)

Few is More: Task-Efficient Skill-Discovery for Multi-Task Offline Multi-Agent Reinforcement Learning

Wang, Xun, Li, Zhuoran, Zhong, Hai, Huang, Longbo

As a data-driven approach, offline MARL learns superior policies solely from offline datasets, ideal for domains rich in historical data but with high interaction costs and risks. However, most existing methods are task-specific, requiring retraining for new tasks, leading to redundancy and inefficiency. To address this issue, in this paper, we propose a task-efficient multi-task offline MARL algorithm, Skill-Discovery Conservative Q-Learning (SD-CQL). Unlike existing offline skill-discovery methods, SD-CQL discovers skills by reconstructing the next observation. It then evaluates fixed and variable actions separately and employs behavior-regularized conservative Q-learning to execute the optimal action for each skill. This approach eliminates the need for local-global alignment and enables strong multi-task generalization from limited small-scale source tasks. Substantial experiments on StarCraftII demonstrates the superior generalization performance and task-efficiency of SD-CQL. It achieves the best performance on $\textbf{10}$ out of $14$ task sets, with up to $\textbf{65%}$ improvement on individual task sets, and is within $4\%$ of the best baseline on the remaining four.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

2502.08985

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Neural Information Processing SystemsFeb-12-2025, 02:33:34 GMT

Review for NeurIPS paper: Learning Restricted Boltzmann Machines with Sparse Latent Variables

Additional Feedback: I found the phrase "few latent variables" to be confusing, since this is not really what the authors mean. In particular, their bounds use "s", which is bounded by the number of latent variables connected to any variable in the Markov blanket of Xi in the marginal distribution over the visible X. This was not clear, in my view, until the formal definition (150-155). Since this style of RBM is not well studied, the practical significance of learning rates is not clear. The type of model is intuitively appealing (many models use "local" latent variables or encourage sparseness), and perhaps the work would spur application of these styles of RBM, but it's difficult to say that this is improving the theory for a well-established problem of importance.

latent variable, learning restricted boltzmann machine, sparse latent variable, (2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.40)

Neural Information Processing SystemsFeb-12-2025, 02:24:40 GMT

On Herding and the Perceptron Cycling Theorem

Andrew Gelfand, Yutian Chen, Laurens Maaten, Max Welling

The paper develops a connection between traditional perceptron algorithms and recently introduced herding algorithms. It is shown that both algorithms can be viewed as an application of the perceptron cycling theorem. This connection strengthens some herding results and suggests new (supervised) herding algorithms that, like CRFs or discriminative RBMs, make predictions by conditioning on the input attributes. We develop and investigate variants of conditional herding, and show that conditional herding leads to practical algorithms that perform better than or on par with related classifiers such as the voted perceptron and the discriminative RBM.

algorithm, artificial intelligence, machine learning, (15 more...)

Country: North America > United States > California (0.28)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Neural Information Processing SystemsFeb-12-2025, 01:00:06 GMT

Conditional Random Field Autoencoders for Unsupervised Structured Prediction

Waleed Ammar, Chris Dyer, Noah A. Smith

We introduce a framework for unsupervised learning of structured predictors with overlapping, global features. Each input's latent representation is predicted conditional on the observed data using a feature-rich conditional random field (CRF). Then a reconstruction of the input is (re)generated, conditional on the latent structure, using a generative model which factorizes similarly to the CRF. The autoencoder formulation enables efficient exact inference without resorting to unrealistic independence assumptions or restricting the kinds of features that can be used. We illustrate connections to traditional autoencoders, posterior regularization, and multi-view learning. We then show competitive results with instantiations of the framework for two canonical tasks in natural language processing: part-of-speech induction and bitext word alignment, and show that training the proposed model can be substantially more efficient than a comparable feature-rich baseline.

artificial intelligence, autoencoder, machine learning, (13 more...)

Country: North America > United States (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.69)

Neural Information Processing SystemsFeb-12-2025, 00:51:15 GMT

Asynchronous Anytime Sequential Monte Carlo

Brooks Paige, Frank Wood, Arnaud Doucet, Yee Whye Teh

We introduce a new sequential Monte Carlo algorithm we call the particle cascade. The particle cascade is an asynchronous, anytime alternative to traditional sequential Monte Carlo algorithms that is amenable to parallel and distributed implementations. It uses no barrier synchronizations which leads to improved particle throughput and memory efficiency. It is an anytime algorithm in the sense that it can be run forever to emit an unbounded number of particles while keeping within a fixed memory budget. We prove that the particle cascade provides an unbiased marginal likelihood estimator which can be straightforwardly plugged into existing pseudo-marginal methods.

artificial intelligence, machine learning, particle, (15 more...)

Country:

North America > United States (0.28)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.28)

Industry: Government (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.69)

Zamboni, Riccardo, Mutti, Mirco, Restelli, Marcello

Towards Principled Multi-Agent Task Agnostic Exploration

arXiv.org Artificial IntelligenceFeb-12-2025

In reinforcement learning, we typically refer to task-agnostic exploration when we aim to explore the environment without access to the task specification a priori. In a single-agent setting the problem has been extensively studied and mostly understood. A popular approach cast the task-agnostic objective as maximizing the entropy of the state distribution induced by the agent's policy, from which principles and methods follows. In contrast, little is known about task-agnostic exploration in multi-agent settings, which are ubiquitous in the real world. How should different agents explore in the presence of others? In this paper, we address this question through a generalization to multiple agents of the problem of maximizing the state distribution entropy. First, we investigate alternative formulations, highlighting respective positives and negatives. Then, we present a scalable, decentralized, trust-region policy search algorithm to address the problem in practical settings. Finally, we provide proof of concept experiments to both corroborate the theoretical findings and pave the way for task-agnostic exploration in challenging multi-agent settings.

artificial intelligence, machine learning, objective, (15 more...)

2502.08365

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
Europe > Italy > Lombardy > Milan (0.04)
(2 more...)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)