Mahmud, Saaduddin
Distributed Multi-Agent Coordination Using Multi-Modal Foundation Models
Mahmud, Saaduddin, Goldfajn, Dorian Benhamou, Zilberstein, Shlomo
Distributed Constraint Optimization Problems (DCOPs) offer a powerful framework for multi-agent coordination but often rely on labor-intensive, manual problem construction. To address this, we introduce VL-DCOPs, a framework that takes advantage of large multimodal foundation models (LFMs) to automatically generate constraints from both visual and linguistic instructions. We then introduce a spectrum of agent archetypes for solving VL-DCOPs: from a neuro-symbolic agent that delegates some of the algorithmic decisions to an LFM, to a fully neural agent that depends entirely on an LFM for coordination. We evaluate these agent archetypes using state-of-the-art LLMs (large language models) and VLMs (vision language models) on three novel VL-DCOP tasks and compare their respective advantages and drawbacks. Lastly, we discuss how this work extends to broader frontier challenges in the DCOP literature.
MAPLE: A Framework for Active Preference Learning Guided by Large Language Models
Mahmud, Saaduddin, Nakamura, Mason, Zilberstein, Shlomo
The advent of large language models (LLMs) has sparked significant interest in using natural language for preference learning. However, existing methods often suffer from high computational burdens, taxing human supervision, and lack of interpretability. To address these issues, we introduce MAPLE, a framework for large language model-guided Bayesian active preference learning. MAPLE leverages LLMs to model the distribution over preference functions, conditioning it on both natural language feedback and conventional preference learning feedback, such as pairwise trajectory rankings. MAPLE also employs active learning to systematically reduce uncertainty in this distribution and incorporates a language-conditioned active query selection mechanism to identify informative and easy-to-answer queries, thus reducing human burden. We evaluate MAPLE's sample efficiency and preference inference quality across two benchmarks, including a real-world vehicle route planning benchmark using OpenStreetMap data. Our results demonstrate that MAPLE accelerates the learning process and effectively improves humans' ability to answer queries.
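The Bayesian update over preference functions that the abstract describes can be illustrated with a minimal sketch. This is not MAPLE itself: the candidate weight vectors, the two-feature trajectories, and the Bradley-Terry likelihood are all assumptions chosen for illustration, and no LLM or language feedback is involved.

```python
import math

# Hypothetical candidate preference functions: linear weights over two
# trajectory features (for example, speed and scenery). Assumed for illustration.
CANDIDATES = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]


def utility(weights, features):
    """Linear utility of a trajectory summarized by its feature vector."""
    return sum(w * f for w, f in zip(weights, features))


def preference_likelihood(weights, preferred, other):
    """Bradley-Terry likelihood that `preferred` is ranked above `other`."""
    diff = utility(weights, preferred) - utility(weights, other)
    return 1.0 / (1.0 + math.exp(-diff))


def update_posterior(prior, preferred, other):
    """One Bayesian update of the belief over CANDIDATES from a pairwise ranking."""
    unnormalized = [
        p * preference_likelihood(w, preferred, other)
        for p, w in zip(prior, CANDIDATES)
    ]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# The user ranks a scenic trajectory above a fast one.
prior = [1 / 3, 1 / 3, 1 / 3]
posterior = update_posterior(prior, preferred=(0.0, 2.0), other=(2.0, 0.0))
```

After this single ranking, the belief shifts toward the candidate that weights the second feature most heavily. An active learner would then choose the next query according to how much it is expected to reduce the remaining uncertainty.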
Causal Explanations for Sequential Decision Making Under Uncertainty
Nashed, Samer B., Mahmud, Saaduddin, Goldman, Claudia V., Zilberstein, Shlomo
We introduce a novel framework for causal explanations of stochastic, sequential decision-making systems built on the well-studied structural causal model paradigm for causal reasoning. This single framework can identify multiple, semantically distinct explanations for agent actions -- something not previously possible. In this paper, we establish exact methods and several approximation techniques for causal inference on Markov decision processes using this framework, followed by results on the applicability of the exact methods and some run time bounds. We discuss several scenarios that illustrate the framework's flexibility and the results of experiments with human subjects that confirm the benefits of this approach.
Learning Cooperation and Online Planning Through Simulation and Graph Convolutional Network
Mahmud, Rafid Ameer, Faisal, Fahim, Mahmud, Saaduddin, Khan, Md. Mosaddek
The multi-agent Markov decision process (MMDP) has been an effective way of modelling sequential decision making in multi-agent cooperative environments. A number of algorithms based on centralized and decentralized planning have been developed in this domain. However, dynamically changing environments, coupled with the exponential size of the state and joint action spaces, make it difficult for these algorithms to provide both efficiency and scalability. Recently, the centralized planning algorithm FV-MCTS-MP and the decentralized planning algorithm Alternate maximization with Behavioural Cloning (ABC) have achieved notable performance in solving MMDPs. However, they are not capable of adapting to dynamically changing environments and accounting for the lack of communication among agents, respectively. Against this background, we introduce a simulation-based online planning algorithm, which we call SiCLOP, for multi-agent cooperative environments. Specifically, SiCLOP tailors Monte Carlo Tree Search (MCTS) and uses a Coordination Graph (CG) and a Graph Convolutional Network (GCN) to learn cooperation, and it provides real-time solutions to an MMDP. It also improves scalability through effective pruning of the action space. Additionally, unlike FV-MCTS-MP and ABC, SiCLOP supports transfer learning, which enables learned agents to operate in different environments. We also provide a theoretical discussion of the convergence property of our algorithm within the context of multi-agent settings. Finally, our extensive empirical results show that SiCLOP significantly outperforms the state-of-the-art online planning algorithms.
On Population-Based Algorithms for Distributed Constraint Optimization Problems
Mahmud, Saaduddin, Khan, Md. Mosaddek, Jennings, Nicholas R.
Distributed Constraint Optimization Problems (DCOPs) are a widely studied class of optimization problems in which interactions among a set of cooperative agents are modeled as a set of constraints. DCOPs are NP-hard, and significant effort has been devoted to developing incomplete methods for solving them. In this paper, we study an emerging class of such incomplete algorithms, broadly termed population-based algorithms. The main characteristic of these algorithms is that they maintain a population of candidate solutions to a given problem and use this population to cover a large area of the search space and to avoid local optima. In recent years, this class of algorithms has gained significant attention due to its ability to produce high-quality incomplete solutions. With the primary goal of further improving solution quality over the state-of-the-art incomplete DCOP algorithms, we present two new population-based algorithms in this paper. Our first approach, Anytime Evolutionary DCOP (AED), exploits evolutionary optimization meta-heuristics to solve DCOPs. We also present a novel anytime update mechanism that gives AED its anytime property. In our second contribution, we show that population-based approaches can be combined with local search approaches. Specifically, we develop an algorithm called DPSA based on the Simulated Annealing meta-heuristic. We empirically evaluate these two algorithms on a wide variety of benchmarks against the state-of-the-art incomplete DCOP algorithms, including all existing population-based algorithms, to illustrate their respective effectiveness in different settings. Our evaluation shows that AED and DPSA markedly outperform the state of the art and produce up to 75% improved solutions.
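To make the problem setting concrete, the following is a minimal sketch of a toy DCOP solved with plain simulated annealing. It is neither AED nor DPSA: it is centralized, keeps a single candidate rather than a population, and the agents, domain, constraint graph, and cooling schedule are all assumptions chosen for illustration.

```python
import math
import random

# Hypothetical toy problem: three agents, each controlling one variable.
DOMAIN = [0, 1, 2]
AGENTS = ["a1", "a2", "a3"]
# Binary constraints on a triangle; a constraint costs 1 when both ends agree
# (a graph-coloring style cost function).
EDGES = [("a1", "a2"), ("a2", "a3"), ("a1", "a3")]


def edge_cost(x, y):
    """Cost of one binary constraint."""
    return 1 if x == y else 0


def total_cost(assignment):
    """Global objective: the sum of all constraint costs."""
    return sum(edge_cost(assignment[u], assignment[v]) for u, v in EDGES)


def simulated_annealing(steps=500, t0=1.0, seed=0):
    """Centralized simulated annealing that tracks the best assignment seen."""
    rng = random.Random(seed)
    current = {a: rng.choice(DOMAIN) for a in AGENTS}
    best = dict(current)
    for step in range(steps):
        # Linear cooling; the small constant avoids division by zero.
        temperature = t0 * (1 - step / steps) + 1e-9
        agent = rng.choice(AGENTS)
        candidate = dict(current)
        candidate[agent] = rng.choice(DOMAIN)
        delta = total_cost(candidate) - total_cost(current)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = candidate
        if total_cost(current) < total_cost(best):
            best = dict(current)
    return best, total_cost(best)


best_assignment, best_cost = simulated_annealing()
```

Keeping the best assignment seen so far gives this sketch a simple anytime flavor: stopping early still returns a valid solution, and more steps never make the returned solution worse. In a real DCOP each agent knows only its own constraints, so the global cost would have to be aggregated through message passing.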