Distributed Multi-Agent Coordination Using Multi-Modal Foundation Models

Saaduddin Mahmud, Dorian Benhamou Goldfajn, Shlomo Zilberstein

Jan-23-2025, arXiv.org Artificial Intelligence

Distributed Constraint Optimization Problems (DCOPs) offer a powerful framework for multi-agent coordination but often rely on labor-intensive, manual problem construction. To address this, we introduce VL-DCOPs, a framework that takes advantage of large multimodal foundation models (LFMs) to automatically generate constraints from both visual and linguistic instructions. We then introduce a spectrum of agent archetypes for solving VL-DCOPs: from a neuro-symbolic agent that delegates some of the algorithmic decisions to an LFM, to a fully neural agent that depends entirely on an LFM for coordination. We evaluate these agent archetypes using state-of-the-art LLMs (large language models) and VLMs (vision language models) on three novel VL-DCOP tasks and compare their respective advantages and drawbacks. Lastly, we discuss how this work extends to broader frontier challenges in the DCOP literature.
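For readers new to the formalism, the following is a minimal sketch of what a DCOP instance looks like: a set of agent-owned variables, finite domains, and cost functions over pairs of variables, with the goal of minimizing total cost. The toy scheduling problem, the names `domains`, `clash`, `total_cost`, and `solve_brute_force`, and the centralized exhaustive search are all illustrative assumptions, not the paper's method. In particular, the constraints here are written by hand, which is the manual step that VL-DCOPs aim to replace with constraints generated by an LFM, and real DCOP algorithms are distributed rather than brute force.

```python
from itertools import product

# Hypothetical toy problem: three agents each choose one of three time slots.
domains = {"a1": [0, 1, 2], "a2": [0, 1, 2], "a3": [0, 1, 2]}


def clash(x, y):
    """Hand-written cost function: 1 if two agents pick the same slot, else 0."""
    return 1 if x == y else 0


# Binary constraints as (variable, variable, cost function) triples.
constraints = [("a1", "a2", clash), ("a2", "a3", clash), ("a1", "a3", clash)]


def total_cost(assignment, constraints):
    """Sum the cost of every constraint under a complete assignment."""
    return sum(f(assignment[u], assignment[v]) for u, v, f in constraints)


def solve_brute_force(domains, constraints):
    """Centralized exhaustive search, used here only to define the objective.

    This is not a distributed algorithm; it enumerates every joint assignment
    and keeps the first one with the lowest total cost.
    """
    names = list(domains)
    best_assignment, best_cost = None, None
    for values in product(*(domains[n] for n in names)):
        assignment = dict(zip(names, values))
        cost = total_cost(assignment, constraints)
        if best_cost is None or cost < best_cost:
            best_assignment, best_cost = assignment, cost
    return best_assignment, best_cost


best_assignment, best_cost = solve_brute_force(domains, constraints)
print(best_assignment, best_cost)
```

With three slots and three agents, a zero-cost assignment exists in which every agent takes a different slot. Exhaustive search scales exponentially in the number of variables, which is one reason DCOP solvers rely on local message passing between agents instead.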

large language model, machine learning, natural language

Country:
- North America > United States > Massachusetts (0.14)

Genre:
- Research Report (0.82)

Industry:
- Information Technology > Security & Privacy (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (0.70)
  - Natural Language > Large Language Model (1.00)
  - Representation & Reasoning
    - Agents (1.00)
    - Constraint-Based Reasoning (1.00)