MAESTRO: Multi-Agent Environment Shaping through Task and Reward Optimization

Dec-11-2025–arXiv.org Artificial Intelligence

Cooperative Multi-Agent Reinforcement Learning (MARL) faces two major design bottlenecks: crafting dense reward functions and constructing curricula that avoid local optima in high-dimensional, non-stationary environments. Existing approaches rely on fixed heuristics or use Large Language Models (LLMs) directly in the control loop, which is costly and unsuitable for real-time systems. We propose MAESTRO (Multi-Agent Environment Shaping through Task and Reward Optimization), a framework that moves the LLM outside the execution loop and uses it as an offline training architect. MAESTRO introduces two generative components: (i) a semantic curriculum generator that creates diverse, performance-driven traffic scenarios, and (ii) an automated reward synthesizer that produces executable Python reward functions adapted to evolving curriculum difficulty. These components guide a standard MARL backbone (MADDPG) without increasing inference cost at deployment. We evaluate MAESTRO on large-scale traffic signal control (Hangzhou, 16 intersections) and conduct controlled ablations. Results show that combining LLM-generated curricula with LLM-generated reward shaping yields improved performance and stability. Across four seeds, the full system achieves +4.0% higher mean return (163.26 vs. 156.93) and 2.2% better risk-adjusted performance (Sharpe 1.53 vs. 0.70) over a strong curriculum baseline. These findings highlight LLMs as effective high-level designers for cooperative MARL training.

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

Dec-11-2025

arXiv.org PDF

Add feedback

Country:
- North America > Canada (0.28)
- Asia > China
  - Zhejiang Province > Hangzhou (0.24)

Genre:
- Research Report > New Finding (1.00)

Industry:
- Education (0.93)
- Transportation
  - Ground > Road (0.49)
  - Infrastructure & Services (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Agents (1.00)
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Learning Graphical Models
    - Undirected Networks > Markov Models (0.46)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found