1663fba7b56da1e96bed6e30546a07b0-Supplemental-Conference.pdf

Feb-7-2026, 15:35:17 GMT–Neural Information Processing Systems

Thus, the assumption that the policy is conditionally independent of z_ω given z^i_α corresponds well to the assumption in MARL that agents use only local information (rather than joint information) to inform their policy and decision-making. Note that we found that cyclically annealing [82] the β term in our variational lower bound from 0 to the values specified in Table 5 helps avoid KL vanishing.

A.2.4 Computational Details

For MARL trajectory data generation, we used an internal CPU cluster for both the 3-agent hillclimbing and 2-agent coordination domains, and used TPUs only for the multiagent MuJoCo data generation.

Given a characteristic of interest (e.g., the level of dispersion of agents), we define a training set consisting of joint latents z_ω and class labels y (e.g., classes corresponding to different intervals of team returns). Using these definitions, we can gauge the representational power of z_ω by learning a mapping g: ν̂_c(z_ω) → y. In practice, g is a simple model (e.g., a shallow network or linear projection), so that its performance reflects the expressivity of the latent space.
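The cyclical annealing of the β term mentioned above can be illustrated with a minimal sketch. The excerpt states only that β is annealed cyclically from 0 to a target value; the linear ramp followed by a hold phase, the function name `cyclical_beta`, and the parameters `num_cycles` and `ramp_fraction` are assumptions made for illustration, not details taken from the paper.

```python
def cyclical_beta(step, total_steps, beta_max, num_cycles=4, ramp_fraction=0.5):
    """Return the KL weight beta for a given training step.

    Assumed schedule (hypothetical, not confirmed by the source): training is
    split into `num_cycles` equal cycles; within each cycle beta rises linearly
    from 0 to `beta_max` over the first `ramp_fraction` of the cycle, then
    stays at `beta_max` until the cycle ends and beta resets to 0.
    """
    if total_steps <= 0 or num_cycles <= 0:
        raise ValueError("total_steps and num_cycles must be positive")
    if not 0.0 < ramp_fraction <= 1.0:
        raise ValueError("ramp_fraction must lie in (0, 1]")

    cycle_length = total_steps / num_cycles
    # Relative position within the current cycle, in [0, 1).
    position = (step % cycle_length) / cycle_length
    # Linear ramp, capped at 1 so that beta holds at beta_max after the ramp.
    return beta_max * min(1.0, position / ramp_fraction)


if __name__ == "__main__":
    # Print the schedule at a few steps of a 1000-step run.
    for step in (0, 50, 125, 200, 250, 300):
        print(step, cyclical_beta(step, total_steps=1000, beta_max=1.0))
```

In a training loop, the returned value would multiply the KL term of the variational lower bound at each step. Starting each cycle at β = 0 lets the decoder rely on the latent before the KL penalty is reapplied.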

agent, artificial intelligence, machine learning

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (1.00)
  - Representation & Reasoning > Agents (0.68)
