Agents
Thus, the assumption of the policy being conditionally independent of z_ω given z_α^i corresponds well to the assumption of agents only using local information (rather than joint information) in MARL to inform their policy/decision-making. Note that we found that cyclically annealing [82] the β term in our variational lower bound from 0 to the values specified in Table 5 helps avoid KL vanishing.

A.2.4 Computational Details

For MARL trajectory data generation, we used an internal CPU cluster for both the 3-agent hill-climbing and 2-agent coordination domains, using TPUs only for the multiagent MuJoCo data generation.

Given a characteristic of interest (e.g., the level of dispersion of agents), we define a training set consisting of joint latents z_ω and class labels y (e.g., classes corresponding to different intervals of team returns). Using these definitions, we can gauge the representational power of z_ω by learning a mapping g: ν̂_c(z_ω) → y. In practice, g is a simple model (e.g., a shallow network or linear projection), so that its accuracy reflects the expressivity of the latent space rather than the capacity of g itself.
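
As a rough illustration of this probing setup (a sketch under stated assumptions, not the implementation used in the paper), the Python snippet below fits a linear softmax probe g on latent vectors and reports held-out accuracy. Everything in it is hypothetical: the latents are synthetic 2-D Gaussian samples standing in for joint latents, the "team return" is simply the first latent coordinate, the interval edges are arbitrary, and plain gradient descent stands in for whatever optimizer one would actually use.

```python
import math
import random


def team_return_to_class(team_return, edges):
    """Map a scalar team return to the index of the interval it falls in."""
    return sum(team_return >= edge for edge in edges)


def logits(weights, biases, z):
    """Linear scores W z + b, one score per class."""
    return [sum(w * x for w, x in zip(row, z)) + b
            for row, b in zip(weights, biases)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fit_linear_probe(latents, labels, num_classes, lr=0.5, epochs=200):
    """Fit softmax regression (the probe g) by full-batch gradient descent."""
    dim, n = len(latents[0]), len(latents)
    weights = [[0.0] * dim for _ in range(num_classes)]
    biases = [0.0] * num_classes
    for _ in range(epochs):
        grad_w = [[0.0] * dim for _ in range(num_classes)]
        grad_b = [0.0] * num_classes
        for z, y in zip(latents, labels):
            probs = softmax(logits(weights, biases, z))
            for c in range(num_classes):
                err = (probs[c] - (1.0 if c == y else 0.0)) / n
                grad_b[c] += err
                for d in range(dim):
                    grad_w[c][d] += err * z[d]
        for c in range(num_classes):
            biases[c] -= lr * grad_b[c]
            for d in range(dim):
                weights[c][d] -= lr * grad_w[c][d]
    return weights, biases


def predict(weights, biases, z):
    """Return the class with the highest linear score."""
    scores = logits(weights, biases, z)
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(weights, biases, latents, labels):
    """Fraction of latents whose predicted class matches the label."""
    hits = sum(predict(weights, biases, z) == y
               for z, y in zip(latents, labels))
    return hits / len(labels)


def demo(seed=0):
    """Probe synthetic latents whose first coordinate drives the return."""
    rng = random.Random(seed)
    edges = [-0.5, 0.5]  # hypothetical return-interval boundaries (3 classes)
    latents = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(400)]
    labels = [team_return_to_class(z[0], edges) for z in latents]
    train_z, test_z = latents[:300], latents[300:]
    train_y, test_y = labels[:300], labels[300:]
    weights, biases = fit_linear_probe(train_z, train_y, num_classes=3)
    return accuracy(weights, biases, test_z, test_y)


if __name__ == "__main__":
    print(f"held-out probe accuracy: {demo():.2f}")
```

Because the probe is deliberately low-capacity, a high held-out accuracy suggests that the characteristic of interest is (close to) linearly decodable from the latents, whereas a low accuracy suggests that the latent space does not expose it in a simple form.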
EgoTaskQA: Understanding Human Tasks in Egocentric Videos
These questions are divided into four types, including descriptive (what status?), predictive (what will?), explanatory (what caused?), and counterfactual (what if?), to provide diagnostic analyses on spatial, temporal, and causal understandings of goal-oriented tasks. We show an illustrative scenario where two subjects collaborate to make and drink cereal.