state transition
- Asia > Taiwan (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- Asia > Singapore (0.04)
- Asia > China > Guangxi Province > Nanning (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Model-Based Reasoning (0.43)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Asia > China (0.04)
Variational Temporal Abstraction
Taesup Kim, Sungjin Ahn, Yoshua Bengio
Approaches such as the HMRNN (Chung et al., 2016) learn this kind of hierarchical structure in sequences. However, because the HMRNN is a deterministic model, its main limitation is that it cannot capture the stochasticity prevalent in the data. In particular, this is a critical limitation for imagination-augmented agents, because exploring the various possible futures implied by that uncertainty is what makes imagination meaningful in many cases.
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- North America > Canada > Alberta > Census Division No. 15 > Improvement District No. 9 > Banff (0.04)
Figure 8: An example of Agent 1 using the "clean" action while facing East. Legend: Agent 1, Agent 2, River Tiles. (a) The initial setup with two agents and two river [...]. Agent 1's action is resolved first. The "main" beam extends directly in front of the agent, while two auxiliary [...]. A beam stops when it hits a dirty river tile. The cleaning beam is shown in Figure 8a.
The Sequential Social Dilemma Games, introduced in Leibo et al. [2017], are a kind of MARL [...]. All of these have open source implementations in [Vinitsky et al., 2019].
- Asia > China > Jiangsu Province > Nanjing (0.04)
- North America > United States (0.04)
- Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
- North America > United States > Illinois (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Europe > United Kingdom > England (0.04)
Diffusion Imitation from Observation
Learning from Observation (LfO) aims to imitate experts by learning from state-only demonstrations, without requiring action labels. Existing adversarial imitation learning approaches train a generator (the agent policy) to produce state transitions that are indistinguishable, to a discriminator, from expert transitions; the discriminator in turn learns to classify agent and expert state transitions. Despite the simplicity of their formulation, these methods are often sensitive to hyperparameters and brittle to train. Motivated by the recent success of diffusion models in generative modeling, we propose to integrate a diffusion model into the adversarial imitation learning from observation framework. Specifically, we employ a diffusion model to capture expert and agent transitions by generating the next state given the current state. Then, we reformulate the learning objective to train the diffusion model as a binary classifier and use it to provide "realness" rewards for policy learning. Our proposed framework, Diffusion Imitation from Observation (DIFO), demonstrates superior performance in various continuous control domains, including navigation, locomotion, manipulation, and games.
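The idea of using a conditional diffusion model as a binary classifier over transitions can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the DIFO paper: the mapping from denoising error to a score, `exp(-error)`, is one plausible choice; the reward form `log D - log(1 - D)` is a common adversarial imitation reward; and `toy_predict_noise` is a hypothetical stand-in for a trained noise-prediction network that has learned the dynamics `next_state = state + 1`.

```python
import math
import random


def denoising_error(predict_noise, state, next_state, rng, n_samples=16):
    """Monte Carlo estimate of the noise-prediction error for one transition."""
    total = 0.0
    for _ in range(n_samples):
        t = rng.uniform(0.05, 0.95)  # hypothetical continuous noise level
        noise = [rng.gauss(0.0, 1.0) for _ in next_state]
        # Corrupt the next state; the model must recover the noise given the state.
        noisy_next = [
            math.sqrt(1.0 - t) * x + math.sqrt(t) * e
            for x, e in zip(next_state, noise)
        ]
        predicted = predict_noise(state, noisy_next, t)
        total += sum((p - e) ** 2 for p, e in zip(predicted, noise)) / len(noise)
    return total / n_samples


def realness(error):
    """Assumed mapping from a non-negative error to a score in (0, 1]."""
    return math.exp(-error)


def _clip(d, eps=1e-6):
    return min(max(d, eps), 1.0 - eps)


def discriminator_loss(d_expert, d_agent):
    """Binary cross-entropy: expert transitions labeled 1, agent transitions 0."""
    return -math.log(_clip(d_expert)) - math.log(1.0 - _clip(d_agent))


def policy_reward(d):
    """Common adversarial imitation reward (an assumption, not the paper's)."""
    d = _clip(d)
    return math.log(d) - math.log(1.0 - d)


def toy_predict_noise(state, noisy_next, t):
    """Hypothetical 'trained' model that believes next_state = state + 1."""
    expected_next = [s + 1.0 for s in state]
    return [
        (y - math.sqrt(1.0 - t) * m) / math.sqrt(t)
        for y, m in zip(noisy_next, expected_next)
    ]


rng = random.Random(0)
state = [0.0, 1.0]
# Transition that matches the learned dynamics (expert-like).
d_expert = realness(denoising_error(toy_predict_noise, state, [1.0, 2.0], rng))
# Transition that violates the learned dynamics (agent-like).
d_agent = realness(denoising_error(toy_predict_noise, state, [3.0, 4.0], rng))
loss = discriminator_loss(d_expert, d_agent)
```

In this toy setting the expert-like transition is denoised almost perfectly and receives a score near 1, while the mismatched transition receives a lower score and therefore a lower reward. A real system would replace `toy_predict_noise` with a learned network and minimize `discriminator_loss` over batches of expert and agent transitions.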