Generalizable Imitation Learning from Observation via Inferring Goal Proximity
Task progress is an intuitive and readily available form of task information that can guide an agent toward the desired goal. Furthermore, a task progress estimator can generalize to new situations. Based on this intuition, we propose a simple yet effective imitation learning from observation method for goal-directed tasks that uses a learned goal proximity function as a task progress estimator, enabling better generalization to unseen states and goals. We obtain this goal proximity function from expert demonstrations and online agent experience, and then use the learned goal proximity as a dense reward for policy training. We demonstrate that our proposed method generalizes more robustly than prior imitation learning methods on a set of goal-directed tasks in navigation, locomotion, and robotic manipulation, even with demonstrations that cover only part of the state space.
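The proximity-as-reward idea above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes an exponential proximity parameterization (the final expert state has proximity 1, with each earlier step discounted by `delta`) and a hypothetical learned estimator `f`.

```python
import numpy as np

def proximity_labels(traj_len, delta=0.95):
    """Goal-proximity targets for an expert trajectory of length
    traj_len: the final state gets proximity 1.0, and each step
    earlier is discounted by delta (an assumed exponential form)."""
    t = np.arange(traj_len)
    return delta ** (traj_len - 1 - t)

def proximity_reward(f, s, s_next):
    """Dense reward from a learned proximity estimator f: moving to
    a state that f judges closer to the goal yields positive reward."""
    return f(s_next) - f(s)
```

With a trained estimator, `proximity_reward` can replace the environment reward during policy optimization; the labels above supply regression targets on expert states.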
LobsDICE: Offline Learning from Observation via Stationary Distribution Correction Estimation
We consider the problem of learning from observation (LfO), in which the agent aims to mimic the expert's behavior from state-only demonstrations. We additionally assume that the agent cannot interact with the environment but has access to action-labeled transition data collected by agents of unknown quality. This offline setting for LfO is appealing in many real-world scenarios where ground-truth expert actions are inaccessible and arbitrary environment interactions are costly or risky. In this paper, we present LobsDICE, an offline LfO algorithm that learns to imitate the expert policy via optimization in the space of stationary distributions. Our algorithm solves a single convex minimization problem, which minimizes the divergence between the two state-transition distributions induced by the expert and the agent policy. Through an extensive set of offline LfO tasks, we show that LobsDICE outperforms strong baseline methods.
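The divergence the abstract refers to can be made concrete with a toy estimator over a discrete state space. This is only a sketch of the quantity being minimized: the actual algorithm works with stationary-distribution correction ratios, not direct counts, and the function name and smoothing constant here are illustrative assumptions.

```python
import numpy as np

def transition_kl(expert_pairs, agent_pairs, n_states, eps=1e-8):
    """Empirical KL divergence between expert and agent state-transition
    distributions d(s, s'), estimated from sampled (s, s') pairs over a
    discrete state space, with additive smoothing to avoid log(0)."""
    def empirical(pairs):
        counts = np.full((n_states, n_states), eps)
        for s, s_next in pairs:
            counts[s, s_next] += 1.0
        return counts / counts.sum()
    p, q = empirical(expert_pairs), empirical(agent_pairs)
    return float(np.sum(p * np.log(p / q)))
```

An agent whose (s, s') visitation matches the expert's drives this divergence to zero, which is the sense in which matching state-transition distributions imitates the expert without action labels.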
Learning from Observation: A Survey of Recent Advances
Burnwal, Returaj, Mehta, Hriday, Bhatt, Nirav Pravinbhai, Ravindran, Balaraman
Imitation Learning (IL) algorithms offer an efficient way to train an agent by mimicking an expert's behavior without requiring a reward function. IL algorithms often necessitate access to state and action information from expert demonstrations. Although expert actions can provide detailed guidance, requiring such action information may prove impractical for real-world applications where expert actions are difficult to obtain. To address this limitation, the concept of learning from observation (LfO), or state-only imitation learning (SOIL), has recently gained attention, wherein the imitator only has access to expert state visitation information. In this paper, we present a framework for LfO and use it to survey and classify existing LfO methods in terms of their trajectory construction, assumptions, and algorithmic design choices. This survey also draws connections to several related fields, such as offline RL, model-based RL, and hierarchical RL. Finally, we use our framework to identify open problems and suggest future research directions.
Diffusion Imitation from Observation
Learning from Observation (LfO) aims to imitate experts by learning from state-only demonstrations without requiring action labels. Existing adversarial imitation learning approaches learn a generator agent policy to produce state transitions that a discriminator, trained to classify agent and expert state transitions, cannot tell apart from the expert's. Despite their simple formulation, these methods are often sensitive to hyperparameters and brittle to train. Motivated by the recent success of diffusion models in generative modeling, we propose to integrate a diffusion model into the adversarial imitation learning from observation framework. Specifically, we employ a diffusion model to capture expert and agent transitions by generating the next state, given the current state.
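The conditional generation step described above can be sketched as a standard one-step denoising objective. This is a minimal, assumed interface, not the paper's architecture: `denoiser(s, noisy, level)` is a hypothetical conditional network that predicts the injected noise given the current state `s` as conditioning.

```python
import numpy as np

def diffusion_loss(denoiser, s, s_next, alpha_bar, rng):
    """One-step denoising loss for a conditional diffusion model that
    generates s' given s: corrupt the true next state with Gaussian
    noise at noise level alpha_bar, ask the denoiser to predict the
    injected noise, and score it with mean squared error."""
    noise = rng.standard_normal(s_next.shape)
    noisy = np.sqrt(alpha_bar) * s_next + np.sqrt(1 - alpha_bar) * noise
    pred = denoiser(s, noisy, alpha_bar)
    return float(np.mean((pred - noise) ** 2))
```

A model trained to low loss on expert (s, s') pairs but high loss on agent pairs can then play the discriminator's role in the adversarial setup.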
Learning Diffusion Priors from Observations by Expectation Maximization
Diffusion models recently proved to be remarkable priors for Bayesian inverse problems. However, training these models typically requires access to large amounts of clean data, which can be difficult to obtain in some settings. In this work, we present a novel method based on the expectation-maximization algorithm for training diffusion models from incomplete and noisy observations only. Unlike previous works, our method leads to proper diffusion models, which is crucial for downstream tasks. As part of our method, we propose and motivate an improved posterior sampling scheme for unconditional diffusion models.
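The expectation-maximization structure here can be illustrated with a much simpler model family. The sketch below fits a Gaussian prior to latents observed only through additive noise; it is a stand-in for the paper's setup, where the M-step would retrain a diffusion model rather than update two scalars.

```python
import numpy as np

def em_gaussian_prior(y, obs_var, n_iters=50):
    """Toy EM for learning a prior from noisy observations y = x + n,
    with known noise variance obs_var.
    E-step: Gaussian posterior over each latent x_i under the current prior.
    M-step: refit the prior mean/variance from the posterior moments."""
    mu, var = 0.0, 1.0
    for _ in range(n_iters):
        # E-step: closed-form Gaussian posterior for each x_i
        post_var = 1.0 / (1.0 / var + 1.0 / obs_var)
        post_mean = post_var * (mu / var + y / obs_var)
        # M-step: maximize the expected log-likelihood of the prior
        mu = post_mean.mean()
        var = post_var + ((post_mean - mu) ** 2).mean()
    return mu, var
```

The same alternation drives the paper's method: infer the clean data posterior given the current prior, then refit the prior to those inferences.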
Reviews: Imitation Learning from Observations by Minimizing Inverse Dynamics Disagreement
I was happy with your inclusion of experiments on manipulation tasks, and agree they're convincing. I was also happy with your explanation of GAILfo vs. GAIL vs. your algorithm, and your discussion of Sun et al. 2019. Your decision to release code also helps with any fears I have about reproducibility. I have changed my score to an 8 to reflect these improvements. The authors' logic and reasoning were easy to follow.
Reviews: Imitation Learning from Observations by Minimizing Inverse Dynamics Disagreement
Learning from Observation (LfO) is harder, but more practical, than Learning from Demonstration (LfD), which involves both action and state supervision. The paper studies the difference between the two types of learning from both theoretical and practical perspectives, and relates the gap between LfD and LfO to the inverse dynamics disagreement between the imitator and the expert. The paper includes an elaborate and interesting theoretical analysis of this gap, and proposes a method for bridging the gap through entropy maximization. The empirical evaluation is also thorough and includes a toy problem for studying the effect of inverse dynamics discrepancy, MuJoCo problems, and an ablation study. The reviewers are in agreement that this is a good, technically sound paper.
Modern Machine Learning Algorithms: Strengths and Weaknesses
In this guide, we'll take a practical, concise tour through modern machine learning algorithms. While other such lists exist, they don't really explain the practical tradeoffs of each algorithm, which we hope to do here. We'll discuss the advantages and disadvantages of each algorithm based on our experience. Categorizing machine learning algorithms is tricky, and there are several reasonable approaches; they can be grouped into generative/discriminative, parametric/non-parametric, supervised/unsupervised, and so on. However, from our experience, this isn't always the most practical way to group algorithms.
Reports
The IJCAI-09 Workshop on Learning Structural Knowledge from Observations (STRUCK-09) took place as part of the International Joint Conference on Artificial Intelligence (IJCAI-09) on July 12 in Pasadena, California. The workshop program included paper presentations, discussion sessions about those papers, group discussions about two selected topics, and a joint discussion. A recurring theme was that many cognitive architectures use structural models to represent relations between knowledge of different complexity. Structural modeling has led to a number of representation and reasoning formalisms, including frames, schemas, abstractions, hierarchical task networks (HTNs), and goal graphs, among others. These formalisms have in common the use of certain kinds of constructs (for example, objects, goals, skills, and tasks) that represent knowledge of varying degrees of complexity and that are connected through structural relations.
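The structural relations the report describes can be made concrete with a minimal task hierarchy. This is an illustrative sketch only; the class and function names are invented for the example and do not come from any of the surveyed formalisms' implementations.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    """A node in a simple hierarchical task network (HTN)-style model:
    a task either decomposes into ordered subtasks or is primitive."""
    name: str
    subtasks: list = field(default_factory=list)

def primitive_leaves(task):
    """Flatten a task hierarchy into its primitive (leaf) tasks, in order,
    following the structural relations from complex to simple constructs."""
    if not task.subtasks:
        return [task.name]
    return [leaf for sub in task.subtasks for leaf in primitive_leaves(sub)]
```

The recursive decomposition is the shared idea across frames, schemas, and HTNs: complex constructs are defined by structured relations over simpler ones.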