 Garg, Animesh


Uniform Priors for Data-Efficient Transfer

arXiv.org Machine Learning

Deep Neural Networks have shown great promise on a variety of downstream applications, but their ability to adapt and generalize to new data and tasks remains a challenge. Yet the ability to perform few-shot or zero-shot adaptation to novel tasks is important for the scalability and deployment of machine learning models. It is therefore crucial to understand what makes for good, transferable features in deep networks that best allow for such adaptation. In this paper, we shed light on this by showing that the most transferable features have high uniformity in the embedding space, and we propose a uniformity regularization scheme that encourages better transfer and feature reuse. We evaluate the regularization on its ability to facilitate adaptation to unseen tasks and data, for which we conduct a thorough experimental study covering four relevant and distinct domains: few-shot Meta-Learning, Deep Metric Learning, Zero-Shot Domain Adaptation, and Out-of-Distribution classification. Across all experiments, we show that uniformity regularization consistently offers benefits over baseline methods and achieves state-of-the-art performance in Deep Metric Learning and Meta-Learning.
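As a rough illustration of what "uniformity in the embedding space" can mean, the sketch below scores a batch of L2-normalized embeddings with a pairwise Gaussian-potential measure, where a lower score means the points are spread more evenly over the hypersphere. The measure itself, the function name `uniformity_score`, and the temperature `t` are assumptions chosen for illustration; the abstract does not specify the form of the regularizer.

```python
import numpy as np

def uniformity_score(embeddings, t=2.0):
    """Log of the mean Gaussian potential over all distinct pairs.

    Lower values indicate embeddings spread more uniformly on the unit sphere.
    This measure is an illustrative assumption, not the paper's regularizer.
    """
    z = np.asarray(embeddings, dtype=float)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # project onto unit sphere
    sq_dists = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    upper = np.triu_indices(len(z), k=1)  # each distinct pair once
    return float(np.log(np.mean(np.exp(-t * sq_dists[upper]))))

rng = np.random.default_rng(0)

# Isotropic Gaussian samples are roughly uniform on the sphere after normalization.
spread = rng.normal(size=(256, 16))

# Samples tightly packed around one direction are far from uniform.
center = np.zeros(16)
center[0] = 1.0
clustered = center + 0.05 * rng.normal(size=(256, 16))

spread_score = uniformity_score(spread)
clustered_score = uniformity_score(clustered)
```

Under this assumed measure, a regularizer would add a penalty proportional to the score, pushing the clustered embeddings toward the spread-out configuration.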


OCEAN: Online Task Inference for Compositional Tasks with Context Adaptation

arXiv.org Artificial Intelligence

Real-world tasks often exhibit a compositional structure that contains a sequence of simpler sub-tasks. For instance, opening a door requires reaching, grasping, rotating, and pulling the door knob. Such compositional tasks require an agent to reason about the sub-task at hand while orchestrating global behavior accordingly. This can be cast as an online task inference problem, where the current task identity, represented by a context variable, is estimated from the agent's past experiences with probabilistic inference. Previous approaches have employed simple latent distributions, e.g., Gaussian, to model a single context for the entire task. However, this formulation lacks the expressiveness to capture the composition and transition of the sub-tasks. We propose a variational inference framework, OCEAN, to perform online task inference for compositional tasks. OCEAN models global and local context variables in a joint latent space, where the global variables represent a mixture of sub-tasks required for the task, while the local variables capture the transitions between the sub-tasks. Our framework supports flexible latent distributions based on prior knowledge of the task structure and can be trained in an unsupervised manner. Experimental results show that OCEAN provides more effective task inference with sequential context adaptation and thus leads to a performance boost on complex, multi-stage tasks.


Visuomotor Mechanical Search: Learning to Retrieve Target Objects in Clutter

arXiv.org Artificial Intelligence

When searching for objects in cluttered environments, it is often necessary to perform complex interactions to move occluding objects out of the way, fully reveal the object of interest, and make it graspable. Due to the complexity of the physics involved and the lack of accurate models of the clutter, planning and controlling precise predefined interactions with accurate outcomes is extremely hard, if not impossible. In problems where accurate (forward) models are lacking, Deep Reinforcement Learning (RL) has been shown to be a viable solution for mapping observations (e.g., images) to good interactions in the form of closed-loop visuomotor policies. However, Deep RL is sample-inefficient and fails when applied directly to the problem of unoccluding objects based on images. In this work we present a novel Deep RL procedure that combines i) teacher-aided exploration, ii) a critic with privileged information, and iii) mid-level representations, resulting in sample-efficient and effective learning for the problem of uncovering a target object occluded by a heap of unknown objects. Our experiments show that our approach trains faster and converges to more efficient uncovering solutions than baselines and ablations, and that our uncovering policies lead to an average improvement in the graspability of the target object, facilitating downstream retrieval applications.


Counterfactual Data Augmentation using Locally Factored Dynamics

arXiv.org Artificial Intelligence

Many dynamic processes, including common scenarios in robotic control and reinforcement learning (RL), involve a set of interacting subprocesses. Though the subprocesses are not independent, their interactions are often sparse, and the dynamics at any given time step can often be decomposed into locally independent causal mechanisms. Such local causal structures can be leveraged to improve the sample efficiency of sequence prediction and off-policy reinforcement learning. We formalize this by introducing local causal models (LCMs), which are induced from a global causal model by conditioning on a subset of the state space. We propose an approach to inferring these structures given an object-oriented state representation, as well as a novel algorithm for model-free Counterfactual Data Augmentation (CoDA). CoDA uses local structures and an experience replay to generate counterfactual experiences that are causally valid in the global model. We find that CoDA significantly improves the performance of RL agents in locally factored tasks, including the batch-constrained and goal-conditioned settings.
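The idea of generating counterfactual experiences from locally independent subprocesses can be sketched as a component swap between two stored transitions. The example below is a minimal illustration under strong assumptions: states are dictionaries keyed by object name, the swapped component is assumed to evolve independently of the others in both transitions, and the action is assumed to affect only the components kept from the first transition. The function name `coda_swap` and the toy "arm"/"block" objects are hypothetical.

```python
def coda_swap(transition_a, transition_b, component):
    """Build a counterfactual transition from two real ones.

    Takes `component` (before and after the step) from transition_b and
    everything else, including the action, from transition_a. This is only
    valid if `component` is locally independent of the rest in BOTH
    transitions; that check is assumed to have been done by the caller.
    """
    counterfactual = {"action": transition_a["action"]}
    for key in ("state", "next_state"):
        merged = dict(transition_a[key])  # copy so the originals stay intact
        merged[component] = transition_b[key][component]
        counterfactual[key] = merged
    return counterfactual

# Two toy transitions in which the arm and the block do not interact.
a = {"state": {"arm": 0.0, "block": 5.0}, "action": 1.0,
     "next_state": {"arm": 1.0, "block": 5.0}}
b = {"state": {"arm": 3.0, "block": 9.0}, "action": -1.0,
     "next_state": {"arm": 2.0, "block": 9.5}}

cf = coda_swap(a, b, "block")
```

The resulting transition pairs the arm motion observed in `a` with the block motion observed in `b`, a combination that never occurred in the data but is consistent with the assumed independent dynamics.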


Causal Discovery in Physical Systems from Videos

arXiv.org Machine Learning

Causal discovery is at the core of human cognition. It enables us to reason about the environment and make counterfactual predictions about unseen scenarios that can differ vastly from our previous experiences. We consider the task of causal discovery from videos in an end-to-end fashion without supervision on the ground-truth graph structure. In particular, our goal is to discover the structural dependencies among environmental and object variables: inferring the type and strength of interactions that have a causal effect on the behavior of the dynamical system. Our model consists of (a) a perception module that extracts a semantically meaningful and temporally consistent keypoint representation from images, (b) an inference module for determining the graph distribution induced by the detected keypoints, and (c) a dynamics module that can predict the future by conditioning on the inferred graph. We assume access to different configurations and environmental conditions, i.e., data from unknown interventions on the underlying system; thus, we can hope to discover the correct underlying causal graph without explicit interventions. We evaluate our method in a planar multi-body interaction environment and in scenarios involving fabrics of different shapes, such as shirts and pants. Experiments demonstrate that our model can correctly identify the interactions from a short sequence of images and make long-term future predictions. The causal structure assumed by the model also allows it to make counterfactual predictions and extrapolate to systems of unseen interaction graphs or graphs of various sizes.


Experience Replay with Likelihood-free Importance Weights

arXiv.org Artificial Intelligence

The use of past experiences to accelerate temporal difference (TD) learning of value functions, or experience replay, is a key component in deep reinforcement learning. Prioritization or reweighting of important experiences has been shown to improve the performance of TD learning algorithms. In this work, we propose to reweight experiences based on their likelihood under the stationary distribution of the current policy. Using the corresponding reweighted TD objective, we implicitly encourage small approximation errors on the value function over frequently encountered states. We use a likelihood-free density ratio estimator over the replay buffer to assign the prioritization weights. We apply the proposed approach empirically to two competitive methods, Soft Actor Critic (SAC) and Twin Delayed Deep Deterministic policy gradient (TD3), over a suite of OpenAI gym tasks and achieve superior sample complexity compared to other baseline approaches.
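A reweighted TD objective of this kind can be sketched as a weighted mean of squared TD errors, with weights given by an estimated density ratio. In the sketch below the ratio is obtained from a probabilistic classifier via w = D / (1 - D), which is one standard likelihood-free estimator and is an assumption here; the abstract does not say which estimator is used. The function names, the self-normalization of the weights, and the toy numbers are likewise illustrative.

```python
import numpy as np

def density_ratio_from_classifier(p_on_policy):
    """Turn classifier outputs D = P(sample is on-policy) into ratios D / (1 - D)."""
    p = np.clip(np.asarray(p_on_policy, dtype=float), 1e-6, 1.0 - 1e-6)
    return p / (1.0 - p)

def reweighted_td_loss(q, reward, q_next, done, weights, gamma=0.99):
    """Weighted squared TD error with self-normalized weights."""
    q, reward, q_next, done, weights = (
        np.asarray(x, dtype=float) for x in (q, reward, q_next, done, weights)
    )
    target = reward + gamma * (1.0 - done) * q_next  # one-step bootstrap target
    td_error = q - target
    w = weights / np.sum(weights)  # normalize so the weights sum to one
    return float(np.sum(w * td_error ** 2))

# Toy batch of two transitions; the second looks more on-policy to the classifier.
weights = density_ratio_from_classifier([0.5, 0.8])  # approximately [1.0, 4.0]
loss = reweighted_td_loss(
    q=[1.0, 2.0], reward=[0.0, 1.0], q_next=[1.0, 0.0],
    done=[0.0, 1.0], weights=weights, gamma=0.5,
)
```

In this toy batch the second transition carries four times the weight of the first, so its TD error dominates the loss.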


LEAF: Latent Exploration Along the Frontier

arXiv.org Artificial Intelligence

Self-supervised goal proposal and reaching is a key component of exploration and efficient policy learning algorithms. Such a self-supervised approach, without access to any oracle goal sampling distribution, requires deep exploration and commitment so that long-horizon plans can be efficiently discovered. In this paper, we propose an exploration framework that learns a dynamics-aware manifold of reachable states. For a goal, our proposed method deterministically visits a state at the current frontier of reachable states (commitment/reaching) and then stochastically explores to reach the goal (exploration). This allocates the exploration budget near the frontier of the reachable region instead of its interior. We target the challenging problem of policy learning from initial and goal states specified as images, and do not assume any access to the underlying ground-truth states of the robot and the environment. To keep track of reachable latent states, we propose a distance-conditioned reachability network that is trained to infer whether one state is reachable from another within the specified latent space distance. Given an initial state, we obtain a frontier of reachable states from that state. By incorporating a curriculum for sampling easier goals (closer to the start state) before more difficult goals, we demonstrate that the proposed self-supervised exploration algorithm achieves 20% better performance on average than existing baselines on a set of challenging robotic environments, including a real robot manipulation task.


Guided Uncertainty-Aware Policy Optimization: Combining Learning and Model-Based Strategies for Sample-Efficient Policy Learning

arXiv.org Artificial Intelligence

Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state. On the other hand, reinforcement learning approaches can operate directly from raw sensory inputs with only a reward signal to describe the task, but are extremely sample-inefficient and brittle. In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that is able to overcome inaccuracies in the robotics perception/actuation pipeline, while requiring minimal interactions with the environment. This is achieved by leveraging uncertainty estimates to divide the space into regions where the given model-based policy is reliable, and regions where it may have flaws or not be well defined. In these uncertain regions, we show that a locally learned policy can be used directly with raw sensory inputs. We test our algorithm, Guided Uncertainty-Aware Policy Optimization (GUAPO), on a real-world robot performing peg insertion. Videos are available at https://sites.google.com/view/guapo-rl
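The region-based switching described here can be sketched as a simple dispatch between two policies. The example assumes, purely for illustration, that the uncertain region is a ball around the perceived target whose radius comes from the perception system's uncertainty estimate; the function `guided_action`, both toy policies, and all numbers are hypothetical and not taken from the paper.

```python
import numpy as np

def model_based_policy(position, target_estimate):
    """Toy model-based controller: unit step toward the estimated target."""
    direction = np.asarray(target_estimate, float) - np.asarray(position, float)
    return direction / np.linalg.norm(direction)

def learned_policy(observation):
    """Placeholder for a learned policy acting on raw sensory input."""
    return np.zeros(3)

def guided_action(position, target_estimate, uncertainty_radius, observation):
    """Use the model-based policy where it is reliable, the learned one elsewhere."""
    distance = np.linalg.norm(
        np.asarray(position, float) - np.asarray(target_estimate, float)
    )
    if distance <= uncertainty_radius:
        # Inside the uncertain region the model is not trusted.
        return learned_policy(observation)
    return model_based_policy(position, target_estimate)

target = [0.0, 0.0, 0.0]
far_action = guided_action([0.0, 0.0, 0.5], target, 0.1, observation=None)
near_action = guided_action([0.0, 0.0, 0.05], target, 0.1, observation=None)
```

Far from the target the controller steps straight toward the estimate; once inside the uncertainty radius, control is handed to the learned policy.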


AC-Teach: A Bayesian Actor-Critic Method for Policy Learning with an Ensemble of Suboptimal Teachers

arXiv.org Artificial Intelligence

The exploration mechanism used by a Deep Reinforcement Learning (RL) agent plays a key role in determining its sample efficiency. Thus, improving over random exploration is crucial to solve long-horizon tasks with sparse rewards. We propose to leverage an ensemble of partial solutions as teachers that guide the agent's exploration with action suggestions throughout training. While the setup of learning with teachers has been previously studied, our proposed approach, Actor-Critic with Teacher Ensembles (AC-Teach), is the first to work with an ensemble of suboptimal teachers that may solve only part of the problem or contradict each other, forming a unified algorithmic solution that is compatible with a broad range of teacher ensembles. AC-Teach leverages a probabilistic representation of the expected outcome of the teachers' and student's actions to direct exploration, reduce dithering, and adapt to the dynamically changing quality of the learner. We evaluate a variant of AC-Teach that guides the learning of a Bayesian DDPG agent on three tasks (path following, robotic pick and place, and robotic cube sweeping using a hook) and show that it substantially improves sample efficiency over a set of baselines, both for our target scenario of unconstrained suboptimal teachers and for easier setups with optimal or single teachers. Additional results and videos at https://sites.google.com/view/acteach/home.


Video Interpolation and Prediction with Unsupervised Landmarks

arXiv.org Machine Learning

Prediction and interpolation for long-range video data involve the complex task of modeling motion trajectories for each visible object, occlusions and dis-occlusions, as well as appearance changes due to viewpoint and lighting. Optical-flow-based techniques generalize but are suitable only for short temporal ranges. Many methods opt to project the video frames to a low-dimensional latent space, achieving long-range predictions. However, these latent representations are often non-interpretable, and therefore difficult to manipulate. This work poses video prediction and interpolation as unsupervised latent structure inference followed by a temporal prediction in this latent space. The latent representations capture foreground semantics without explicit supervision such as keypoints or poses. Further, as each landmark can be mapped to a coordinate indicating where a semantic part is positioned, we can reliably interpolate within the coordinate domain to achieve predictable motion interpolation. Given an image decoder capable of mapping these landmarks back to the image domain, we are able to achieve high-quality long-range video interpolation and extrapolation by operating on the landmark representation space.
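Interpolating in the coordinate domain can be sketched as blending two sets of landmark positions. The example below uses plain linear interpolation between the landmarks of a start and an end frame, which is an assumption made for illustration; the abstract does not state the interpolation scheme, and the function name and toy coordinates are hypothetical. Decoding the interpolated landmarks back to images is outside the scope of the sketch.

```python
import numpy as np

def interpolate_landmarks(start, end, num_steps):
    """Linearly interpolate K landmarks, each with (x, y) coordinates.

    start, end: arrays of shape (K, 2)
    returns:    array of shape (num_steps, K, 2), endpoints included
    """
    start = np.asarray(start, dtype=float)
    end = np.asarray(end, dtype=float)
    alphas = np.linspace(0.0, 1.0, num_steps)[:, None, None]  # broadcast over K and xy
    return (1.0 - alphas) * start + alphas * end

# Two landmarks moving between a first and a last frame.
first_frame = [[0.0, 0.0], [1.0, 1.0]]
last_frame = [[2.0, 0.0], [1.0, 3.0]]
trajectory = interpolate_landmarks(first_frame, last_frame, num_steps=5)
```

Each slice `trajectory[i]` is a full set of landmark coordinates for one intermediate frame, which is what an image decoder of the kind described above would consume.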