
### An Energy-based Perspective on Learning Observation Models

Figure 1: We show that learning observation models can be viewed as shaping energy functions that graph optimizers, even non-differentiable ones, optimize. Inference solves for the most likely states $$x$$ given the model and input measurements $$z$$. Learning uses training data to update the observation model parameters $$\theta$$.

Robots perceive the rich world around them through the lens of their sensors. Each sensor observation is a tiny window into the world that provides only a partial, simplified view of reality.
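To make the energy view concrete, here is a minimal toy sketch. All of its modelling choices are assumptions for illustration, not details from the paper. It uses a scalar state $$x$$, three sensors whose measurements $$z_i$$ enter a quadratic energy $$E_\theta(x; z) = \sum_i w_i (x - z_i)^2$$ with $$w = \mathrm{softmax}(\theta)$$, and a perceptron-style energy loss $$E_\theta(x_{\text{gt}}; z) - E_\theta(\hat{x}; z)$$. Inference is treated as a black box. The learning gradient needs only $$\partial E / \partial \theta$$ at the ground-truth and inferred states, so nothing is differentiated through the optimizer. The helper names and noise levels are hypothetical.

```python
import math
import random


def softmax(theta):
    """Map unconstrained parameters to positive weights that sum to 1."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def energy(x, z, w):
    """Toy observation-model energy E(x; z) = sum_i w_i * (x - z_i)^2."""
    return sum(wi * (x - zi) ** 2 for wi, zi in zip(w, z))


def infer(z, w):
    """Inference: argmin_x E(x; z). For this quadratic energy the minimizer is
    the weighted mean; any black-box (even non-differentiable) solver could be
    swapped in, because learning below never differentiates through it."""
    return sum(wi * zi for wi, zi in zip(w, z)) / sum(w)


def energy_loss_and_grad(x_gt, z, theta):
    """Perceptron-style loss L = E(x_gt) - E(x_hat), with x_hat held fixed.

    The gradient lowers energy at the ground truth and raises it at the
    current prediction; it only needs dE/dtheta at those two states."""
    w = softmax(theta)
    x_hat = infer(z, w)
    loss = energy(x_gt, z, w) - energy(x_hat, z, w)
    # Per-measurement energy difference between ground truth and prediction.
    d = [(x_gt - zi) ** 2 - (x_hat - zi) ** 2 for zi in z]
    # Chain rule through the softmax: dL/dtheta_j = w_j * (d_j - sum_i w_i d_i).
    grad = [wj * (dj - loss) for wj, dj in zip(w, d)]
    return loss, grad


def train(sigmas, steps=600, batch=64, lr=0.5, seed=0):
    """Learn sensor weights from (ground truth, noisy measurements) pairs."""
    rng = random.Random(seed)
    theta = [0.0] * len(sigmas)
    for _ in range(steps):
        avg = [0.0] * len(sigmas)
        for _ in range(batch):
            x_gt = rng.uniform(-1.0, 1.0)
            z = [x_gt + rng.gauss(0.0, s) for s in sigmas]
            _, g = energy_loss_and_grad(x_gt, z, theta)
            avg = [a + gi / batch for a, gi in zip(avg, g)]
        theta = [t - lr * a for t, a in zip(theta, avg)]
    return softmax(theta)


weights = train([0.1, 0.5, 2.0])  # hypothetical per-sensor noise levels
print([round(w, 3) for w in weights])
```

In this toy setting the learned weights should come to favour the least noisy sensor, drifting toward inverse-variance weighting. Only `energy` evaluations at two states drive the update, so `infer` could just as well be a non-differentiable graph optimizer.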

### Manipulation by Feel: Touch-Based Control with Deep Predictive Models

Touch sensing is widely acknowledged to be important for dexterous robotic manipulation, but exploiting tactile sensing for continuous, non-prehensile manipulation is challenging. General purpose control techniques that are able to effectively leverage tactile sensing as well as accurate physics models of contacts and forces remain largely elusive, and it is unclear how to even specify a desired behavior in terms of tactile percepts. In this paper, we take a step towards addressing these issues by combining high-resolution tactile sensing with data-driven modeling using deep neural network dynamics models. We propose deep tactile MPC, a framework for learning to perform tactile servoing from raw tactile sensor inputs, without manual supervision. We show that this method enables a robot equipped with a GelSight-style tactile sensor to manipulate a ball, analog stick, and 20-sided die, learning from unsupervised autonomous interaction and then using the learned tactile predictive model to reposition each object to user-specified configurations, indicated by a goal tactile reading. Videos, visualizations and the code are available here: https://sites.google.com/view/deeptactilempc
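The control loop described above can be sketched as sampling-based model-predictive control. Candidate action sequences are rolled through a predictive model and scored by how far their predicted tactile readings land from the goal reading. Only the first action is executed before replanning. This is a minimal sketch under assumptions of our own. The tactile reading is a short feature vector rather than an image, and `predict` (with its matrix `B`) is a hypothetical linear stand-in for the learned deep model. The random-shooting planner, horizon, and sample count are illustrative choices, not details from the paper.

```python
import random

# Hypothetical stand-in for a learned tactile dynamics model: each action
# (dx, dy) shifts a 4-dimensional "tactile reading" linearly.
B = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.5, -0.5]]


def predict(obs, action):
    """Predict the next tactile reading from the current one and an action."""
    return [o + row[0] * action[0] + row[1] * action[1] for o, row in zip(obs, B)]


def cost(readings, goal):
    """Sum of squared distances from each predicted reading to the goal reading."""
    return sum(sum((r - g) ** 2 for r, g in zip(reading, goal)) for reading in readings)


def plan(obs, goal, rng, horizon=2, samples=500, max_action=1.0):
    """Random-shooting MPC: return the first action of the best sampled sequence."""
    best_seq, best_cost = None, float("inf")
    for _ in range(samples):
        seq = [(rng.uniform(-max_action, max_action), rng.uniform(-max_action, max_action))
               for _ in range(horizon)]
        readings, o = [], obs
        for a in seq:
            o = predict(o, a)
            readings.append(o)
        c = cost(readings, goal)
        if c < best_cost:
            best_seq, best_cost = seq, c
    return best_seq[0]


def run(start, goal, steps=12, seed=0):
    """Closed loop: plan, apply the first action, observe, replan."""
    rng = random.Random(seed)
    obs = list(start)
    for _ in range(steps):
        action = plan(obs, goal, rng)
        obs = predict(obs, action)  # toy assumption: the "real" system equals the model
    return obs


goal = predict([0.0] * 4, (4.0, -3.0))  # goal reading for a shift of (4, -3)
final = run([0.0] * 4, goal)
print(sum((f - g) ** 2 for f, g in zip(final, goal)) ** 0.5)
```

On a real robot the next observation would come from the sensor rather than from `predict`, and replanning at every step is what absorbs the model's prediction errors.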

### The Differentiable Cross-Entropy Method

Brandon Amos (Facebook AI Research), Denis Yarats (Facebook AI Research; New York University)

We study the Cross-Entropy Method (CEM) for the non-convex optimization of a continuous and parameterized objective function and introduce a differentiable variant (DCEM) that enables us to differentiate the output of CEM with respect to the objective function's parameters. In the machine learning setting this brings CEM inside the end-to-end learning pipeline, where this has otherwise been impossible. We show applications in a synthetic energy-based structured prediction task and in non-convex continuous control.

In this paper we focus on the setting of optimizing an unconstrained, non-convex, and continuous objective function $$f_\theta(x)\colon \mathbb{R}^n \times \Theta \to \mathbb{R}$$ as $$\hat{x} \in \arg\min_x f_\theta(x)$$, where $$f$$ is parameterized by $$\theta \in \Theta$$ and has inputs $$x \in \mathbb{R}^n$$. If it exists, some (sub-)derivative $$\nabla_\theta \hat{x}$$ is useful in the machine learning setting to make the output of the optimization procedure end-to-end learnable. For example, $$\theta$$ could parameterize a predictive model that generates potential outcomes conditional on $$x$$ happening that you want to optimize over. End-to-end learning in these settings can be done by defining a loss function $$\mathcal{L}$$ on top of $$\hat{x}$$ and taking gradient steps $$\nabla_\theta \mathcal{L}$$. If $$f_\theta$$ were convex, this gradient would be easy to analyze and compute when it exists and is unique (Gould et al., 2016; Johnson et al., 2016; Amos et al., 2017; Amos & Kolter, 2017). Unfortunately, analyzing and computing a "derivative" through the non-convex arg min is challenging in both theory and practice. No such derivative may exist in theory, it might not be unique, and even if it uniquely exists, the numerical solver being used to compute the solution may not find a global or even local optimum of $$f$$.
One promising direction to sidestep these issues is to approximate the arg min operation with an explicit optimization procedure that is interpreted as just another compute graph and unrolled through.
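As a rough illustration of that direction, the sketch below unrolls CEM on a 1-D toy objective. It replaces the hard "keep the $$k$$ best samples" step with a soft top-$$k$$: weights $$y_i = \sigma(s_i/\tau + \nu)$$, where $$s_i = -f_\theta(x_i)$$ and $$\nu$$ is found by bisection so that $$\sum_i y_i = k$$. This smooth projection loosely follows the limited multi-label (LML) projection. The function names, temperature, sample counts, and fixed-seed noise are our assumptions, and the code is not the authors' implementation. Because the noise is fixed and every step is smooth, the returned mean is a smooth function of $$\theta$$, which is what makes unrolled differentiation possible. Here that property is only probed with a finite difference rather than an autodiff framework.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def soft_topk(scores, k, tau, iters=100):
    """Smooth top-k weights in (0, 1) summing to k: y_i = sigmoid(s_i / tau + nu).

    nu is found by bisection; as tau -> 0 this approaches hard top-k selection."""
    a = [s / tau for s in scores]
    lo, hi = -max(a) - 40.0, -min(a) + 40.0  # sum(y) < k at lo, > k at hi
    for _ in range(iters):
        nu = 0.5 * (lo + hi)
        if sum(sigmoid(ai + nu) for ai in a) < k:
            lo = nu
        else:
            hi = nu
    nu = 0.5 * (lo + hi)
    return [sigmoid(ai + nu) for ai in a]


def soft_cem(f, mu=0.0, sigma=2.0, n=100, k=10, tau=0.1, iters=10, seed=0):
    """Unrolled 1-D CEM with soft elite weights; returns the final mean."""
    rng = random.Random(seed)  # same noise on every call (reparameterized samples)
    for _ in range(iters):
        xs = [mu + sigma * rng.gauss(0.0, 1.0) for _ in range(n)]
        w = soft_topk([-f(x) for x in xs], k, tau)
        mu = sum(wi * xi for wi, xi in zip(w, xs)) / k
        sigma = math.sqrt(sum(wi * (xi - mu) ** 2 for wi, xi in zip(w, xs)) / k) + 1e-6
    return mu


def x_hat(theta):
    """Approximate argmin of the toy objective f_theta(x) = (x - theta)^2."""
    return soft_cem(lambda x: (x - theta) ** 2)


h = 1e-3
print(x_hat(1.3))                                   # approximate minimizer
print((x_hat(1.3 + h) - x_hat(1.3 - h)) / (2 * h))  # finite-difference d x_hat / d theta
```

Lowering `tau` makes the elite weights closer to a hard top-$$k$$ but also makes $$\hat{x}(\theta)$$ change more abruptly, which is the usual trade-off when smoothing a discrete selection step.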

### Regularizing Model-Based Planning with Energy-Based Models

Model-based reinforcement learning could enable sample-efficient learning by quickly acquiring rich knowledge about the world and using it to improve behaviour without additional data. Learned dynamics models can be directly used for planning actions but this has been challenging because of inaccuracies in the learned models. In this paper, we focus on planning with learned dynamics models and propose to regularize it using energy estimates of state transitions in the environment. We visually demonstrate the effectiveness of the proposed method and show that off-policy training of an energy estimator can be effectively used to regularize planning with pre-trained dynamics models. Further, we demonstrate that the proposed method enables sample-efficient learning to achieve competitive performance in challenging continuous control tasks such as Half-cheetah and Ant in just a few minutes of experience.
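A toy version of this idea: a planner scores action sequences with a learned dynamics model, and the regularized planner additionally subtracts $$\alpha$$ times an energy that is low for transitions resembling the training data. Everything in the sketch below is an illustrative assumption. The true system's response saturates and then reverses for large actions, and the "learned" model is a least-squares line fitted only on small actions. The energy is a nearest-neighbour distance to observed transitions, a hypothetical stand-in for a trained energy estimator. Random shooting is used as the planner for brevity.

```python
import random


def true_step(a):
    """Hypothetical real system: the state change saturates, then reverses, for large |a|."""
    return a * (2.0 - abs(a))


def fit_model(data):
    """'Learned' dynamics model: least-squares slope of delta_s on a, through the origin."""
    slope = sum(a * ds for a, ds in data) / sum(a * a for a, _ in data)
    return lambda a: slope * a


def energy(a, ds, data):
    """Stand-in energy: squared distance from (a, ds) to the nearest observed transition."""
    return min((a - da) ** 2 + (ds - dds) ** 2 for da, dds in data)


def plan(model, data, alpha, rng, horizon=3, samples=1000, max_action=3.0):
    """Random shooting: maximize predicted return minus alpha * total transition energy."""
    best_seq, best_score = None, float("-inf")
    for _ in range(samples):
        seq = [rng.uniform(-max_action, max_action) for _ in range(horizon)]
        preds = [model(a) for a in seq]
        score = sum(preds)  # reward = total predicted change in state
        if alpha > 0.0:
            score -= alpha * sum(energy(a, ds, data) for a, ds in zip(seq, preds))
        if score > best_score:
            best_seq, best_score = seq, score
    return best_seq


rng = random.Random(0)
# Training data only covers small actions, |a| <= 1.
data = [(a, true_step(a)) for a in (rng.uniform(-1.0, 1.0) for _ in range(100))]
model = fit_model(data)

for alpha in (0.0, 1.0):
    seq = plan(model, data, alpha, random.Random(1))  # same candidates for both
    print(alpha, [round(a, 2) for a in seq], round(sum(true_step(a) for a in seq), 2))
```

In this setup the unregularized planner should exploit the model where it has no data, choosing large actions that look good under the model but are bad on the true system. The energy term keeps the chosen actions close to the data, where the model is accurate.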

### Optical Tactile Sim-to-Real Policy Transfer via Real-to-Sim Tactile Image Translation

Simulation has recently become key for deep reinforcement learning to safely and efficiently acquire general and complex control policies from visual and proprioceptive inputs. Tactile information is not usually considered despite its direct relation to environment interaction. In this work, we present a suite of simulated environments tailored towards tactile robotics and reinforcement learning. A simple and fast method of simulating optical tactile sensors is provided, where high-resolution contact geometry is represented as depth images. Proximal Policy Optimisation (PPO) is used to learn successful policies across all considered tasks. A data-driven approach enables translation of the current state of a real tactile sensor to corresponding simulated depth images. This policy is implemented within a real-time control loop on a physical robot to demonstrate zero-shot sim-to-real policy transfer on several physically-interactive tasks requiring a sense of touch.
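As a rough illustration of representing contact geometry as a depth image, the sketch below works on a flat sensor. For each pixel, it computes how far a rigid sphere pressed into the sensor surface has penetrated, clipped to a maximum depth. The analytic sphere, grid size, and parameter names are hypothetical choices for illustration. A physics simulator would typically render this depth from the scene geometry instead.

```python
import math


def sphere_depth_image(size, pixel_mm, center_mm, radius_mm, indent_mm, max_depth_mm):
    """Depth image (mm) of a sphere pressed indent_mm into a flat sensor at z = 0.

    Pixel (row, col) samples the sensor at ((col + 0.5) * pixel_mm, (row + 0.5) * pixel_mm).
    The sphere's lowest point sits at z = -indent_mm; each pixel stores how far
    the sphere surface lies below the sensor plane there, clipped to [0, max_depth_mm].
    """
    cx, cy = center_mm
    cz = radius_mm - indent_mm  # height of the sphere's centre
    image = []
    for row in range(size):
        line = []
        for col in range(size):
            x, y = (col + 0.5) * pixel_mm, (row + 0.5) * pixel_mm
            r2 = (x - cx) ** 2 + (y - cy) ** 2
            if r2 >= radius_mm ** 2:
                line.append(0.0)  # outside the sphere's footprint: no contact
                continue
            z = cz - math.sqrt(radius_mm ** 2 - r2)  # lower surface of the sphere
            line.append(min(max(-z, 0.0), max_depth_mm))
        image.append(line)
    return image


img = sphere_depth_image(size=9, pixel_mm=1.0, center_mm=(4.5, 4.5),
                         radius_mm=3.0, indent_mm=0.5, max_depth_mm=1.0)
for line in img:
    print(" ".join(f"{d:.2f}" for d in line))
```

Clipping to `max_depth_mm` mimics the limited travel of a compliant sensor skin.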