 vapor



Probabilistic Inference in Reinforcement Learning Done Right

Jean Tarbouriech, Google DeepMind

Neural Information Processing Systems

A popular perspective in reinforcement learning (RL) casts the problem as probabilistic inference on a graphical model of the Markov decision process (MDP). The core object of study is the probability of each state-action pair being visited under the optimal policy.


Probabilistic Inference in Reinforcement Learning Done Right

arXiv.org Artificial Intelligence

A popular perspective in reinforcement learning (RL) casts the problem as probabilistic inference on a graphical model of the Markov decision process (MDP). The core object of study is the probability of each state-action pair being visited under the optimal policy. Previous approaches to approximating this quantity can be arbitrarily poor, leading to algorithms that do not implement genuine statistical inference and consequently do not perform well in challenging problems. In this work, we undertake a rigorous Bayesian treatment of the posterior probability of state-action optimality and clarify how it flows through the MDP. We first reveal that this quantity can indeed be used to generate a policy that explores efficiently, as measured by regret. Unfortunately, computing it is intractable, so we derive a new variational Bayesian approximation yielding a tractable convex optimization problem and establish that the resulting policy also explores efficiently. We call our approach VAPOR and show that it has strong connections to Thompson sampling, K-learning, and maximum entropy exploration. We conclude with some experiments demonstrating the performance advantage of a deep RL version of VAPOR.
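
To make the "probability of each state-action pair being visited under the optimal policy" concrete, here is a minimal sketch for the much simpler case where the MDP is fully known. It is not VAPOR and not the paper's method: the paper treats the MDP as unknown and works with a posterior, whereas this example assumes known transitions and rewards. The toy MDP, the function names `optimal_policy` and `occupancy`, and all numbers are hypothetical, chosen only for illustration.

```python
# Hypothetical toy MDP (not from the paper): 2 states, 2 actions, horizon 3.
# P[s][a] is the next-state distribution; R[s][a] is the reward.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # state 0: action 0 stays, action 1 moves to state 1
    [[1.0, 0.0], [0.0, 1.0]],  # state 1: action 0 moves to state 0, action 1 stays
]
R = [[0.0, 0.0], [0.0, 1.0]]   # only (state 1, action 1) is rewarded
HORIZON = 3
START = [1.0, 0.0]             # the agent starts in state 0


def optimal_policy(P, R, horizon):
    """Backward induction; returns policy[l][s] = greedy action at step l."""
    n_states, n_actions = len(P), len(P[0])
    value = [0.0] * n_states   # value after the final step is zero
    policy = []
    for _ in range(horizon):
        q = [[R[s][a] + sum(p * v for p, v in zip(P[s][a], value))
              for a in range(n_actions)]
             for s in range(n_states)]
        # Ties are broken in favour of the lowest action index.
        policy.append([max(range(n_actions), key=lambda a: q[s][a])
                       for s in range(n_states)])
        value = [max(q[s]) for s in range(n_states)]
    policy.reverse()           # steps were computed last-to-first
    return policy


def occupancy(P, policy, start):
    """Forward pass; returns occ[l][s][a] = P(visit (s, a) at step l)."""
    n_states, n_actions = len(P), len(P[0])
    state_dist = list(start)
    occ = []
    for step_policy in policy:
        step = [[0.0] * n_actions for _ in range(n_states)]
        next_dist = [0.0] * n_states
        for s in range(n_states):
            a = step_policy[s]
            step[s][a] = state_dist[s]  # deterministic policy: all mass on one action
            for s2 in range(n_states):
                next_dist[s2] += state_dist[s] * P[s][a][s2]
        occ.append(step)
        state_dist = next_dist
    return occ


policy = optimal_policy(P, R, HORIZON)
occ = occupancy(P, policy, START)
for step, table in enumerate(occ):
    print(step, table)
```

In the known-MDP case these visitation probabilities come from a single optimal policy. In the setting the abstract describes, the MDP itself is uncertain, so the corresponding quantity is a posterior probability and, as the abstract notes, is intractable to compute exactly.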


Learning Sequential Acquisition Policies for Robot-Assisted Feeding

arXiv.org Artificial Intelligence

A robot providing mealtime assistance must perform specialized maneuvers with various utensils in order to pick up and feed a range of food items. Beyond these dexterous low-level skills, an assistive robot must also plan these strategies in sequence over a long horizon to clear a plate and complete a meal. Previous methods in robot-assisted feeding introduce highly specialized primitives for food handling without a means to compose them together. Meanwhile, existing approaches to long-horizon manipulation lack the flexibility to embed highly specialized primitives into their frameworks. We propose Visual Action Planning OveR Sequences (VAPORS), a framework for long-horizon food acquisition. VAPORS learns a policy for high-level action selection by leveraging learned latent plate dynamics in simulation. To carry out sequential plans in the real world, VAPORS delegates action execution to visually parameterized primitives. We validate our approach on complex real-world acquisition trials involving noodle acquisition and bimanual scooping of jelly beans. Across 38 plates, VAPORS acquires much more efficiently than baselines, generalizes across realistic plate variations such as toppings and sauces, and qualitatively appeals to user feeding preferences in a survey conducted across 49 individuals. Code, datasets, videos, and supplementary materials can be found on our website: https://sites.google.com/view/vaporsbot.


VAPOR: Legged Robot Navigation in Outdoor Vegetation Using Offline Reinforcement Learning

arXiv.org Artificial Intelligence

We present VAPOR, a novel method for autonomous legged robot navigation in unstructured, densely vegetated outdoor environments using offline reinforcement learning (RL). Our method trains a novel RL policy using an actor-critic network and arbitrary data collected in real outdoor vegetation. Our policy uses height and intensity-based cost maps derived from 3D LiDAR point clouds, a goal cost map, and processed proprioception data as state inputs, and learns the physical and geometric properties of the surrounding obstacles such as height, density, and solidity/stiffness. The fully trained policy's critic network is then used to evaluate the quality of dynamically feasible velocities generated from a novel context-aware planner. Our planner adapts the robot's velocity space based on the presence of entrapment-inducing vegetation and narrow passages in dense environments. We demonstrate our method's capabilities on a Spot robot in complex real-world outdoor scenes, including dense vegetation. We observe that VAPOR's actions improve success rates by up to 40%, decrease the average current consumption by up to 2.9%, and decrease the normalized trajectory length by up to 11.2% compared to existing end-to-end offline RL and other outdoor navigation methods.


Performance of the Pre-Trained Large Language Model GPT-4 on Automated Short Answer Grading

arXiv.org Artificial Intelligence

Automated Short Answer Grading (ASAG) has been an active area of machine-learning research for over a decade. It promises to let educators grade and give feedback on free-form responses in large-enrollment courses in spite of limited availability of human graders. Over the years, carefully trained models have achieved increasingly higher levels of performance. More recently, pre-trained Large Language Models (LLMs) emerged as a commodity, and an intriguing question is how a general-purpose tool without additional training compares to specialized models. We studied the performance of GPT-4 on the standard benchmark 2-way and 3-way datasets SciEntsBank and Beetle, where in addition to the standard task of grading the alignment of the student answer with a reference answer, we also investigated withholding the reference answer. We found that overall, the performance of the pre-trained general-purpose GPT-4 LLM is comparable to hand-engineered models, but worse than pre-trained LLMs that had specialized training.


Machine learning methods for Schlieren imaging of a plasma channel in tenuous atomic vapor

arXiv.org Artificial Intelligence

We investigate the use of a Schlieren imaging setup to measure the geometrical dimensions of a plasma channel in atomic vapor. Near-resonant probe light is used to image the plasma channel in a tenuous vapor, and machine learning techniques are tested for extracting quantitative information from the images. By building a database of simulated signals with a range of plasma parameters for training deep neural networks, we demonstrate that the networks can reliably and accurately extract from the Schlieren images the location, the radius, and the maximum ionization fraction of the plasma channel, as well as the width of the transition region between the core of the plasma channel and the unionized vapor. We test several different neural network architectures with supervised learning and show that the parameter estimates supplied by the networks are resilient to slight changes of the experimental parameters that may occur in the course of a measurement.


Kansas City doctor uses 'vaping robot' in research

#artificialintelligence

Dr. Matthias Salathe does the research in his lab at the University of Kansas Medical Center. A Kansas City doctor is performing groundbreaking research on vaping, using a robot. Dr. Matthias Salathe spends a lot of time with e-cigarettes. "The notion was it's safe, and frankly we did not believe this," said Salathe.

