Subramanian, Kaushik
SimBa: Simplicity Bias for Scaling Up Parameters in Deep Reinforcement Learning
Lee, Hojoon, Hwang, Dongyoon, Kim, Donghu, Kim, Hyunseung, Tai, Jun Jet, Subramanian, Kaushik, Wurman, Peter R., Choo, Jaegul, Stone, Peter, Seno, Takuma
Recent advances in CV and NLP have been largely driven by scaling up the number of network parameters, despite traditional theories suggesting that larger networks are prone to overfitting. These large networks avoid overfitting by integrating components that induce a simplicity bias, guiding models toward simple and generalizable solutions. However, in deep RL, designing and scaling up networks has been less explored. Motivated by this opportunity, we present SimBa, an architecture designed to scale up parameters in deep RL by injecting a simplicity bias. SimBa consists of three components: (i) an observation normalization layer that standardizes inputs with running statistics, (ii) a residual feedforward block that provides a linear pathway from input to output, and (iii) a layer normalization that controls feature magnitudes. By scaling up parameters with SimBa, the sample efficiency of various deep RL algorithms, including off-policy, on-policy, and unsupervised methods, is consistently improved. Moreover, solely by integrating the SimBa architecture into SAC, it matches or surpasses state-of-the-art deep RL methods with high computational efficiency across DMC, MyoSuite, and HumanoidBench. These results demonstrate SimBa's broad applicability and effectiveness across diverse RL algorithms and environments.
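For readers who want a concrete picture, here is a minimal PyTorch sketch of the three components as the abstract describes them; the class names, hidden sizes, and block layout are illustrative assumptions, not the paper's reference implementation.

```python
import torch
import torch.nn as nn

class RunningObsNorm(nn.Module):
    """(i) Observation normalization with running mean/variance statistics."""
    def __init__(self, obs_dim, eps=1e-5):
        super().__init__()
        self.register_buffer("mean", torch.zeros(obs_dim))
        self.register_buffer("var", torch.ones(obs_dim))
        self.register_buffer("count", torch.tensor(eps))
        self.eps = eps

    def forward(self, obs):
        if self.training:
            with torch.no_grad():  # Welford-style parallel statistics update
                n = obs.shape[0]
                batch_mean = obs.mean(dim=0)
                batch_var = obs.var(dim=0, unbiased=False)
                delta = batch_mean - self.mean
                total = self.count + n
                self.mean += delta * n / total
                self.var = (self.count * self.var + n * batch_var
                            + delta.pow(2) * self.count * n / total) / total
                self.count = total
        return (obs - self.mean) / torch.sqrt(self.var + self.eps)

class ResidualFFBlock(nn.Module):
    """(ii) Pre-LayerNorm feedforward block; the identity skip connection
    keeps a linear pathway from input to output."""
    def __init__(self, dim, hidden_mult=4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.ff = nn.Sequential(
            nn.Linear(dim, hidden_mult * dim), nn.ReLU(),
            nn.Linear(hidden_mult * dim, dim),
        )

    def forward(self, x):
        return x + self.ff(self.norm(x))

class SimBaEncoder(nn.Module):
    """Running-stat input normalization -> residual blocks -> final norm."""
    def __init__(self, obs_dim, dim=256, depth=2):
        super().__init__()
        self.obs_norm = RunningObsNorm(obs_dim)
        self.embed = nn.Linear(obs_dim, dim)
        self.blocks = nn.Sequential(*[ResidualFFBlock(dim) for _ in range(depth)])
        self.out_norm = nn.LayerNorm(dim)  # (iii) controls feature magnitudes

    def forward(self, obs):
        return self.out_norm(self.blocks(self.embed(self.obs_norm(obs))))
```

Scaling the encoder then amounts to increasing `dim` and `depth`, with the normalization layers and skip connections keeping the larger network biased toward simple solutions.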
A Super-human Vision-based Reinforcement Learning Agent for Autonomous Racing in Gran Turismo
Vasco, Miguel, Seno, Takuma, Kawamoto, Kenta, Subramanian, Kaushik, Wurman, Peter R., Stone, Peter
Racing autonomous cars faster than the best human drivers has been a longstanding grand challenge for the fields of Artificial Intelligence and robotics. Recently, an end-to-end deep reinforcement learning agent met this challenge in a high-fidelity racing simulator, Gran Turismo. However, this agent relied on global features that require instrumentation external to the car. This paper introduces, to the best of our knowledge, the first super-human car racing agent whose sensor input is purely local to the car, namely pixels from an ego-centric camera view and quantities that can be sensed on board the car, such as the car's velocity. By leveraging global features only at training time, the learned agent is able to outperform the best human drivers in time trial (one car on the track at a time) races using only local input features. The resulting agent is evaluated in Gran Turismo 7 on multiple tracks and cars. Detailed ablation experiments demonstrate the agent's strong reliance on visual inputs, making it the first vision-based super-human car racing agent.
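Using privileged information only at training time is commonly realized with an asymmetric actor-critic setup; the sketch below illustrates that general pattern, not the paper's actual method. Module names, shapes, and the choice of global features are hypothetical.

```python
import torch
import torch.nn as nn

class LocalPolicy(nn.Module):
    """Actor: consumes only on-board inputs (ego-centric pixels + velocity),
    so it can run from sensors local to the car at test time."""
    def __init__(self, act_dim):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.LazyLinear(act_dim)  # infers input size on first call

    def forward(self, pixels, velocity):
        feats = torch.cat([self.cnn(pixels), velocity], dim=-1)
        return torch.tanh(self.head(feats))

class PrivilegedCritic(nn.Module):
    """Critic: additionally conditions on global features (e.g., a precise
    track-relative pose) that only the simulator provides during training.
    It is discarded at deployment, so the deployed policy stays purely local."""
    def __init__(self, local_dim, global_dim, act_dim):
        super().__init__()
        self.q = nn.Sequential(
            nn.Linear(local_dim + global_dim + act_dim, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, local_feats, global_feats, action):
        return self.q(torch.cat([local_feats, global_feats, action], dim=-1))
```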
Navigating Occluded Intersections with Autonomous Vehicles using Deep Reinforcement Learning
Isele, David, Rahimi, Reza, Cosgun, Akansel, Subramanian, Kaushik, Fujimura, Kikuo
Providing an efficient strategy to navigate safely through unsignaled intersections is a difficult task that requires determining the intent of other drivers. We explore the effectiveness of Deep Reinforcement Learning to handle intersection problems. Using recent advances in Deep RL, we are able to learn policies that surpass the performance of a commonly used heuristic approach in several metrics, including task completion time and goal success rate, but have limited ability to generalize. We then explore a system's ability to learn active sensing behaviors to enable navigating safely in the case of occlusions. Our analysis provides insight into the intersection handling problem: the solutions learned by the network point out several shortcomings of current rule-based methods, and the failures of our current deep reinforcement learning system point to future research directions.
Policy Shaping: Integrating Human Feedback with Reinforcement Learning
Griffith, Shane, Subramanian, Kaushik, Scholz, Jonathan, Isbell, Charles L., Thomaz, Andrea L.
A long-term goal of Interactive Reinforcement Learning is to incorporate non-expert human feedback to solve complex tasks. State-of-the-art methods have approached this problem by mapping human information to reward and value signals to indicate preferences and then iterating over them to compute the necessary control policy. In this paper we argue for an alternate, more effective characterization of human feedback: Policy Shaping. We introduce Advise, a Bayesian approach that attempts to maximize the information gained from human feedback by utilizing it as direct labels on the policy. We compare Advise to state-of-the-art approaches and highlight scenarios where it outperforms them and, importantly, is robust to infrequent and inconsistent human feedback.
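As a rough illustration of the policy-shaping idea, here is a hedged NumPy sketch of an Advise-style rule, where human labels are treated as direct evidence about which action is optimal; `C` is the assumed probability that a label is correct, and the combination step multiplies and renormalizes under an independence assumption. The paper's exact formulation may differ in details.

```python
import numpy as np

def feedback_policy(delta, C):
    """Probability each action is optimal given human labels alone.
    delta[a] = (# 'right' labels) - (# 'wrong' labels) for action a.
    C = assumed feedback consistency, i.e. P(a label is correct)."""
    delta = np.asarray(delta, dtype=float)
    num = C ** delta
    return num / (num + (1.0 - C) ** delta)

def shaped_policy(pi_rl, delta, C):
    """Policy shaping: combine the agent's own action distribution with the
    feedback-derived one by multiplying and renormalizing."""
    p = np.asarray(pi_rl) * feedback_policy(delta, C)
    return p / p.sum()

# Example: 3 actions; action 0 has net +2 'right' labels, action 1 net -1.
print(shaped_policy(pi_rl=np.array([0.4, 0.4, 0.2]), delta=[2, -1, 0], C=0.8))
```

Note how sparse or inconsistent feedback (small or zero `delta`) leaves the feedback policy near uniform, so the agent simply falls back on its own learned policy.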
Novel Interaction Strategies for Learning from Teleoperation
Akgun, Baris (Georgia Institute of Technology) | Subramanian, Kaushik (Georgia Institute of Technology) | Thomaz, Andrea Lockerd (Georgia Institute of Technology)
The field of robot Learning from Demonstration (LfD) makes use of several input modalities for demonstrations (teleoperation, kinesthetic teaching, marker- and vision-based motion tracking). In this paper we present two experiments aimed at identifying and overcoming challenges associated with using teleoperation as an input modality for LfD. Our first experiment compares kinesthetic teaching and teleoperation and highlights some inherent problems associated with teleoperation, specifically uncomfortable user interactions and inaccurate robot demonstrations. Our second experiment focuses on overcoming these problems and designing the teleoperation interaction to be more suitable for LfD. In previous work we proposed a novel demonstration strategy using the concept of keyframes, where demonstrations take the form of a discrete set of robot configurations. Keyframes can be naturally combined with continuous trajectory demonstrations to generate a hybrid strategy. We perform user studies to evaluate each of these demonstration strategies individually and show that keyframes are intuitive to users and are particularly useful in providing noise-free demonstrations. We find that users prefer the hybrid strategy for demonstrating tasks to a robot by teleoperation.
Learning Tasks and Skills Together From a Human Teacher
Akgun, Baris (Georgia Institute of Technology) | Subramanian, Kaushik (Georgia Institute of Technology) | Shim, Jaeeun (Georgia Institute of Technology) | Thomaz, Andrea Lockerd (Georgia Institute of Technology)
Robot Learning from Demonstration (LfD) research deals with the challenges of enabling humans to teach robots novel skills and tasks (Argall et al. 2009). LfD is practically important because it is impossible to pre-program all the necessary skills and task knowledge that a robot might need during its life-cycle. This opens up many interesting application areas for LfD, ranging from homes to factory floors. An important motivation for our research agenda is that in many practical LfD applications the teacher will be an everyday end-user, not an expert in Machine Learning or robotics. Thus, our research explores the ways in which Machine Learning can exploit human social learning interactions: Socially Guided Machine Learning (SGML).
Task Space Behavior Learning for Humanoid Robots using Gaussian Mixture Models
Subramanian, Kaushik (Rutgers, The State University of New Jersey)
This paper presents a system for robot behavior acquisition from kinesthetic demonstrations. It enables a humanoid robot to imitate constrained reaching gestures directed towards a target using a learning algorithm based on Gaussian Mixture Models. The imitation trajectory can be reshaped to satisfy the constraints of the task, and it can adapt to changes in the initial conditions and to target displacements occurring during movement execution. The potential of this method was evaluated in experiments with Aldebaran's Nao humanoid robot.
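To make the approach concrete, here is a minimal sketch of the standard GMM-plus-Gaussian-Mixture-Regression (GMR) pipeline for reproducing a time-indexed end-effector trajectory, using scikit-learn; the file name, component count, and state layout are illustrative assumptions, not details from the paper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Demonstrations: rows of [t, x, y, z], i.e. time-stamped end-effector
# positions recorded during kinesthetic teaching (hypothetical file).
demos = np.load("kinesthetic_demos.npy")

# Fit a joint GMM over time and position.
gmm = GaussianMixture(n_components=5, covariance_type="full").fit(demos)

def gmr(gmm, t):
    """Gaussian Mixture Regression: condition the joint GMM over [t, pos]
    on a query time t to get the expected position at that time."""
    means, covs, w = gmm.means_, gmm.covariances_, gmm.weights_
    # Responsibility of each component for the query time t
    # (the 1/sqrt(2*pi) density constant cancels in the normalization).
    h = np.array([w[k] * np.exp(-0.5 * (t - means[k, 0])**2 / covs[k, 0, 0])
                  / np.sqrt(covs[k, 0, 0]) for k in range(len(w))])
    h /= h.sum()
    # Per-component conditional mean of position given t, mixed by h.
    out = np.zeros(means.shape[1] - 1)
    for k in range(len(w)):
        out += h[k] * (means[k, 1:] +
                       covs[k, 1:, 0] / covs[k, 0, 0] * (t - means[k, 0]))
    return out

# Reproduce the skill by sweeping time; the same query works for shifted
# targets once the demonstrations are re-expressed in a task frame.
trajectory = np.array([gmr(gmm, t) for t in np.linspace(0.0, 1.0, 100)])
```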