A More Background
A.1 Distributional RL

Distributional RL [2, 3, 8] is an area of RL that considers the full distribution of the cumulative return Z rather than only its expectation. In this paper, we estimate the quantiles of the cumulative sum cost using the quantile loss, and use them to solve the constrained optimization problem (QuantCP).

A.3 The Considered Constrained Problems

In this subsection, we list the problems considered for constrained RL. The first constrained problem is a common one used in many previous constrained RL papers. Note that the CVaR and the quantile are two different measures of undesirable events, and the choice between the two depends on what we wish to control. For example, an insurance company prefers the CVaR of undesirable events when determining an insurance premium.
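To make the quantile estimation concrete, the sketch below shows the standard pinball (quantile) loss applied to sampled cumulative costs. It is a minimal illustration only; the function and variable names are not taken from the paper's implementation.

```python
import numpy as np

def pinball_loss(pred_quantile, cost_samples, tau):
    """Quantile (pinball) loss: minimized when pred_quantile equals the
    tau-quantile of the cost distribution."""
    u = cost_samples - pred_quantile
    return np.mean(np.maximum(tau * u, (tau - 1.0) * u))

# Toy usage: estimate the 0.9-quantile of sampled cumulative costs by
# scanning candidate values (gradient descent would be used in practice).
costs = np.random.exponential(scale=1.0, size=10_000)
candidates = np.linspace(0.0, 10.0, 1_000)
losses = [pinball_loss(c, costs, tau=0.9) for c in candidates]
print(candidates[int(np.argmin(losses))], np.quantile(costs, 0.9))  # roughly equal
```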
A Environment Details
Our unsupervised pre-training algorithm is provided in Algorithm 1. We assume that the pre-training environment provides access to both proprioceptive states (the input of the skill policy) and goal state features as defined in Appendix B. During training, goal spaces and goals are randomly selected for each episode. The low-level skill policy, conditioned on the proprioceptive state, is then trained toward the sampled goal.
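As a rough illustration of the per-episode goal sampling described above, here is a minimal Python sketch of one pre-training rollout. The object names (env, skill_policy, goal_spaces) and their methods are hypothetical placeholders, not the algorithm's actual interface.

```python
import random

def pretrain_episode(env, skill_policy, goal_spaces, horizon=500):
    """One unsupervised pre-training episode: randomly pick a goal space and a
    goal, then roll out the low-level skill policy conditioned on them."""
    goal_space = random.choice(goal_spaces)   # sampled once per episode
    goal = goal_space.sample_goal()           # sampled once per episode
    obs = env.reset()
    transitions = []
    for _ in range(horizon):
        # the skill policy sees the proprioceptive state plus the sampled goal
        action = skill_policy.act(obs["proprio"], goal, goal_space.name)
        next_obs, done = env.step(action)
        transitions.append((obs, action, next_obs, goal, goal_space.name))
        obs = next_obs
        if done:
            break
    return transitions
```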
Neural Hybrid Automata Supplementary Material
A.1 Neural Hybrid Automata: Modules and Hyperparameters
A.2 Gradient Pathologies
A.1 Neural Hybrid Automata: Modules and Hyperparameters

We provide notation and a summary table for Neural Hybrid Automata (NHA). The table serves as a quick reference for the core concepts introduced in the main text. The only NHA hyperparameter beyond module architectural choices is m, the number of latent modes provided to the model at initialization. The effect of changing m on performance is explored in Section 5.2 and Appendix B.2; Appendix B.2 further analyzes potential techniques to prune additional modes.

A.2 Gradient Pathologies

We provide some theoretical insight into the phenomenon of gradient pathologies with the simple example of a one-dimensional linear hybrid system with two modes and one timed jump,

\[
\dot{x}(t) =
\begin{cases}
a\,x(t), & t < t^\ast \\
b\,x(t), & t \geq t^\ast
\end{cases}
\tag{A.1}
\]

When a numerical solver or event handler locates the jump at a time that differs from t^*, the computed solution at query points between the estimated and the true jump time uses the wrong mode. This, in turn, yields gradients with respect to b that differ from zero, despite the fact that, by (A.1), b should not affect the solution at points t < t^*. In nonlinear systems with multiple events (including stochastic ones), these effects can have a substantial empirical impact on the training procedure.
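The spurious sensitivity to b can be reproduced numerically. The snippet below is a minimal sketch, assuming the piecewise system written in (A.1); it compares the finite-difference sensitivity of the solution at a point before the true jump when the jump time is exact versus slightly mislocated. All constants are illustrative.

```python
import numpy as np

def flow(x0, a, b, t, t_switch):
    """Piecewise-exponential solution of (A.1) with switch time t_switch."""
    if t < t_switch:
        return x0 * np.exp(a * t)
    x_sw = x0 * np.exp(a * t_switch)
    return x_sw * np.exp(b * (t - t_switch))

x0, a, b = 1.0, -0.5, 2.0
t_star = 1.0      # true jump time
t_hat = 0.98      # jump time located by an approximate event handler
t_query = 0.99    # query point *before* the true jump
eps = 1e-6

# exact event: the solution before t_star does not depend on b at all
g_exact = (flow(x0, a, b + eps, t_query, t_star) - flow(x0, a, b, t_query, t_star)) / eps
# mislocated event: a spurious, nonzero sensitivity to b appears
g_patho = (flow(x0, a, b + eps, t_query, t_hat) - flow(x0, a, b, t_query, t_hat)) / eps
print(g_exact, g_patho)   # ~0.0 vs. clearly nonzero
```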
Demo2Code: From Summarizing Demonstrations to Synthesizing Code via Extended Chain-of-Thought
Language instructions and demonstrations are two natural ways for users to teach robots personalized tasks. Recent progress in Large Language Models (LLMs) has shown impressive performance in translating language instructions into code for robotic tasks. However, translating demonstrations into task code remains a challenge: the length and complexity of both demonstrations and code make learning a direct mapping intractable. This paper presents Demo2Code, a novel framework that generates robot task code from demonstrations via an extended chain-of-thought, defining a common latent specification that connects the two. Our framework employs a robust two-stage process: (1) a recursive summarization technique that condenses demonstrations into concise specifications, and (2) a code synthesis approach that recursively expands each function from the generated specifications. We conduct an extensive evaluation on various robot task benchmarks, including Robotouille, a novel game benchmark designed to simulate diverse cooking tasks in a kitchen environment.
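To illustrate the two-stage structure described in the abstract, here is a hedged Python sketch. The prompts, the `llm` callable, and the loop structure are illustrative assumptions, not the actual Demo2Code prompts or API.

```python
def demo2code(demos, llm, max_rounds=5):
    """Sketch of a two-stage pipeline: (1) recursively summarize the
    demonstrations into a specification, (2) recursively expand the
    specification into task code, defining undefined helpers as needed."""
    # Stage 1: recursive summarization down to a concise specification.
    text = "\n\n".join(demos)
    for _ in range(max_rounds):
        shorter = llm("Summarize these demonstrations more concisely:\n" + text)
        if len(shorter) >= len(text):
            break
        text = shorter
    spec = llm("State the task specification implied by:\n" + text)

    # Stage 2: recursive code expansion from the specification.
    code = llm("Write high-level robot task code for this specification:\n" + spec)
    for _ in range(max_rounds):
        undefined = llm("List any functions used but not yet defined in:\n" + code)
        if not undefined.strip():
            break
        code += "\n\n" + llm("Define these helper functions:\n" + undefined
                             + "\nExisting code:\n" + code)
    return code
```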
Asynchronous Actor-Critic for Multi-Agent Reinforcement Learning
Synchronizing decisions across multiple agents in realistic settings is problematic, since it requires agents to wait for other agents to terminate and to communicate about termination reliably. Ideally, agents should learn and execute asynchronously instead. Such asynchronous methods also allow temporally extended actions that can take different amounts of time depending on the situation and the action executed. Unfortunately, current policy gradient methods are not applicable in asynchronous settings, as they assume that agents synchronously reason about action selection at every time step. To allow asynchronous learning and decision-making, we formulate a set of asynchronous multi-agent actor-critic methods that let agents directly optimize asynchronous policies in three standard training paradigms: decentralized learning, centralized learning, and centralized training for decentralized execution. Empirical results (in simulation and on hardware) across a variety of realistic domains demonstrate the superiority of our approaches in large multi-agent problems and validate the effectiveness of our algorithms for learning high-quality asynchronous solutions.
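As a concrete contrast with synchronous execution, the small sketch below shows the asynchronous decision rule the abstract argues for: an agent re-selects an action only when its current temporally extended action has terminated. Class and method names are illustrative placeholders, not the paper's algorithms.

```python
def step_asynchronously(agents, env_obs):
    """Each agent keeps executing its current (temporally extended) action
    until it terminates; only then does it query its policy for a new one."""
    joint_action = []
    for agent in agents:
        obs = env_obs[agent.id]
        if agent.current_action is None or agent.current_action.terminated(obs):
            agent.current_action = agent.policy.select(obs)
        joint_action.append(agent.current_action)
    return joint_action
```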
Segway wants to upgrade your smart home with a smarter yard, at a huge discount
You likely have a smart assistant on your phone, a robot vacuum cleaner that sweeps and mops your floors, smart switches that turn off the lights without you even getting out of bed, a smart speaker that instructs your go-to assistant to play your favorite songs, and so on. Why don't you also have a smart robot lawn mower for your yard? The Segway Navimow i Series could be the missing link in your smart home setup. Segway's smart robot lawnmowers can be a great addition to any house, freeing up countless hours spent mowing for more pleasurable activities, such as spending time by the pool, having fun with the family, or just enjoying your hobbies. By adding a Navimow to your smart home setup, you'll get a perfect lawn without any of the effort usually required.