Robust Imitation of Diverse Behaviors
Deep generative models have recently shown great promise in imitation learning for motor control. Given enough data, even supervised approaches can do one-shot imitation learning; however, they are vulnerable to cascading failures when the agent trajectory diverges from the demonstrations. Compared to purely supervised methods, Generative Adversarial Imitation Learning (GAIL) can learn more robust controllers from fewer demonstrations, but is inherently mode-seeking and more difficult to train. In this paper, we show how to combine the favourable aspects of these two approaches. The base of our model is a new type of variational autoencoder on demonstration trajectories that learns semantic policy embeddings. We show that these embeddings can be learned on a 9 DoF Jaco robot arm in reaching tasks, and that smoothly interpolating between them yields a correspondingly smooth interpolation of reaching behavior. Leveraging these policy representations, we develop a new version of GAIL that (1) is much more robust than the purely-supervised controller, especially with few demonstrations, and (2) avoids mode collapse, capturing many diverse behaviors when GAIL on its own does not. We demonstrate our approach on learning diverse gaits from demonstration on a 2D biped and a 62 DoF 3D humanoid in the MuJoCo physics environment.
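The interpolation idea in the abstract above can be sketched minimally: encode two demonstration trajectories into latent policy embeddings, then interpolate linearly in embedding space. Everything here is a toy assumption for illustration — the encoder is a fixed linear map rather than the paper's recurrent VAE, and all shapes and names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a trajectory encoder: a fixed linear map from a flattened
# (T, obs_dim) demonstration to a latent "policy embedding" of size z_dim.
T, obs_dim, z_dim = 20, 6, 4
W = rng.normal(scale=0.1, size=(T * obs_dim, z_dim))

def encode(trajectory):
    """Map a (T, obs_dim) demonstration to its embedding (mean only here)."""
    return trajectory.reshape(-1) @ W

# Two demonstrations, e.g. reaches toward two different targets.
demo_a = rng.normal(size=(T, obs_dim))
demo_b = rng.normal(size=(T, obs_dim))
z_a, z_b = encode(demo_a), encode(demo_b)

# Linear interpolation in embedding space; conditioning a decoder/policy on
# each intermediate z would produce the smoothly varying behaviors.
alphas = np.linspace(0.0, 1.0, 5)
interpolated = [(1 - a) * z_a + a * z_b for a in alphas]
```

The endpoints of `interpolated` recover the original embeddings exactly, so any policy conditioned on them reproduces the two demonstrated behaviors, with the intermediate embeddings blending between them.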
Data-Efficient Hierarchical Reinforcement Learning
Hierarchical reinforcement learning (HRL) is a promising approach to extend traditional reinforcement learning (RL) methods to solve more complex tasks. Yet, the majority of current HRL methods require careful task-specific design and on-policy training, making them difficult to apply in real-world scenarios. In this paper, we study how we can develop HRL algorithms that are general, in that they do not make onerous additional assumptions beyond standard RL algorithms, and efficient, in the sense that they can be used with modest numbers of interaction samples, making them suitable for real-world problems such as robotic control. For generality, we develop a scheme where lower-level controllers are supervised with goals that are learned and proposed automatically by the higher-level controllers. To address efficiency, we propose to use off-policy experience for both higher- and lower-level training.
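The two-level scheme described above — a higher-level controller that periodically proposes goals, a goal-conditioned lower-level controller rewarded for reaching them, and all transitions stored for off-policy training — can be sketched as follows. The policies, dynamics, and reward here are toy stand-ins, not the paper's implementation; every name and constant is an assumption for illustration:

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(0)
state_dim, goal_horizon = 3, 5

def high_level_policy(state):
    # Hypothetical higher-level controller: proposes a goal (a desired change
    # in state), re-sampled only every `goal_horizon` steps.
    return rng.normal(size=state_dim)

def low_level_reward(state, goal, next_state):
    # Intrinsic lower-level reward: negative distance between the goal state
    # (current state plus desired change) and the state actually reached.
    return -np.linalg.norm(state + goal - next_state)

replay_buffer = deque(maxlen=10_000)  # shared off-policy experience

state = np.zeros(state_dim)
goal = high_level_policy(state)
for t in range(20):
    if t % goal_horizon == 0:
        goal = high_level_policy(state)   # higher level acts on a slower clock
    action = 0.1 * goal                   # stand-in goal-conditioned low-level policy
    next_state = state + action           # trivial dynamics for the sketch
    r_low = low_level_reward(state, goal, next_state)
    replay_buffer.append((state, goal, action, r_low, next_state))
    state = next_state
```

Because every transition lands in the replay buffer regardless of which policy generated it, both levels can later be trained off-policy from the same data, which is the source of the sample efficiency the abstract targets.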
Walmart's video game clearance sale drops popular titles for Switch, PS5, and Xbox by up to 50%
Grab a copy of Stellar Blade, Lego Star Wars, the latest Assassin's Creed, or pretty much any game you've been waiting to buy for the lowest prices of the year. Walmart is running a big video game sale right now with discounts on Nintendo Switch, PlayStation 5, Xbox Series X, and accessories. There are nearly 100 deals live at the moment, with some of the best cuts landing on recent big-name releases: one is down to $37 (from $70), another is $24.84 (down from $60), and a third is just $29.
- Asia > Middle East > Jordan (0.04)
- Asia > Japan (0.04)
- Retail (1.00)
- Leisure & Entertainment > Games > Computer Games (1.00)
10 things to know about Apple's new M5 Pro and M5 Max MacBook Pros
The latest versions of Apple's MacBook Pro laptops include M5 chips with revamped architecture to bring performance upgrades across the board. The new computers look similar on the outside, but the internals have been overhauled. Apple's latest MacBook Pro refresh landed today with two new processors, the M5 Pro and M5 Max, built on what the company calls its Fusion Architecture. We have already been using the vanilla M5 chip in the latest version of the Apple Vision Pro headset, but these new MBP models crank up the power level even more.
- North America > United States > Illinois > Champaign County > Urbana (0.04)
- North America > United States > California > Monterey County > Monterey (0.04)
- Europe > Germany > Baden-Württemberg > Karlsruhe Region > Heidelberg (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > China > Heilongjiang Province > Daqing (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.67)
- Research Report > Experimental Study (0.93)
- Overview (0.67)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.50)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
A Proof
A.1 Proof of Theorem 1
We leverage the results in [49]. Lemma 3. Consider the ReLU activation. The proof of Theorem 1 is given below. Inequality (3) uses the strictly monotone property of p(·). Code is available at this link. The neural networks are updated using Adam with a learning rate initialized at 0.035. All of them have no communication constraints. The training time is shown in Table 1.
- North America > United States > California > San Diego County > San Diego (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)