
DeepMimic


Physics-Based Motion Imitation with Adversarial Differential Discriminators

Zhang, Ziyu, Bashkirov, Sergey, Yang, Dun, Shi, Yi, Taylor, Michael, Peng, Xue Bin

arXiv.org Artificial Intelligence

Multi-objective optimization problems, which require the simultaneous optimization of multiple objectives, are prevalent across numerous applications. Existing multi-objective optimization methods often rely on manually-tuned aggregation functions to formulate a joint optimization objective. The performance of such hand-tuned methods is heavily dependent on careful weight selection, a time-consuming and laborious process. These limitations also arise in the setting of reinforcement-learning-based motion tracking methods for physically simulated characters, where intricately crafted reward functions are typically used to achieve high-fidelity results. Such solutions not only require domain expertise and significant manual tuning, but also limit the applicability of the resulting reward function across diverse skills. To bridge this gap, we present a novel adversarial multi-objective optimization technique that is broadly applicable to a range of multi-objective reinforcement-learning tasks, including motion tracking. Our proposed Adversarial Differential Discriminator (ADD) receives a single positive sample, yet is still effective at guiding the optimization process. We demonstrate that our technique can enable characters to closely replicate a variety of acrobatic and agile behaviors, achieving comparable quality to state-of-the-art motion-tracking methods, without relying on manually-designed reward functions. Code and results are available at https://add-moo.github.io/.
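The differential-discriminator idea can be sketched in toy form: a discriminator is trained to separate a single positive sample (the zero tracking-error vector) from negatives drawn from the actual differences between simulated and reference states, and its score on the current error then serves as a learned reward. Everything below (the logistic discriminator, the error distributions, all constants) is a hypothetical illustration assumed for this sketch, not the paper's implementation.

```python
import numpy as np

# Hypothetical toy version of the ADD idea (not the authors' code): a logistic
# discriminator sees exactly one positive sample -- the zero tracking-error
# vector -- and negatives sampled from an imperfect policy's tracking errors.
rng = np.random.default_rng(0)
dim, lr = 4, 0.1
w, b = np.zeros(dim), 0.0
positive = np.zeros(dim)            # the single positive sample: zero error

def score(x, w, b):
    """Sigmoid score: how 'perfect' an error vector looks."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

for _ in range(500):
    # Negatives: tracking errors an imperfect policy would produce (assumed
    # distribution for illustration).
    negatives = rng.normal(0.5, 0.2, size=(8, dim))
    # Logistic-regression gradients: label 1 for the positive, 0 for negatives.
    grad_w = (score(positive, w, b) - 1.0) * positive
    grad_b = score(positive, w, b) - 1.0
    for x in negatives:
        grad_w += score(x, w, b) * x / len(negatives)
        grad_b += score(x, w, b) / len(negatives)
    w -= lr * grad_w
    b -= lr * grad_b

# The discriminator's score acts as a reward: smaller tracking error scores
# higher than larger tracking error.
reward_small = score(np.full(dim, 0.05), w, b)
reward_large = score(np.full(dim, 0.8), w, b)
```

Even with only one positive sample, the discriminator learns a useful gradient direction: pushing the error vector toward zero increases the reward.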


We thank all reviewers for their constructive comments and are glad that our contributions are largely recognized

Neural Information Processing Systems

We thank all reviewers for their constructive comments and are glad that our contributions are largely recognized. Below, we address the reviewers' concerns point by point. A, we provide results of three MuJoCo manipulation examples: Pusher, Striker, and Thrower. Beyond GAIL and GAIfO, our method is able to outperform all other LfO baselines. We thank the reviewer for the reminder.



DiffMimic: Efficient Motion Mimicking with Differentiable Physics

Ren, Jiawei, Yu, Cunjun, Chen, Siwei, Ma, Xiao, Pan, Liang, Liu, Ziwei

arXiv.org Artificial Intelligence

Motion mimicking is a foundational task in physics-based character animation. However, most existing motion mimicking methods are built upon reinforcement learning (RL) and suffer from heavy reward engineering, high variance, and slow convergence under hard exploration. Specifically, they usually take tens of hours or even days of training to mimic a simple motion sequence, resulting in poor scalability. In this work, we leverage differentiable physics simulators (DPS) and propose an efficient motion mimicking method dubbed DiffMimic. Our key insight is that DPS casts a complex policy learning task as a much simpler state matching problem. In particular, DPS learns a stable policy through analytical gradients with ground-truth physical priors, leading to significantly faster and more stable convergence than RL-based methods. Moreover, to escape from local optima, we utilize a Demonstration Replay mechanism to enable stable gradient backpropagation over a long horizon. Extensive experiments on standard benchmarks show that DiffMimic achieves better sample efficiency and time efficiency than existing methods (e.g., DeepMimic). Notably, DiffMimic allows a physically simulated character to learn a backflip after 10 minutes of training and to cycle it after 3 hours of training, while the existing approach may require about a day of training to cycle a backflip. More importantly, we hope DiffMimic can benefit more differentiable animation systems, with techniques like differentiable cloth simulation, in future research.
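The state-matching-with-analytical-gradients idea can be illustrated on a toy differentiable simulator (not the DiffMimic implementation): a 1D point mass whose dynamics are known in closed form, so the gradient of the tracking loss with respect to the actions can be written out and followed directly, with no policy-gradient estimation.

```python
import numpy as np

# Toy illustration of state matching through a differentiable simulator.
# Dynamics: x[t+1] = x[t] + a[t] * dt. All constants and the reference
# trajectory are assumptions made for this sketch.
T, dt, lr = 20, 0.1, 0.5
ref = np.sin(np.linspace(0, np.pi, T + 1))   # reference trajectory to mimic
actions = np.zeros(T)

def rollout(actions):
    x = np.zeros(T + 1)
    for t in range(T):
        x[t + 1] = x[t] + actions[t] * dt
    return x

for _ in range(500):
    x = rollout(actions)
    err = x - ref                 # per-step tracking error
    # Analytical gradient: action a[t] shifts every later state x[t+1:] by dt,
    # so dL/da[t] = 2 * dt * sum(err[t+1:]) for L = sum(err**2).
    grad = 2 * dt * np.array([err[t + 1:].sum() for t in range(T)])
    actions -= lr * grad

final_loss = float(((rollout(actions) - ref) ** 2).sum())
```

Because the gradient is exact rather than sampled, the tracking loss drops steadily, which is the mechanism behind the faster, lower-variance convergence claimed above.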


Efficient Hyperparameter Optimization for Physics-based Character Animation

Yang, Zeshi, Yin, Zhiqi

arXiv.org Artificial Intelligence

Physics-based character animation has seen significant advances in recent years with the adoption of Deep Reinforcement Learning (DRL). However, DRL-based learning methods are usually computationally expensive and their performance crucially depends on the choice of hyperparameters. Tuning hyperparameters for these methods often requires repetitive training of control policies, which is even more computationally prohibitive. In this work, we propose a novel Curriculum-based Multi-Fidelity Bayesian Optimization framework (CMFBO) for efficient hyperparameter optimization of DRL-based character control systems. Using curriculum-based task difficulty as the fidelity criterion, our method improves search efficiency by gradually pruning the search space through evaluation on easier motor skill tasks. We evaluate our method on two physics-based character control tasks: character morphology optimization and hyperparameter tuning of DeepMimic. Our algorithm significantly outperforms state-of-the-art hyperparameter optimization methods applicable to physics-based character animation. In particular, we show that hyperparameters optimized through our algorithm yield at least a 5x efficiency gain compared to the author-released settings in DeepMimic.
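The curriculum-as-fidelity idea can be sketched without the Bayesian machinery: candidates are first scored on an easy, cheap version of the task, and only the best survivors advance to harder stages. The `evaluate` function, the learning-rate grid, and the noise model below are all hypothetical stand-ins, not the paper's setup.

```python
import random

# Hypothetical sketch of curriculum-based multi-fidelity pruning (the Bayesian
# surrogate of CMFBO is omitted; this shows only the successive-pruning idea).
random.seed(0)

def evaluate(hparams, difficulty):
    """Stand-in for training a DRL policy at a given task difficulty.
    Assumed ground truth for illustration: lr = 3e-4 is best, and the noisy
    cheap evaluations become more reliable at higher difficulty."""
    quality = -abs(hparams["lr"] - 3e-4) / 3e-4
    noise = random.gauss(0, 0.5 / (1 + difficulty))
    return quality + noise

candidates = [{"lr": lr} for lr in (1e-5, 1e-4, 3e-4, 1e-3, 1e-2, 1e-1)]
for difficulty in (1, 2, 3):          # easy -> hard curriculum stages
    scores = [(evaluate(h, difficulty), h) for h in candidates]
    scores.sort(key=lambda s: s[0], reverse=True)
    # Keep the top half; clearly bad settings never reach expensive stages.
    candidates = [h for _, h in scores[: max(1, len(candidates) // 2)]]

best = candidates[0]
```

Most of the compute budget is spent only on settings that already survived the cheap low-fidelity rounds, which is where the efficiency gain comes from.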


DeepMimic: Mentor-Student Unlabeled Data Based Training

Mosafi, Itay, David, Eli, Netanyahu, Nathan S.

arXiv.org Machine Learning

In this paper, we present a deep neural network (DNN) training approach called the "DeepMimic" training method. Enormous amounts of data are available nowadays for training usage. Yet, only a tiny portion of these data is manually labeled, whereas almost all of the data are unlabeled. The presented training approach makes full use of the unlabeled data, in a simple manner, in order to achieve remarkable classification results. Our DeepMimic method uses a small portion of labeled data and a large amount of unlabeled data for the training process, as expected in a real-world scenario. It consists of a mentor model and a student model. Employing a mentor model trained on a small portion of the labeled data and then feeding it only with unlabeled data, we show how to obtain a (simplified) student model that reaches the same accuracy and loss as the mentor model on the same test set, without using any of the original data labels in the training of the student model. Our experiments demonstrate that even on challenging classification tasks the student network architecture can be simplified significantly with only a minor influence on performance; moreover, we need not even know the original network architecture of the mentor. In addition, the time required for training the student model to reach the mentor's performance level is shorter, as a result of a simplified architecture and more available data. The proposed method highlights the disadvantages of regular supervised training and demonstrates the benefits of a less traditional training approach.
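The mentor-student pipeline can be sketched with minimal models (logistic classifiers instead of the paper's networks; the data, features, and training loop are assumptions for illustration): the mentor is fit on a small labeled set, labels a large unlabeled pool, and the student trains only on those mentor-produced labels.

```python
import numpy as np

# Minimal mentor-student sketch (assumed setup, not the paper's networks).
rng = np.random.default_rng(0)

def fit_logreg(X, y, steps=300, lr=0.5):
    """Plain gradient descent on logistic loss (no bias term)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def predict(X, w):
    return (X @ w > 0).astype(float)

# Assumed ground truth: label is 1 when the first feature is positive.
X_labeled = rng.normal(size=(50, 2))              # small labeled set
y_labeled = (X_labeled[:, 0] > 0).astype(float)
X_unlabeled = rng.normal(size=(5000, 2))          # large unlabeled pool

mentor = fit_logreg(X_labeled, y_labeled)
pseudo_labels = predict(X_unlabeled, mentor)      # mentor labels the pool

# The student never sees the original labels -- only the mentor's outputs.
student = fit_logreg(X_unlabeled, pseudo_labels)

X_test = rng.normal(size=(1000, 2))
y_test = (X_test[:, 0] > 0).astype(float)
mentor_acc = (predict(X_test, mentor) == y_test).mean()
student_acc = (predict(X_test, student) == y_test).mean()
```

The student tracks the mentor's test accuracy closely despite training purely on pseudo-labels, which is the core claim of the approach.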


Video: Stunt Actors May Be Replaced By This A.I. Technology One Day Soon

#artificialintelligence

A longstanding goal in character animation is to combine data-driven specification of behavior with a system that can execute a similar behavior in a physical simulation, thus enabling realistic responses to perturbations and environmental variation. We show that well-known reinforcement learning (RL) methods can be adapted to learn robust control policies capable of imitating a broad range of example motion clips, while also learning complex recoveries, adapting to changes in morphology, and accomplishing user-specified goals. Our method handles keyframed motions, highly-dynamic actions such as motion-captured flips and spins, and retargeted motions. By combining a motion-imitation objective with a task objective, we can train characters that react intelligently in interactive settings, e.g., by walking in a desired direction or throwing a ball at a user-specified target. This approach thus combines the convenience and motion quality of using motion clips to define the desired style and appearance, with the flexibility and generality afforded by RL methods and physics-based animation. We further explore a number of methods for integrating multiple clips into the learning process to develop multi-skilled agents capable of performing a rich repertoire of diverse skills. We demonstrate results using multiple characters (human, Atlas robot, bipedal dinosaur, dragon) and a large variety of skills, including locomotion, acrobatics, and martial arts.
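The combination of a motion-imitation objective with a task objective described above has a simple shape: an exponentiated pose-tracking term plus a goal term, weighted and summed. The weights, kernel widths, and function names below are illustrative assumptions, not DeepMimic's published reward constants.

```python
import math

# Sketch of a weighted imitation + task reward (illustrative constants only).

def imitation_reward(pose, ref_pose, scale=2.0):
    """Exponentiated pose error: 1.0 for perfect tracking, decaying with error."""
    err = sum((p - r) ** 2 for p, r in zip(pose, ref_pose))
    return math.exp(-scale * err)

def task_reward(heading, target_heading):
    """Rewards moving in the user-specified direction."""
    return math.exp(-(heading - target_heading) ** 2)

def total_reward(pose, ref_pose, heading, target_heading,
                 w_imit=0.7, w_task=0.3):
    """Weighted sum: style comes from the clip, intent from the task term."""
    return (w_imit * imitation_reward(pose, ref_pose)
            + w_task * task_reward(heading, target_heading))

# Perfect tracking and perfect heading yield the maximum total reward of 1.0.
r = total_reward([0.1, 0.2], [0.1, 0.2], 0.0, 0.0)
```

Tuning `w_imit` versus `w_task` trades off staying faithful to the reference clip against pursuing the user-specified goal.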


AI stuntpeople could lead to more realistic video games

Engadget

Video game developers often turn to motion capture when they want realistic character animations. Mocap isn't very flexible, though, as it's hard to adapt a canned animation to different body shapes, unusual terrain or an interruption from another character. Researchers might have a better solution: teach the characters to fend for themselves. They've developed a deep learning engine (DeepMimic) that has characters learning to imitate reference mocap animations or even hand-animated keyframes, effectively training them to become virtual stunt actors. The AI promises realistic motion with the kind of flexibility that's difficult even with methods that blend scripted animations together.


Berkeley Researchers Create Virtual Acrobat – Synced – Medium

#artificialintelligence

The Berkeley Artificial Intelligence Research (BAIR) Lab yesterday proposed DeepMimic, a Reinforcement Learning (RL) technique that enables simulated characters to reproduce highly dynamic physical movements learned from data collected from human subjects. BAIR is a top-tier research lab focused on computer vision, machine learning, natural language processing, and robotics. RL methods have been shown to be applicable to a diverse suite of robotic tasks, particularly motion control problems. A typical RL setup includes a policy function that maps the machine's state to the actions it can take, and a value function that estimates the expected long-term reward of acting from a given state. AlphaGo, the landmark Go program produced by DeepMind, is grounded in the same technique.