body movement
Synthesia's AI clones are more expressive than ever. Soon they'll be able to talk back.
When Synthesia launched in 2017, its primary purpose was to match AI versions of real human faces--for example, the former footballer David Beckham--with dubbed voices speaking in different languages. A few years later, in 2020, it started giving the companies that signed up for its services the opportunity to make professional-level presentation videos starring either AI versions of staff members or consenting actors. The avatars' body movements could be jerky and unnatural, their accents sometimes slipped, and the emotions indicated by their voices didn't always match their facial expressions. Now Synthesia's avatars have been updated with more natural mannerisms and movements, as well as expressive voices that better preserve the speaker's accent--making them appear more humanlike than ever before. For Synthesia's corporate clients, these avatars will make for slicker presenters of financial results, internal communications, or staff training videos.
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (0.46)
OmniAvatar: Efficient Audio-Driven Avatar Video Generation with Adaptive Body Animation
Gan, Qijun, Yang, Ruizi, Zhu, Jianke, Xue, Shaofei, Hoi, Steven
Significant progress has been made in audio-driven human animation, but most existing methods focus mainly on facial movements, which limits their ability to create full-body animations with natural synchronization and fluidity. They also struggle with precise prompt control for fine-grained generation. To tackle these challenges, we introduce OmniAvatar, an audio-driven full-body video generation model that enhances human animation with improved lip-sync accuracy and natural movements. OmniAvatar introduces a pixel-wise multi-hierarchical audio embedding strategy to better capture audio features in the latent space, enhancing lip-syncing across diverse scenes. To preserve the prompt-driven control of foundation models while effectively incorporating audio features, we employ a LoRA-based training approach. Extensive experiments show that OmniAvatar surpasses existing models in both facial and semi-body video generation, offering precise text-based control for creating videos in various domains, such as podcasts, human interactions, dynamic scenes, and singing. Our project page is https://omni-avatar.github.io/.
- Information Technology > Graphics > Animation (1.00)
- Information Technology > Artificial Intelligence > Natural Language (0.93)
- Information Technology > Artificial Intelligence > Vision > Face Recognition (0.66)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
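The LoRA-based training mentioned in the OmniAvatar abstract keeps the foundation model's weights frozen and learns only a low-rank additive update. A minimal sketch of that idea in plain NumPy, with toy dimensions; this is an illustration of the general LoRA technique, not the authors' code:

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16.0):
    """Linear layer with a frozen base weight W plus a trainable
    low-rank update: effective weight = W + (alpha / r) * B @ A."""
    r = A.shape[0]
    return x @ (W + (alpha / r) * (B @ A)).T

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 4, 2
W = rng.normal(size=(d_out, d_in))      # frozen foundation weight
A = 0.01 * rng.normal(size=(r, d_in))   # trainable down-projection
B = np.zeros((d_out, r))                # zero init: the adapter starts inert
x = rng.normal(size=(3, d_in))
y = lora_forward(x, W, A, B)            # equals the base layer while B == 0
```

During fine-tuning only A and B would receive gradients, roughly r * (d_in + d_out) parameters per layer, which is why LoRA can add a new conditioning signal (here, audio) without disturbing the base model's prompt-following behaviour.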
Human sensory-musculoskeletal modeling and control of whole-body movements
Zuo, Chenhui, Lin, Guohao, Zhang, Chen, Zhuang, Shanning, Sui, Yanan
Coordinated human movement depends on the integration of multisensory inputs, sensorimotor transformation, and motor execution, as well as sensory feedback resulting from body-environment interaction. Building dynamic models of the sensory-musculoskeletal system is essential for understanding movement control and investigating human behaviours. Here, we report a human sensory-musculoskeletal model, termed SMS-Human, that integrates precise anatomical representations of bones, joints, and muscle-tendon units with multimodal sensory inputs involving visual, vestibular, proprioceptive, and tactile components. A stage-wise hierarchical deep reinforcement learning framework was developed to address the inherent challenges of high-dimensional control in musculoskeletal systems with integrated multisensory information. Using this framework, we demonstrated the simulation of three representative movement tasks, including bipedal locomotion, vision-guided object manipulation, and human-machine interaction during bicycling. Our results showed a close resemblance between natural and simulated human motor behaviours. The simulation also revealed musculoskeletal dynamics that could not be directly measured. This work offers deeper insight into the sensorimotor dynamics of human movements, facilitates quantitative understanding of human behaviours in interactive contexts, and informs the design of systems with embodied intelligence.
- Health & Medicine > Therapeutic Area > Musculoskeletal (0.56)
- Health & Medicine > Therapeutic Area > Neurology (0.46)
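The stage-wise hierarchical training described above can be caricatured with a deliberately tiny stand-in: a low-level "muscle" controller is fitted first, then frozen while a task-level policy is trained on top of it. Plain gradient descent on a one-muscle toy plant replaces the paper's deep reinforcement learning; MUSCLE_GAIN and the two-stage split are illustrative assumptions, not the SMS-Human setup:

```python
MUSCLE_GAIN = 2.0   # toy "plant": force = MUSCLE_GAIN * activation

def train_low_level(lr=0.05, steps=200):
    """Stage 1: learn activation = a * command so that the produced
    force matches the commanded force (gradient descent on squared error)."""
    a = 0.0
    for _ in range(steps):
        c = 1.0                                  # probe command
        err = MUSCLE_GAIN * a * c - c            # force minus commanded force
        a -= lr * 2 * err * MUSCLE_GAIN * c      # gradient of err**2 w.r.t. a
    return a

def train_high_level(a_frozen, target=1.0, lr=0.1, steps=200):
    """Stage 2: with the low-level controller frozen, learn the command
    that drives the plant to a task-level force target."""
    c = 0.0
    for _ in range(steps):
        err = MUSCLE_GAIN * a_frozen * c - target
        c -= lr * 2 * err * MUSCLE_GAIN * a_frozen
    return c

a = train_low_level()      # low-level "muscle" stage
c = train_high_level(a)    # task stage, built on the frozen stage below it
```

The point of the staging is visible even in this toy: the high-level search happens in the small command space, not the high-dimensional activation space, which is the difficulty the paper's hierarchical framework is designed around.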
I see what you mean: Co-Speech Gestures for Reference Resolution in Multimodal Dialogue
Ghaleb, Esam, Khaertdinov, Bulat, Özyürek, Aslı, Fernández, Raquel
In face-to-face interaction, we use multiple modalities, including speech and gestures, to communicate information and resolve references to objects. However, how representational co-speech gestures refer to objects remains understudied from a computational perspective. In this work, we address this gap by introducing a multimodal reference resolution task centred on representational gestures, while simultaneously tackling the challenge of learning robust gesture embeddings. We propose a self-supervised pre-training approach to gesture representation learning that grounds body movements in spoken language. Our experiments show that the learned embeddings align with expert annotations and have significant predictive power. Moreover, reference resolution accuracy further improves when (1) using multimodal gesture representations, even when speech is unavailable at inference time, and (2) leveraging dialogue history. Overall, our findings highlight the complementary roles of gesture and speech in reference resolution, offering a step towards more naturalistic models of human-machine interaction.
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Netherlands > Limburg > Maastricht (0.04)
- (2 more...)
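Grounding gesture embeddings in spoken language, as in the self-supervised pre-training above, is commonly done with a contrastive objective. Below is a sketch of a symmetric InfoNCE loss over paired gesture/speech embeddings, with random toy vectors; the paper's actual encoders and objective may differ:

```python
import numpy as np

def info_nce(gesture_emb, speech_emb, temperature=0.1):
    """Symmetric InfoNCE over a batch of paired embeddings: matching rows
    are positives, every other pairing in the batch is a negative."""
    g = gesture_emb / np.linalg.norm(gesture_emb, axis=1, keepdims=True)
    s = speech_emb / np.linalg.norm(speech_emb, axis=1, keepdims=True)
    logits = g @ s.T / temperature          # (B, B) cosine similarities

    def xent(l):  # mean cross-entropy with the diagonal as the target class
        return (np.log(np.exp(l).sum(axis=1)) - np.diag(l)).mean()

    return 0.5 * (xent(logits) + xent(logits.T))

rng = np.random.default_rng(1)
speech = rng.normal(size=(4, 16))
aligned = speech + 0.01 * rng.normal(size=(4, 16))  # gestures near their speech
shuffled = speech[::-1].copy()                      # deliberately mismatched
loss_aligned = info_nce(aligned, speech)
loss_shuffled = info_nce(shuffled, speech)
```

A loss of this shape pulls each gesture toward its co-occurring speech and away from other utterances in the batch, which is one standard way to obtain embeddings that remain useful even when speech is unavailable at inference time.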
How Does it Sound?
One of the primary purposes of video is to capture people and their unique activities. The experience of watching a video can often be enhanced by adding a musical soundtrack that is in sync with the rhythmic features of these activities. This problem is challenging, since little is known about capturing the rhythmic nature of free body movements. In this work, we explore this problem and propose a novel system, called RhythmicNet, which takes as input a video that includes human movements and generates a soundtrack for it. RhythmicNet works directly with human movements by extracting skeleton keypoints and implements a sequence of models that translate the keypoints into rhythmic sounds. RhythmicNet follows the natural process of music improvisation, generating streams of the beat, the rhythm, and the melody in turn.
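As a rough illustration of the first step in such a pipeline (a toy sketch, not the paper's models), skeleton keypoints can be reduced to a per-frame motion-energy signal whose peaks serve as beat candidates:

```python
import math

def motion_energy(frames):
    """Per-frame movement: mean keypoint displacement between frames.

    frames: one list of (x, y) keypoints per video frame."""
    energy = [0.0]
    for prev, cur in zip(frames, frames[1:]):
        d = sum(math.dist(p, q) for p, q in zip(prev, cur)) / len(cur)
        energy.append(d)
    return energy

def beat_frames(energy, threshold):
    """Local maxima above a threshold = candidate beat onsets."""
    return [i for i in range(1, len(energy) - 1)
            if energy[i] > threshold
            and energy[i] >= energy[i - 1]
            and energy[i] > energy[i + 1]]

# Toy clip: a single keypoint that bounces once every 4 frames.
frames = [[(0.0, 1.0 if t % 4 == 0 else 0.0)] for t in range(16)]
beats = beat_frames(motion_energy(frames), threshold=0.5)
```

The detected onsets land four frames apart, matching the bounce period; a full system like the one described above would then turn such rhythm events into beat, rhythm, and melody streams.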
SleepNetZero: Zero-Burden Zero-Shot Reliable Sleep Staging With Neural Networks Based on Ballistocardiograms
Li, Shuzhen, Chen, Yuxin, Chen, Xuesong, Gao, Ruiyang, Zhang, Yupeng, Yu, Chao, Li, Yunfei, Ye, Ziyi, Huang, Weijun, Yi, Hongliang, Leng, Yue, Wu, Yi
Sleep monitoring plays a crucial role in maintaining good health, with sleep staging serving as an essential metric in the monitoring process. Traditional methods, utilizing medical sensors like EEG and ECG, can be effective but often present challenges such as an unnatural user experience, complex deployment, and high costs. Ballistocardiography (BCG), a type of piezoelectric sensor signal, offers a non-invasive, user-friendly, and easily deployable alternative for long-term home monitoring. However, reliable BCG-based sleep staging is challenging due to the limited sleep monitoring data available for BCG. A restricted training dataset prevents the model from generalizing across populations, and models migrated from other data sources are difficult to keep robust when transferred to BCG. To address these issues, we introduce SleepNetZero, a zero-shot-learning-based approach for sleep staging. To tackle the generalization challenge, we propose a series of BCG feature extraction methods that align BCG components with the corresponding respiratory, cardiac, and movement channels in PSG. This allows models to be trained on large-scale PSG datasets that are diverse in population. For the migration challenge, we employ data augmentation techniques, significantly enhancing generalizability. We conducted extensive training and testing on large datasets (12,393 records from 9,637 different subjects), achieving an accuracy of 0.803 and a Cohen's kappa of 0.718. SleepNetZero was also deployed in a real prototype (monitoring pads) and tested in actual hospital settings (265 users), demonstrating an accuracy of 0.697 and a Cohen's kappa of 0.589. To the best of our knowledge, this work represents the first reliable BCG-based sleep staging effort and marks a significant step towards in-home health monitoring.
- North America > United States > California > San Francisco County > San Francisco (0.46)
- Asia > China > Beijing > Beijing (0.05)
- Asia > China > Shanghai > Shanghai (0.04)
- (6 more...)
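The alignment of BCG components with respiratory and cardiac channels can be caricatured with a simple decomposition: a wide moving average approximates the slow respiratory component, and the residual keeps the faster cardiac beats. The window sizes, the synthetic signal, and the gain-jitter augmentation below are all illustrative assumptions, not the paper's feature extractors:

```python
import math, random

def moving_average(x, w):
    """Centered moving average; edge windows are clipped to the signal."""
    half = w // 2
    out = []
    for i in range(len(x)):
        seg = x[max(0, i - half): i + half + 1]
        out.append(sum(seg) / len(seg))
    return out

def split_bcg(signal, resp_window):
    """Wide moving average ~ respiratory component; residual ~ cardiac."""
    resp = moving_average(signal, resp_window)
    cardiac = [s - r for s, r in zip(signal, resp)]
    return resp, cardiac

def augment(signal, rng, scale=0.1):
    """Random gain jitter, a simple robustness-oriented augmentation."""
    gain = 1.0 + rng.uniform(-scale, scale)
    return [gain * s for s in signal]

fs = 100                                  # Hz, assumed sampling rate
t = [i / fs for i in range(10 * fs)]      # 10 s of synthetic data
# Toy BCG: 0.25 Hz breathing plus a smaller 1.2 Hz "heartbeat" component
bcg = [math.sin(2 * math.pi * 0.25 * ti)
       + 0.3 * math.sin(2 * math.pi * 1.2 * ti) for ti in t]
resp, cardiac = split_bcg(bcg, resp_window=int(fs / 1.2))
augmented = augment(bcg, random.Random(0))
```

Choosing the smoothing window close to one cardiac period makes the average cancel the heartbeat while passing the respiration, which is the intuition behind separating channels before training on PSG-scale data.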
Physics-informed generative neural networks for RF propagation prediction with application to indoor body perception
Fieramosca, Federica, Rampa, Vittorio, D'Amico, Michele, Savazzi, Stefano
Electromagnetic (EM) body models designed to predict Radio-Frequency (RF) propagation are time-consuming methods, which prevents their adoption in strict real-time computational imaging problems such as human body localization and sensing. Physics-informed Generative Neural Network (GNN) models have recently been proposed to reproduce EM effects, namely to simulate or reconstruct missing data or samples by incorporating relevant EM principles and constraints. The paper discusses a Variational Auto-Encoder (VAE) model that is trained to reproduce the effects of human motions on the EM field while incorporating EM body diffraction principles. The proposed physics-informed generative neural network models are verified against both classical diffraction-based EM tools and full-wave EM body simulations.
- Africa > South Africa > Western Cape > Indian Ocean (0.27)
- Europe > Italy > Lombardy > Milan (0.04)
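A physics-informed generative objective of the kind described above typically augments the usual VAE loss (reconstruction plus KL divergence) with a penalty on violations of the governing physics. The sketch below is schematic: the physics_residual input is a placeholder for a diffraction-based constraint, and lam is an assumed weighting, neither taken from the paper:

```python
import numpy as np

def physics_informed_vae_loss(x, x_hat, mu, log_var, physics_residual, lam=0.5):
    """Schematic VAE objective with a physics-consistency penalty.

    recon : mean squared reconstruction error of the RF field samples
    kl    : KL(q(z|x) || N(0, I)), closed form for a diagonal Gaussian
    phys  : mean squared violation of an assumed EM constraint"""
    recon = np.mean((x - x_hat) ** 2)
    kl = -0.5 * np.mean(1 + log_var - mu ** 2 - np.exp(log_var))
    phys = np.mean(physics_residual ** 2)
    return recon + kl + lam * phys

rng = np.random.default_rng(2)
x = rng.normal(size=(8, 32))              # observed RF field samples
x_hat = x + 0.1 * rng.normal(size=x.shape)  # imperfect reconstruction
mu = 0.1 * rng.normal(size=(8, 4))        # posterior means
log_var = np.zeros((8, 4))                # unit-variance posterior
residual = 0.05 * rng.normal(size=(8, 32))  # stand-in physics violation
loss = physics_informed_vae_loss(x, x_hat, mu, log_var, residual)
```

The appeal of this composition is that the physics term regularizes the generator toward fields a diffraction model would also produce, which is what lets a fast learned model substitute for slow full-wave EM simulation.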