AITopics | body movement

Technology: Information Technology > Artificial Intelligence (0.38)

Neural Information Processing Systems, Nov-16-2025, 02:58:21 GMT

f4e369c0a468d3aeeda0593ba90b5e55-Paper.pdf

One of the primary purposes of video is to capture people and their unique activities.

artificial intelligence, machine learning, natural language, (18 more...)

Country: North America > United States > Washington > King County > Seattle (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

MIT Technology Review, Sep-4-2025, 10:05:33 GMT

Synthesia's AI clones are more expressive than ever. Soon they'll be able to talk back.

When Synthesia launched in 2017, its primary purpose was to match AI versions of real human faces (for example, the former footballer David Beckham) with dubbed voices speaking in different languages. A few years later, in 2020, it started giving the companies that signed up for its services the opportunity to make professional-level presentation videos starring either AI versions of staff members or consenting actors. The avatars' body movements could be jerky and unnatural, their accents sometimes slipped, and the emotions indicated by their voices didn't always match their facial expressions. Now Synthesia's avatars have been updated with more natural mannerisms and movements, as well as expressive voices that better preserve the speaker's accent, making them appear more humanlike than ever before. For Synthesia's corporate clients, these avatars will make for slicker presenters of financial results, internal communications, or staff training videos.

artificial intelligence, machine learning, synthesia, (9 more...)

Industry: Leisure & Entertainment > Sports > Soccer (0.56)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)
Information Technology > Artificial Intelligence > Vision (0.71)

Neural Information Processing Systems, Aug-18-2025, 21:33:16 GMT

f4e369c0a468d3aeeda0593ba90b5e55-Paper.pdf

artificial intelligence, machine learning, natural language, (18 more...)

Country: North America > United States > Washington > King County > Seattle (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.94)
(2 more...)

arXiv.org Artificial Intelligence, Jun-24-2025

OmniAvatar: Efficient Audio-Driven Avatar Video Generation with Adaptive Body Animation

Gan, Qijun, Yang, Ruizi, Zhu, Jianke, Xue, Shaofei, Hoi, Steven

Significant progress has been made in audio-driven human animation, while most existing methods focus mainly on facial movements, limiting their ability to create full-body animations with natural synchronization and fluidity. They also struggle with precise prompt control for fine-grained generation. To tackle these challenges, we introduce OmniAvatar, an innovative audio-driven full-body video generation model that enhances human animation with improved lip-sync accuracy and natural movements. OmniAvatar introduces a pixel-wise multi-hierarchical audio embedding strategy to better capture audio features in the latent space, enhancing lip-syncing across diverse scenes. To preserve the capability for prompt-driven control of foundation models while effectively incorporating audio features, we employ a LoRA-based training approach. Extensive experiments show that OmniAvatar surpasses existing models in both facial and semi-body video generation, offering precise text-based control for creating videos in various domains, such as podcasts, human interactions, dynamic scenes, and singing. Our project page is https://omni-avatar.github.io/.
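
The abstract above mentions a LoRA-based training approach without giving details. As a rough illustration of the general low-rank adaptation idea (not OmniAvatar's actual implementation), the sketch below keeps a base weight matrix frozen and adds a trainable low-rank update scaled by alpha / rank. The class name LoRALinear, the initialisation values, and the choice of rank and alpha are all assumptions made for this example.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Hypothetical linear layer: frozen W plus a low-rank update (alpha / rank) * B @ A."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        out_dim, in_dim = len(weight), len(weight[0])
        self.weight = weight          # frozen base weights, never updated
        self.scale = alpha / rank     # common LoRA scaling convention
        # A gets a small random init and B starts at zero (an assumed, commonly
        # used scheme), so the adapted layer initially matches the base layer.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(out_dim)]

    def forward(self, x):
        base = matvec(self.weight, x)               # W x
        delta = matvec(self.B, matvec(self.A, x))   # B (A x)
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_parameter_count(self):
        # Only A and B would be trained; W stays fixed.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])
```

Because B starts at zero, a freshly constructed layer reproduces the frozen model's output exactly, which is one reason this style of adapter is often used when a pretrained model's existing capabilities (here, prompt-driven control) should be preserved.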

arxiv preprint arxiv, machine learning, natural language, (14 more...)

2506.18866

Genre: Research Report (1.00)

Technology:

Information Technology > Graphics > Animation (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Vision > Face Recognition (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

arXiv.org Artificial Intelligence, Jun-3-2025

Human sensory-musculoskeletal modeling and control of whole-body movements

Zuo, Chenhui, Lin, Guohao, Zhang, Chen, Zhuang, Shanning, Sui, Yanan

Coordinated human movement depends on the integration of multisensory inputs, sensorimotor transformation, and motor execution, as well as sensory feedback resulting from body-environment interaction. Building dynamic models of the sensory-musculoskeletal system is essential for understanding movement control and investigating human behaviours. Here, we report a human sensory-musculoskeletal model, termed SMS-Human, that integrates precise anatomical representations of bones, joints, and muscle-tendon units with multimodal sensory inputs involving visual, vestibular, proprioceptive, and tactile components. A stage-wise hierarchical deep reinforcement learning framework was developed to address the inherent challenges of high-dimensional control in musculoskeletal systems with integrated multisensory information. Using this framework, we demonstrated the simulation of three representative movement tasks: bipedal locomotion, vision-guided object manipulation, and human-machine interaction during bicycling. Our results showed a close resemblance between natural and simulated human motor behaviours. The simulation also revealed musculoskeletal dynamics that could not be directly measured. This work offers deeper insights into the sensorimotor dynamics of human movements, facilitates quantitative understanding of human behaviours in interactive contexts, and informs the design of systems with embodied intelligence.

machine learning, reinforcement learning, tendon unit, (18 more...)

2506.00071

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Musculoskeletal (0.56)
Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

arXiv.org Artificial Intelligence, Feb-27-2025

I see what you mean: Co-Speech Gestures for Reference Resolution in Multimodal Dialogue

Ghaleb, Esam, Khaertdinov, Bulat, Özyürek, Aslı, Fernández, Raquel

In face-to-face interaction, we use multiple modalities, including speech and gestures, to communicate information and resolve references to objects. However, how representational co-speech gestures refer to objects remains understudied from a computational perspective. In this work, we address this gap by introducing a multimodal reference resolution task centred on representational gestures, while simultaneously tackling the challenge of learning robust gesture embeddings. We propose a self-supervised pre-training approach to gesture representation learning that grounds body movements in spoken language. Our experiments show that the learned embeddings align with expert annotations and have significant predictive power. Moreover, reference resolution accuracy further improves when (1) using multimodal gesture representations, even when speech is unavailable at inference time, and (2) leveraging dialogue history. Overall, our findings highlight the complementary roles of gesture and speech in reference resolution, offering a step towards more naturalistic models of human-machine interaction.

artificial intelligence, machine learning, natural language, (19 more...)

2503.00071

Country:

Europe > Netherlands > North Holland > Amsterdam (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Netherlands > Limburg > Maastricht (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.46)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.35)

Neural Information Processing Systems, Jan-19-2025, 14:05:38 GMT

How Does it Sound?

One of the primary purposes of video is to capture people and their unique activities. The experience of watching a video can often be enhanced by adding a musical soundtrack that is in sync with the rhythmic features of these activities. This problem is challenging because little is known about capturing the rhythmic nature of free body movements. In this work, we explore this problem and propose a novel system, called RhythmicNet, which takes as input a video that includes human movements and generates a soundtrack for it. RhythmicNet works directly with human movements by extracting skeleton keypoints and implements a sequence of models that translate the keypoints to rhythmic sounds. RhythmicNet follows the natural process of music improvisation, which includes the prescription of streams of the beat, the rhythm, and the melody.

artificial intelligence, soundtrack, video, (5 more...)

Technology: Information Technology > Artificial Intelligence (0.61)

arXiv.org Artificial Intelligence, Oct-29-2024

SleepNetZero: Zero-Burden Zero-Shot Reliable Sleep Staging With Neural Networks Based on Ballistocardiograms

Li, Shuzhen, Chen, Yuxin, Chen, Xuesong, Gao, Ruiyang, Zhang, Yupeng, Yu, Chao, Li, Yunfei, Ye, Ziyi, Huang, Weijun, Yi, Hongliang, Leng, Yue, Wu, Yi

Sleep monitoring plays a crucial role in maintaining good health, with sleep staging serving as an essential metric in the monitoring process. Traditional methods, utilizing medical sensors like EEG and ECG, can be effective but often present challenges such as unnatural user experience, complex deployment, and high costs. Ballistocardiography (BCG), a type of piezoelectric sensor signal, offers a non-invasive, user-friendly, and easily deployable alternative for long-term home monitoring. However, reliable BCG-based sleep staging is challenging due to the limited sleep monitoring data available for BCG. A restricted training dataset prevents the model from generalizing across populations. Additionally, transferring to BCG makes it difficult to ensure model robustness when migrating from other data sources. To address these issues, we introduce SleepNetZero, a zero-shot learning based approach for sleep staging. To tackle the generalization challenge, we propose a series of BCG feature extraction methods that align BCG components with corresponding respiratory, cardiac, and movement channels in PSG. This allows models to be trained on large-scale PSG datasets that are diverse in population. For the migration challenge, we employ data augmentation techniques, significantly enhancing generalizability. We conducted extensive training and testing on large datasets (12393 records from 9637 different subjects), achieving an accuracy of 0.803 and a Cohen's Kappa of 0.718. SleepNetZero was also deployed in a real prototype (monitoring pads) and tested in actual hospital settings (265 users), demonstrating an accuracy of 0.697 and a Cohen's Kappa of 0.589. To the best of our knowledge, this work represents the first reliable BCG-based sleep staging effort and marks a significant step towards in-home health monitoring.
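
The abstract above reports both accuracy and Cohen's Kappa. As a reminder of how the two relate, the sketch below computes them from paired label sequences: kappa is (p_o - p_e) / (1 - p_e), where p_o is the observed agreement (accuracy) and p_e is the agreement expected by chance from the label frequencies. The function name and the toy stage labels in the usage note are illustrative assumptions, not data or code from the paper.

```python
import math
from collections import Counter


def accuracy_and_kappa(true_labels, predicted_labels):
    """Return (accuracy, Cohen's kappa) for two equal-length label sequences."""
    n = len(true_labels)
    if n == 0 or n != len(predicted_labels):
        raise ValueError("label sequences must be non-empty and of equal length")

    # Observed agreement p_o: fraction of positions where the labels match.
    observed = sum(t == p for t, p in zip(true_labels, predicted_labels)) / n

    # Chance agreement p_e: sum over classes of the product of marginal frequencies.
    true_counts = Counter(true_labels)
    pred_counts = Counter(predicted_labels)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)

    if expected == 1.0:
        # Both sequences use a single identical class; kappa is undefined.
        return observed, math.nan

    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa
```

For example, with the made-up labels true = ["W", "W", "N", "N", "R", "R"] and predicted = ["W", "N", "N", "N", "R", "W"], four of six epochs agree (accuracy 2/3), chance agreement is 1/3, and kappa is 0.5. Kappa is lower than accuracy because it discounts agreement that the label frequencies alone would produce.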

artificial intelligence, deep learning, machine learning, (16 more...)

2410.22646

Country:

North America > United States > California > San Francisco County > San Francisco (0.46)
Asia > China > Beijing > Beijing (0.05)
Asia > China > Shanghai > Shanghai (0.04)
(6 more...)

Genre: Research Report > Experimental Study (0.46)

Industry:

Health & Medicine > Therapeutic Area > Sleep (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Health Care Technology (1.00)
(3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, May-15-2024

Physics-informed generative neural networks for RF propagation prediction with application to indoor body perception

Fieramosca, Federica, Rampa, Vittorio, D'Amico, Michele, Savazzi, Stefano

Electromagnetic (EM) body models designed to predict Radio-Frequency (RF) propagation are time-consuming, which prevents their adoption in strict real-time computational imaging problems, such as human body localization and sensing. Physics-informed Generative Neural Network (GNN) models have been recently proposed to reproduce EM effects, namely to simulate or reconstruct missing data or samples by incorporating relevant EM principles and constraints. The paper discusses a Variational Auto-Encoder (VAE) model which is trained to reproduce the effects of human motions on the EM field and incorporate EM body diffraction principles. The proposed physics-informed generative neural network models are verified against both classical diffraction-based EM tools and full-wave EM body simulations.

array response, artificial intelligence, machine learning, (14 more...)

doi: 10.23919/EuCAP60739.2024.10501077

2405.02131

Country:

Africa > South Africa > Western Cape > Indian Ocean (0.27)
Europe > Italy > Lombardy > Milan (0.04)

Genre: Research Report (0.70)

Industry: Media (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)