AITopics | body motion

body motion

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

InterAct: A Large-Scale Dataset of Dynamic, Expressive and Interactive Activities between Two People in Daily Scenarios

Ho, Leo, Huang, Yinghao, Qin, Dafei, Shi, Mingyi, Tse, Wangpok, Liu, Wei, Yamagishi, Junichi, Komura, Taku

arXiv.org Artificial Intelligence | Sep-9-2025

We address the problem of accurate capture of interactive behaviors between two people in daily scenarios. Most previous works either only consider one person or solely focus on conversational gestures of two people, assuming the body orientation and/or position of each actor are constant or barely change over each interaction. In contrast, we propose to simultaneously model two people's activities, and target objective-driven, dynamic, and semantically consistent interactions which often span longer durations and cover larger spaces. To this end, we capture a new multi-modal dataset dubbed InterAct, which is composed of 241 motion sequences where two people perform a realistic and coherent scenario for one minute or longer over a complete interaction. For each sequence, two actors are assigned different roles and emotion labels, and collaborate to finish one task or conduct a common interaction activity. The audio, body motions, and facial expressions of both persons are captured. InterAct contains diverse and complex motions of individuals and interesting and relatively long-term interaction patterns barely seen before. We also demonstrate a simple yet effective diffusion-based method that estimates interactive facial expressions and body motions of two people from speech inputs. Our method regresses the body motions in a hierarchical manner, and we also propose a novel fine-tuning mechanism to improve the lip accuracy of facial expressions. To facilitate further research, the data and code are made available at https://hku-cg.github.io/interact/.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3747871

2509.05747

Country:

North America > United States (0.67)
Asia > Japan > Honshū (0.28)

Genre: Research Report > New Finding (0.67)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Graphics (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Face Recognition (0.70)

MVRS: The Multimodal Virtual Reality Stimuli-based Emotion Recognition Dataset

Mousavi, Seyed Muhammad Hossein, Ilanloo, Atiye

arXiv.org Artificial Intelligence | Sep-9-2025

Automatic emotion recognition has become increasingly important with the rise of AI, especially in fields like healthcare, education, and automotive systems. However, there is a lack of multimodal datasets, particularly involving body motion and physiological signals, which limits progress in the field. To address this, the MVRS dataset is introduced, featuring synchronized recordings from 13 participants aged 12 to 60 exposed to VR-based emotional stimuli (relaxation, fear, stress, sadness, joy). Data were collected using eye tracking (via webcam in a VR headset), body motion (Kinect v2), and EMG and GSR signals (Arduino UNO), all timestamp-aligned. Participants followed a unified protocol with consent and questionnaires. Features from each modality were extracted, fused using early and late fusion techniques, and evaluated with classifiers to confirm the dataset's quality and emotion separability, making MVRS a valuable contribution to multimodal affective computing.
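The abstract mentions early and late fusion without giving details. As a rough illustration only, the sketch below shows one common reading of the two terms: early fusion concatenates per-modality feature vectors before classification, while late fusion combines per-modality classifier outputs (here by equal-weight averaging). The function names, modality keys, and the averaging rule are assumptions for illustration, not the MVRS authors' implementation.

```python
from typing import Dict, List, Sequence


def early_fusion(features: Dict[str, Sequence[float]]) -> List[float]:
    """Concatenate per-modality feature vectors in sorted modality order.

    Hypothetical helper: a fixed (alphabetical) order keeps the fused
    vector layout stable across samples.
    """
    fused: List[float] = []
    for name in sorted(features):
        fused.extend(features[name])
    return fused


def late_fusion(probabilities: Dict[str, Sequence[float]]) -> List[float]:
    """Average per-modality class-probability vectors with equal weights.

    Assumption: each modality has its own classifier that outputs one
    probability per emotion class, in the same class order.
    """
    vectors = list(probabilities.values())
    if not vectors:
        raise ValueError("at least one modality is required")
    n_classes = len(vectors[0])
    if any(len(v) != n_classes for v in vectors):
        raise ValueError("all modalities must score the same classes")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(n_classes)]


if __name__ == "__main__":
    # Toy feature vectors; the values are made up.
    print(early_fusion({"emg": [0.1, 0.2], "body": [1.0]}))
    print(late_fusion({"emg": [0.25, 0.75], "body": [0.75, 0.25]}))
```

Early fusion lets a single classifier learn cross-modal interactions but requires aligned features for every modality; late fusion tolerates a missing modality more easily, since each classifier can be trained and evaluated on its own.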

artificial intelligence, human computer interaction, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2509.0533

Country:

Asia (0.28)
Europe (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Consumer Health (1.00)
Education (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.93)
(2 more...)

Technology:

Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Emotion (1.00)
(2 more...)

Using Visual Anomaly Detection for Task Execution Monitoring

Thoduka, Santosh, Gall, Juergen, Plöger, Paul G.

arXiv.org Artificial Intelligence | Aug-26-2025

Execution monitoring is essential for robots to detect and respond to failures. Since it is impossible to enumerate all failures for a given task, we learn from successful executions of the task to detect visual anomalies during runtime. Our method learns to predict the motions that occur during the nominal execution of a task, including camera and robot body motion. A probabilistic U-Net architecture is used to learn to predict optical flow, and the robot's kinematics and 3D model are used to model camera and body motion. The errors between the observed and predicted motion are used to calculate an anomaly score. We evaluate our method on a dataset of a robot placing a book on a shelf, which includes anomalies such as falling books, camera occlusions, and robot disturbances. We find that modeling camera and body motion, in addition to the learning-based optical flow prediction, results in an improvement of the area under the receiver operating characteristic curve from 0.752 to 0.804, and the area under the precision-recall curve from 0.467 to 0.549.
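To make the scoring and evaluation steps concrete, here is a minimal sketch under stated assumptions. The abstract does not say how the anomaly score is derived from the motion errors, so the mean Euclidean error between observed and predicted 2D flow vectors is used here as a stand-in. The area under the ROC curve is computed with the standard pairwise-ranking definition. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Flow = Tuple[float, float]


def anomaly_score(observed: Sequence[Flow], predicted: Sequence[Flow]) -> float:
    """Mean Euclidean distance between observed and predicted flow vectors.

    Assumption: one (dx, dy) vector per pixel or patch, in matching order.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("flows must be non-empty and the same length")
    total = sum(
        math.hypot(ox - px, oy - py)
        for (ox, oy), (px, py) in zip(observed, predicted)
    )
    return total / len(observed)


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random anomalous sample outscores a random
    nominal one; ties count as half a win."""
    pos: List[float] = [s for s, y in zip(scores, labels) if y == 1]
    neg: List[float] = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy values, not data from the paper.
    print(anomaly_score([(1.0, 0.0), (0.0, 0.0)], [(1.0, 0.0), (3.0, 4.0)]))
    print(auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

The pairwise AUROC is quadratic in the number of samples, which is fine for a small evaluation set; a sort-based version would be preferable for large ones.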

artificial intelligence, data mining, machine learning, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/IROS51168.2021.9636133

2107.14206

Genre: Research Report (0.40)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
(2 more...)

Synthetic Data Generation of Body Motion Data by Neural Gas Network for Emotion Recognition

Mousavi, Seyed Muhammad Hossein

arXiv.org Artificial Intelligence | Mar-11-2025

In the domain of emotion recognition using body motion, the primary challenge lies in the scarcity of diverse and generalizable datasets. Automatic emotion recognition uses machine learning and artificial intelligence techniques to recognize a person's emotional state from various data types, such as text, images, sound, and body motion. Body motion poses unique challenges as many factors, such as age, gender, ethnicity, personality, and illness, affect its appearance, leading to a lack of diverse and robust datasets specifically for emotion recognition. To address this, employing Synthetic Data Generation (SDG) methods, such as Generative Adversarial Networks (GANs) and Variational Auto Encoders (VAEs), offers potential solutions, though these methods are often complex. This research introduces a novel application of the Neural Gas Network (NGN) algorithm for synthesizing body motion data and optimizing diversity and generation speed. By learning skeletal structure topology, the NGN fits the neurons or gas particles on body joints. Generated gas particles, which form the skeletal structure later on, will be used to synthesize the new body posture. By attaching body postures over frames, the final synthetic body motion appears. We compared our generated dataset against others generated by GANs, VAEs, and another benchmark algorithm, using benchmark metrics such as Fréchet Inception Distance (FID), Diversity, and a few more. Furthermore, we continued evaluation using classification metrics such as accuracy, precision, recall, and a few others. Joint-related features or kinematic parameters were extracted, and the system assessed model performance against unseen data. Our findings demonstrate that the NGN algorithm produces more realistic and emotionally distinct body motion data and does so faster than existing methods.
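For readers unfamiliar with neural gas, the following is a minimal sketch of the classic rank-based update rule, in which every unit ("gas particle") moves toward each sample by an amount that decays with its distance rank. It is a textbook simplification, not the paper's pipeline: the learning rate `eps` and neighborhood range `lam` are held constant here (the usual algorithm decays both over time), and the function names and toy data are assumptions.

```python
import math
from typing import List, Sequence

Point = List[float]


def neural_gas_step(
    units: List[Point], sample: Sequence[float], eps: float, lam: float
) -> None:
    """Move every unit toward `sample`, weighted by distance rank (in place).

    The closest unit has rank 0 and moves by a fraction `eps` of the gap;
    a unit at rank k moves by eps * exp(-k / lam).
    """
    order = sorted(
        range(len(units)),
        key=lambda i: sum((u - s) ** 2 for u, s in zip(units[i], sample)),
    )
    for rank, i in enumerate(order):
        step = eps * math.exp(-rank / lam)
        units[i] = [u + step * (s - u) for u, s in zip(units[i], sample)]


def fit_neural_gas(
    units: Sequence[Sequence[float]],
    samples: Sequence[Sequence[float]],
    epochs: int = 20,
    eps: float = 0.3,
    lam: float = 1.0,
) -> List[Point]:
    """Return a fitted copy of `units` after repeated passes over `samples`."""
    fitted = [list(u) for u in units]
    for _ in range(epochs):
        for sample in samples:
            neural_gas_step(fitted, sample, eps, lam)
    return fitted


if __name__ == "__main__":
    # Toy 2D "joint" positions; the values are made up.
    joints = [[0.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    start = [[0.5, 0.5], [0.4, 0.6]]
    print(fit_neural_gas(start, joints))
```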

body motion, emotion recognition, motion data, (12 more...)

arXiv.org Artificial Intelligence

2503.14513

Country:

Europe > Switzerland (0.04)
North America > United States > New York (0.04)
Asia > Middle East > Iran > Tehran Province > Tehran (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine (1.00)
Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Emotion (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)

SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters

Jiang, Jianping, Xiao, Weiye, Lin, Zhengyu, Zhang, Huaizhong, Ren, Tianxiang, Gao, Yang, Lin, Zhiqian, Cai, Zhongang, Yang, Lei, Liu, Ziwei

arXiv.org Artificial Intelligence | Nov-29-2024

Human beings are social animals. How to equip 3D autonomous characters with similar social intelligence that can perceive, understand and interact with humans remains an open yet fundamental problem. In this paper, we introduce SOLAMI, the first end-to-end Social vision-Language-Action (VLA) Modeling framework for Immersive interaction with 3D autonomous characters. Specifically, SOLAMI builds 3D autonomous characters from three aspects: (1) Social VLA Architecture: We propose a unified social VLA framework to generate multimodal responses (speech and motion) based on the user's multimodal input to drive the character for social interaction. (2) Interactive Multimodal Data: We present SynMSI, a synthetic multimodal social interaction dataset generated by an automatic pipeline using only existing motion datasets to address the issue of data scarcity. (3) Immersive VR Interface: We develop a VR interface that enables users to immersively interact with these characters driven by various architectures. Extensive quantitative experiments and user studies demonstrate that our framework leads to more precise and natural character responses (in both speech and motion) that align with user expectations with lower latency.

large language model, machine learning, natural language, (23 more...)

arXiv.org Artificial Intelligence

2412.00174

Country:

North America > United States (0.14)
North America > Montserrat (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
Asia > Middle East > Republic of Türkiye > Batman Province > Batman (0.04)

Genre:

Research Report (1.00)
Questionnaire & Opinion Survey (0.87)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
(4 more...)

Harmon: Whole-Body Motion Generation of Humanoid Robots from Language Descriptions

Jiang, Zhenyu, Xie, Yuqi, Li, Jinhan, Yuan, Ye, Zhu, Yifeng, Zhu, Yuke

arXiv.org Artificial Intelligence | Oct-16-2024

Humanoid robots, with their human-like embodiment, have the potential to integrate seamlessly into human environments. Critical to their coexistence and cooperation with humans is the ability to understand natural language communications and exhibit human-like behaviors. This work focuses on generating diverse whole-body motions for humanoid robots from language descriptions. We leverage human motion priors from extensive human motion datasets to initialize humanoid motions and employ the commonsense reasoning capabilities of Vision Language Models (VLMs) to edit and refine these motions. Our approach demonstrates the capability to produce natural, expressive, and text-aligned humanoid motions, validated through both simulated and real-world experiments. More videos can be found at https://ut-austin-rpl.github.io/Harmon/.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2410.12773

Country:

Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Africa > Mali (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
Information Technology > Artificial Intelligence > Robots > Humanoid Robots (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

A Locality-based Neural Solver for Optical Motion Capture

Pan, Xiaoyu, Zheng, Bowen, Jiang, Xinwei, Xu, Guanglong, Gu, Xianli, Li, Jingxiang, Kou, Qilong, Wang, He, Shao, Tianjia, Zhou, Kun, Jin, Xiaogang

arXiv.org Artificial Intelligence | Sep-4-2023

We present a novel locality-based learning method for cleaning and solving optical motion capture data. Given noisy marker data, we propose a new heterogeneous graph neural network which treats markers and joints as different types of nodes, and uses graph convolution operations to extract the local features of markers and joints and transform them to clean motions. To deal with anomalous markers (e.g. occluded or with large tracking errors), the key insight is that a marker's motion shows strong correlations with the motions of its immediate neighboring markers but less so with other markers, a.k.a. locality, which enables us to efficiently fill missing markers (e.g. due to occlusion). We also identify marker outliers due to tracking errors by investigating their acceleration profiles. We further propose a training regime based on representation learning and data augmentation, by training the model on data with masking. The masking schemes aim to mimic the occluded and noisy markers often observed in the real data. Finally, we show that our method achieves high accuracy on multiple metrics across various datasets. Extensive comparison shows our method outperforms state-of-the-art methods in terms of prediction accuracy of occluded marker position error by approximately 20%, which leads to a further error reduction on the reconstructed joint rotations and positions by 30%. The code and data for this paper are available at https://github.com/non-void/LocalMoCap.
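The idea of spotting tracking outliers from acceleration profiles can be illustrated with a simple finite-difference check. The sketch below is an assumption-laden stand-in, not the method from the paper or the LocalMoCap repository: it estimates a marker's acceleration with a central second difference and flags frames whose acceleration magnitude exceeds a hand-picked threshold. The function names and the threshold are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Position = Tuple[float, float, float]


def acceleration_magnitudes(positions: Sequence[Position], dt: float) -> List[float]:
    """Central second-difference acceleration magnitude for interior frames.

    Entry j of the result corresponds to frame j + 1 of `positions`.
    """
    magnitudes: List[float] = []
    for i in range(1, len(positions) - 1):
        acc = [
            (positions[i + 1][k] - 2.0 * positions[i][k] + positions[i - 1][k])
            / (dt * dt)
            for k in range(3)
        ]
        magnitudes.append(math.sqrt(sum(c * c for c in acc)))
    return magnitudes


def flag_outlier_frames(
    positions: Sequence[Position], dt: float, threshold: float
) -> List[int]:
    """Return frame indices whose acceleration magnitude exceeds `threshold`."""
    return [
        j + 1
        for j, a in enumerate(acceleration_magnitudes(positions, dt))
        if a > threshold
    ]


if __name__ == "__main__":
    # A marker moving at constant speed along x, with one corrupted frame.
    track = [
        (0.0, 0.0, 0.0),
        (1.0, 0.0, 0.0),
        (2.0, 0.0, 0.0),
        (10.0, 0.0, 0.0),  # tracking glitch
        (4.0, 0.0, 0.0),
        (5.0, 0.0, 0.0),
    ]
    print(flag_outlier_frames(track, dt=1.0, threshold=10.0))
```

A single corrupted frame also raises the estimated acceleration at its two neighbors, so the threshold has to sit above that side effect; in practice it would be tuned per capture setup.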

body motion, hand motion, occlusion, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3610548.3618148

2309.00428

Country:

Oceania > Australia > New South Wales > Sydney (0.05)
Asia > China > Zhejiang Province > Hangzhou (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
(4 more...)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Vision > Video Understanding (0.74)

Towards Social Artificial Intelligence: Nonverbal Social Signal Prediction in A Triadic Interaction

Joo, Hanbyul, Simon, Tomas, Cikara, Mina, Sheikh, Yaser

arXiv.org Artificial Intelligence | Jun-10-2019

We present a new research task and a dataset to understand human social interactions via computational methods, to ultimately endow machines with the ability to encode and decode a broad channel of social signals humans use. This research direction is essential to make a machine that genuinely communicates with humans, which we call Social Artificial Intelligence. We first formulate the "social signal prediction" problem as a way to model the dynamics of social signals exchanged among interacting individuals in a data-driven way. We then present a new 3D motion capture dataset to explore this problem, where the broad spectrum of social signals (3D body, face, and hand motions) are captured in a triadic social interaction scenario. Baseline approaches to predict speaking status, social formation, and body gestures of interacting individuals are presented in the defined social prediction framework.

artificial intelligence, machine learning, target person, (19 more...)

arXiv.org Artificial Intelligence

1906.04158

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > Norway > Norwegian Sea (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Need a hand? Bizarre backpack gives you an extra set of arms that can be controlled remotely

Daily Mail - Science & tech | Sep-10-2018, 17:33:12 GMT

Getting an extra pair of arms is now as easy as putting on a backpack. Researchers at the University of Tokyo and Keio University created a telepresence robotic system called 'Fusion' that has a head and two arms that are controlled remotely. The remote person not only sees what the wearer sees, but they can also control the robot's arms, or even use the robot to manipulate the wearer's arms. A telepresence robot is a remote-controlled device that typically moves around using a set of wheels. They've been adapted for a variety of uses, from communications tools for consumers to use by businesses.

artificial intelligence, fusion, robot, (16 more...)

Daily Mail - Science & tech

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.25)
Asia > Singapore (0.05)

Genre: Research Report (0.36)

Industry:

Information Technology (0.52)
Leisure & Entertainment (0.32)

Technology: Information Technology > Artificial Intelligence > Robots (1.00)

New advance with conversational robots

#artificialintelligence | Aug-4-2018, 16:11:29 GMT

The new research stems from a symbiotic human-robot interaction project from the Japan Science and Technology Agency. This is based around the development of a multimodal conversation control system together with a multi-robot conversation control system. These have been designed to create a robot that possesses a much higher degree of human-like presence than any comparable robot today. A secondary aim is to create a robot with a 'sense of conversing'. The intended outcome of the project is a new generation of conversational robots, starting with a child-like android dubbed 'ibuki'.

artificial intelligence, conversational robot, robot, (5 more...)

#artificialintelligence

Country: Asia > Japan (0.27)

Technology: Information Technology > Artificial Intelligence > Robots (1.00)
