mediapipe


Automatic Sign Language Recognition: A Hybrid CNN-LSTM Approach Based on Mediapipe

Takouchouang, Fraisse Sacré, Vinh, Ho Tuong

arXiv.org Artificial Intelligence

Sign languages play a crucial role in the communication of deaf communities, but they are often marginalized, limiting access to essential services such as healthcare and education. This study proposes an automatic sign language recognition system based on a hybrid CNN-LSTM architecture, using Mediapipe for gesture keypoint extraction. Developed with Python, TensorFlow and Streamlit, the system provides real-time gesture translation. The results show an average accuracy of 92%, with very good performance for distinct gestures such as "Hello" and "Thank you". However, some confusions remain for visually similar gestures, such as "Call" and "Yes". This work opens up interesting perspectives for applications in various fields such as healthcare, education and public services.
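The paper's implementation is not reproduced here, but the pipeline it describes (Mediapipe keypoint extraction feeding a hybrid CNN-LSTM built with TensorFlow) can be sketched roughly as follows. The clip length, feature size, and class count are illustrative assumptions, not the authors' configuration.

```python
# Illustrative sketch (not the authors' code): MediaPipe hand keypoints
# feeding a hybrid CNN-LSTM gesture classifier in Keras.
import cv2
import numpy as np
import mediapipe as mp
import tensorflow as tf

hands = mp.solutions.hands.Hands(max_num_hands=1)

def extract_keypoints(frame_bgr):
    """Flat (21 * 3,) vector of one hand's landmarks, or zeros if none found."""
    results = hands.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        lm = results.multi_hand_landmarks[0].landmark
        return np.array([[p.x, p.y, p.z] for p in lm]).flatten()
    return np.zeros(21 * 3)

# Assumed shapes: 30-frame clips, 63 features per frame, 10 gesture classes.
model = tf.keras.Sequential([
    tf.keras.layers.Conv1D(64, 3, activation="relu", input_shape=(30, 63)),
    tf.keras.layers.MaxPooling1D(2),
    tf.keras.layers.LSTM(128),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```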


Evaluation of facial landmark localization performance in a surgical setting

Frajtag, Ines, Švaco, Marko, Šuligoj, Filip

arXiv.org Artificial Intelligence

Robotics and computer vision are finding increasingly widespread application in many fields, including medicine. Many face detection algorithms have found applications in neurosurgery, ophthalmology, and plastic surgery. A common challenge in using these algorithms is coping with variable lighting conditions and with the range of positions from which patients must be identified and precisely localized. The proposed experiment tests the MediaPipe algorithm for detecting facial landmarks in a controlled setting, using a robotic arm that automatically adjusts positions while the surgical light and the phantom remain fixed. The results of this study demonstrate that improved facial landmark detection accuracy under surgical lighting significantly enhances detection performance at larger yaw and pitch angles. The increase in standard deviation/dispersion stems from imprecise detection of selected facial landmarks. This analysis supports a discussion of the potential integration of the MediaPipe algorithm into medical procedures.
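For readers unfamiliar with the library, a minimal sketch of MediaPipe Face Mesh landmark localization of the kind evaluated in the study might look like this; the image source and the pixel conversion are illustrative, not the experiment's actual setup.

```python
# Minimal sketch of MediaPipe facial landmark localization on a still image.
import cv2
import mediapipe as mp

mp_face_mesh = mp.solutions.face_mesh

with mp_face_mesh.FaceMesh(static_image_mode=True,
                           refine_landmarks=True,
                           max_num_faces=1) as face_mesh:
    image = cv2.imread("phantom_view.png")            # hypothetical capture
    results = face_mesh.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    if results.multi_face_landmarks:
        h, w = image.shape[:2]
        for lm in results.multi_face_landmarks[0].landmark:
            x_px, y_px = lm.x * w, lm.y * h           # normalized -> pixels
```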


TSLFormer: A Lightweight Transformer Model for Turkish Sign Language Recognition Using Skeletal Landmarks

Ertürk, Kutay, Altınışık, Furkan, Sarıaltın, İrem, Gerek, Ömer Nezih

arXiv.org Artificial Intelligence

This study presents TSLFormer, a light and robust word-level Turkish Sign Language (TID) recognition model that treats sign gestures as an ordered, string-like language. Instead of working with raw RGB or depth videos, our method uses only the 3D joint positions (articulation points) extracted with Google's Mediapipe library, focusing on hand and torso skeletal locations. This yields an efficient reduction in input dimensionality while preserving the important semantic information of the gesture. Our approach recasts sign language recognition as sequence-to-sequence translation, drawing inspiration from the linguistic nature of sign languages and the success of transformers at natural language translation. Because TSLFormer adapts the transformer's self-attention mechanism, it effectively represents the temporal co-occurrence within a sign sequence, emphasizing significant movement patterns over time, much as words are referenced in a sentence. Experimented and validated on the AUTSL dataset, which holds over 36,000 sign samples spanning 226 different words, TSLFormer achieves competitive performance with minimal computational demands. The experiments evidence a rich spatiotemporal understanding of signs and show that, using only joint landmarks, recognition is feasible within real-time, mobile, and assistive technologies facilitating communication for hearing-impaired individuals. Sign language is an essential communication method for the hearing impaired, expressing ideas and sentiments through hand gestures, facial expressions, and body movement. Unlike spoken languages, which employ auditory and verbal modalities, sign language uses visual and spatial modalities to convey meaning. However, because relatively few people are proficient in sign language, communication gaps persist that hinder inclusion, particularly in daily social interaction and in employment, educational, and healthcare settings.
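A lightweight transformer encoder over per-frame joint vectors, in the spirit of TSLFormer, could be sketched as below. The 226-class output matches the AUTSL vocabulary mentioned in the abstract; the sequence length, feature size, and layer widths are assumptions.

```python
# Rough sketch of a single-block transformer encoder over joint-position
# sequences; dimensions are assumed, not the paper's configuration.
import tensorflow as tf

SEQ_LEN, N_FEATURES, N_CLASSES = 60, 126, 226   # e.g. 2 hands x 21 joints x 3

inputs = tf.keras.Input(shape=(SEQ_LEN, N_FEATURES))
x = tf.keras.layers.Dense(128)(inputs)               # per-frame embedding
attn = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=32)(x, x)
x = tf.keras.layers.LayerNormalization()(x + attn)   # residual self-attention
ff = tf.keras.layers.Dense(256, activation="relu")(x)
ff = tf.keras.layers.Dense(128)(ff)
x = tf.keras.layers.LayerNormalization()(x + ff)     # residual feed-forward
x = tf.keras.layers.GlobalAveragePooling1D()(x)      # pool over time
outputs = tf.keras.layers.Dense(N_CLASSES, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)
```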


A Real-Time Gesture-Based Control Framework

Khazaei, Mahya, Bahrani, Ali, Tzanetakis, George

arXiv.org Artificial Intelligence

We introduce a real-time, human-in-the-loop gesture control framework that can dynamically adapt audio and music based on human movement by analyzing live video input. By creating a responsive connection between visual and auditory stimuli, this system enables dancers and performers not only to respond to music but also to influence it through their movements. Designed for live performances, interactive installations, and personal use, it offers an immersive experience where users can shape the music in real time. The framework integrates computer vision and machine learning techniques to track and interpret motion, allowing users to manipulate audio elements such as tempo, pitch, effects, and playback sequence. With ongoing training, it achieves user-independent functionality, requiring as few as 50 to 80 labeled samples per simple gesture. This framework combines gesture training, cue mapping, and audio manipulation to create a dynamic, interactive experience: gestures are interpreted as input signals, mapped to sound control commands, and used to naturally adjust music elements, showcasing the seamless interplay between human interaction and machine response.
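The paper's classifier is not published here; one plausible reading of the "few labeled samples per gesture" claim is a simple nearest-neighbour model over landmark feature vectors mapped to audio commands, as in this hedged sketch. The feature files, gesture labels, and command table are hypothetical.

```python
# Hedged sketch: k-NN over landmark features, mapped to audio cue commands.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# X: (n_samples, n_features) landmark vectors; y: gesture labels.
# Per the paper, 50-80 labeled samples per simple gesture can suffice.
X_train = np.load("gesture_features.npy")   # hypothetical training data
y_train = np.load("gesture_labels.npy")

clf = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

COMMANDS = {"raise_arm": "tempo_up", "lower_arm": "tempo_down",
            "spin": "next_track"}           # assumed cue mapping

def gesture_to_command(feature_vec):
    """Classify one landmark vector and look up its sound control command."""
    label = clf.predict(feature_vec.reshape(1, -1))[0]
    return COMMANDS.get(label)
```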


Real-Time Imitation of Human Head Motions, Blinks and Emotions by Nao Robot: A Closed-Loop Approach

Rayati, Keyhan, Feizi, Amirhossein, Beigy, Alireza, Shahverdi, Pourya, Masouleh, Mehdi Tale, Kalhor, Ahmad

arXiv.org Artificial Intelligence

This paper introduces a novel approach for enabling real-time imitation of human head motion by a Nao robot, with a primary focus on elevating human-robot interaction. Using the robust capabilities of MediaPipe as a computer vision library and DeepFace as an emotion recognition library, this research captures the subtleties of human head motion, including blink actions and emotional expressions, and seamlessly incorporates these indicators into the robot's responses. The result is a comprehensive framework that facilitates precise head imitation within human-robot interactions, using a closed-loop approach that gathers real-time feedback on the robot's imitation performance. This feedback loop ensures a high degree of accuracy in modeling head motion, as evidenced by an R² score of 96.3 for pitch and 98.9 for yaw. Notably, the proposed approach holds promise for improving communication for children with autism, offering them a valuable tool for more effective interaction. In essence, the proposed work explores the integration of real-time head imitation and real-time emotion recognition to enhance human-robot interaction, with potential benefits for individuals with unique communication needs. The field of robotics has come a long way in recent years, with significant advancements in the development of humanoid robots.
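A rough sketch of the sensing side (not the authors' release): MediaPipe Face Mesh for a blink signal plus DeepFace for per-frame emotion. The eye-landmark indices and the blink threshold are common choices assumed here for illustration.

```python
# Illustrative per-frame blink and emotion analysis with MediaPipe + DeepFace.
import cv2
import mediapipe as mp
from deepface import DeepFace

face_mesh = mp.solutions.face_mesh.FaceMesh(refine_landmarks=True)

def eye_openness(lm):
    """Vertical/horizontal ratio of the left eye (common mesh indices)."""
    v = abs(lm[159].y - lm[145].y)   # upper vs. lower eyelid
    h = abs(lm[33].x - lm[133].x)    # eye corners
    return v / h

def analyze_frame(frame_bgr):
    res = face_mesh.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    blink = None
    if res.multi_face_landmarks:
        blink = eye_openness(res.multi_face_landmarks[0].landmark) < 0.15
    # Recent DeepFace versions return a list of per-face result dicts.
    emotion = DeepFace.analyze(frame_bgr, actions=["emotion"],
                               enforce_detection=False)[0]["dominant_emotion"]
    return blink, emotion
```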


Pose-Based Fall Detection System: Efficient Monitoring on Standard CPUs

Mali, Vinayak, Jaiswal, Saurabh

arXiv.org Artificial Intelligence

Falls among elderly residents in assisted living homes pose significant health risks, often leading to injuries and a decreased quality of life. Current fall detection solutions typically rely on sensor-based systems that require dedicated hardware, or on video-based models that demand high computational resources and GPUs for real-time processing. In contrast, this paper presents a robust fall detection system that does not require any additional sensors or high-powered hardware. The system uses pose estimation techniques, combined with threshold-based analysis and a voting mechanism, to effectively distinguish between fall and non-fall activities. For pose detection, we leverage MediaPipe, a lightweight and efficient framework that enables real-time processing on standard CPUs with minimal computational overhead. By analyzing motion, body position, and key pose points, the system processes pose features with a 20-frame buffer, minimizing false positives and maintaining high accuracy even in real-world settings. This unobtrusive, resource-efficient approach provides a practical solution for enhancing resident safety in old age homes, without the need for expensive sensors or high-end computational resources.
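The threshold-plus-voting scheme over a 20-frame buffer could be sketched as follows on MediaPipe Pose output; the hip-drop feature and threshold value are illustrative assumptions rather than the paper's exact features.

```python
# Sketch of threshold-based fall voting over a 20-frame buffer.
from collections import deque
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose
pose = mp_pose.Pose()
votes = deque(maxlen=20)          # 20-frame buffer, as in the paper
prev_hip_y = None

def process_frame(frame_bgr, drop_thresh=0.05):
    """Vote 'fall' when the hip keypoint drops fast; majority over the buffer."""
    global prev_hip_y
    res = pose.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    if res.pose_landmarks:
        hip_y = res.pose_landmarks.landmark[mp_pose.PoseLandmark.LEFT_HIP].y
        votes.append(prev_hip_y is not None and hip_y - prev_hip_y > drop_thresh)
        prev_hip_y = hip_y        # image y grows downward, so a fall increases y
    return sum(votes) > len(votes) // 2 if votes else False
```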


Emotion estimation from video footage with LSTM

Attrah, Samer

arXiv.org Artificial Intelligence

Emotion estimation is a field that has been studied for a long time, and several machine learning approaches exist. In this paper, we present an LSTM model that processes the blendshapes produced by the MediaPipe library for a face detected in a live camera stream, in order to estimate the main emotion from facial expressions. The model is trained on the FER2013 dataset and delivers 71% accuracy and a 62% F1-score, which meets the accuracy benchmark of the FER2013 dataset with significantly reduced computation costs. https://github.com/Samir-atra/Emotion_estimation_from_video_footage_with_LSTM_ML_algorithm
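A hedged sketch of the described pipeline: MediaPipe's FaceLandmarker (Tasks API) emits 52 blendshape scores per frame, which a small Keras LSTM classifies into the seven FER2013 emotion classes. The window length and layer sizes are assumptions.

```python
# Sketch: per-frame blendshape extraction plus a small LSTM classifier.
import mediapipe as mp
from mediapipe.tasks import python as mp_python
from mediapipe.tasks.python import vision
import tensorflow as tf

options = vision.FaceLandmarkerOptions(
    base_options=mp_python.BaseOptions(model_asset_path="face_landmarker.task"),
    output_face_blendshapes=True)
landmarker = vision.FaceLandmarker.create_from_options(options)

def blendshape_vector(rgb_frame):
    """52 blendshape scores for the first detected face (or None)."""
    result = landmarker.detect(
        mp.Image(image_format=mp.ImageFormat.SRGB, data=rgb_frame))
    if result.face_blendshapes:
        return [c.score for c in result.face_blendshapes[0]]
    return None

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(64, input_shape=(30, 52)),  # 30-frame window assumed
    tf.keras.layers.Dense(7, activation="softmax"),  # FER2013 emotion classes
])
```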


R2I-rPPG: A Robust Region of Interest Selection Method for Remote Photoplethysmography to Extract Heart Rate

Nagar, Sandeep, Hasegawa-Johnson, Mark, Beiser, David G., Ahuja, Narendra

arXiv.org Artificial Intelligence

The COVID-19 pandemic has underscored the need for low-cost, scalable approaches to measuring contactless vital signs, either during initial triage at a healthcare facility or virtual telemedicine visits. Remote photoplethysmography (rPPG) can accurately estimate heart rate (HR) when applied to close-up videos of healthy volunteers in well-lit laboratory settings. However, results from such highly optimized laboratory studies may not be readily translated to healthcare settings. One significant barrier to the practical application of rPPG in health care is the accurate localization of the region of interest (ROI). Clinical or telemedicine visits may involve sub-optimal lighting, movement artifacts, variable camera angle, and subject distance. This paper presents an rPPG ROI selection method based on 3D facial landmarks and patient head yaw angle. We then demonstrate the robustness of this ROI selection method when coupled to the Plane-Orthogonal-to-Skin (POS) rPPG method when applied to videos of patients presenting to an Emergency Department for respiratory complaints. Our results demonstrate the effectiveness of our proposed approach in improving the accuracy and robustness of rPPG in a challenging clinical environment.
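For context, the Plane-Orthogonal-to-Skin (POS) method the paper couples with its ROI selection can be sketched in NumPy as below (after Wang et al., 2017); the window length is an assumption, and `rgb` stands for per-frame mean RGB values of the selected ROI.

```python
# Sketch of POS pulse extraction from ROI-mean RGB traces, shape (N, 3).
import numpy as np

def pos_pulse(rgb, win=48):
    """Overlap-added POS pulse signal from per-frame mean RGB of the ROI."""
    P = np.array([[0.0, 1.0, -1.0],
                  [-2.0, 1.0, 1.0]])
    n = rgb.shape[0]
    h = np.zeros(n)
    for t in range(n - win + 1):
        C = rgb[t:t + win].T                     # (3, win) window
        Cn = C / C.mean(axis=1, keepdims=True)   # temporal normalization
        S = P @ Cn                               # project onto the POS plane
        p = S[0] + (S[0].std() / (S[1].std() + 1e-9)) * S[1]
        h[t:t + win] += p - p.mean()             # overlap-add
    return h
```

Heart rate would then be read off the dominant frequency of the returned pulse signal, for example via an FFT over a physiologically plausible band.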


Allo-AVA: A Large-Scale Multimodal Conversational AI Dataset for Allocentric Avatar Gesture Animation

Punjwani, Saif, Heck, Larry

arXiv.org Artificial Intelligence

The scarcity of high-quality, multimodal training data severely hinders the creation of lifelike avatar animations for conversational AI in virtual environments. Existing datasets often lack the intricate synchronization between speech, facial expressions, and body movements that characterize natural human communication. To address this critical gap, we introduce Allo-AVA, a large-scale dataset specifically designed for text and audio-driven avatar gesture animation in an allocentric (third person point-of-view) context. Allo-AVA consists of ~1,250 hours of diverse video content, complete with audio, transcripts, and extracted keypoints. Allo-AVA uniquely maps these keypoints to precise timestamps, enabling accurate replication of human movements (body and facial gestures) in synchronization with speech. This comprehensive resource enables the development and evaluation of more natural, context-aware avatar animation models, potentially transforming applications ranging from virtual reality to digital assistants.
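To make the description concrete, the following is a hypothetical illustration of the kind of time-aligned record implied (keypoints mapped to timestamps alongside audio and transcript); the field names and landmark counts are assumptions, not the dataset's published schema.

```python
# Hypothetical record layout; not the actual Allo-AVA schema.
sample_record = {
    "clip_id": "allo_ava_000001",
    "transcript": [
        {"word": "hello", "start_s": 0.42, "end_s": 0.71},
    ],
    "keypoints": [
        {"timestamp_s": 0.433,                   # one entry per video frame
         "body": [[0.51, 0.22, -0.03]] * 33,     # e.g. 33 pose points (x, y, z)
         "face": [[0.50, 0.18, -0.01]] * 468},   # e.g. 468 face mesh points
    ],
    "audio_path": "audio/allo_ava_000001.wav",
}
```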


POSE: Pose estimation Of virtual Sync Exhibit system

Tsui, Hao-Tang, Tuan, Yu-Rou, Chen, Jia-You

arXiv.org Artificial Intelligence

Our project is a portable MetaVerse implementation that uses 3D pose estimation with AI to make virtual avatars perform synchronized actions and interact with the environment. The motivation is that we find it inconvenient to use joysticks and sensors when playing with fitness rings. To replace joysticks and reduce costs, we develop a platform that controls virtual avatars through pose estimation, identifying the movements of real people, and we implement multi-processing to achieve modularization and reduce overall latency. As the Wii swept the world and opened the era of home game consoles, the technology for detecting player movements has become increasingly essential. As Figure 1 shows, Figure 1(a) is the keypoint diagram of different human poses, and Figure 1(b) is what the model obtains after pose estimation. As a pioneer, Wii used
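The multi-process modularization mentioned in the abstract could be laid out roughly as follows: one process captures frames while another runs pose estimation, so capture never blocks on inference. The queue sizes and the `run_pose_model` call are hypothetical stand-ins.

```python
# Sketch of a two-process capture/inference pipeline with bounded queues.
import multiprocessing as mp_proc
import cv2

def capture(frames: mp_proc.Queue):
    cap = cv2.VideoCapture(0)
    while True:
        ok, frame = cap.read()
        if ok and not frames.full():
            frames.put(frame)             # drop frames when inference lags

def estimate(frames: mp_proc.Queue, poses: mp_proc.Queue):
    while True:
        frame = frames.get()
        keypoints = run_pose_model(frame)  # hypothetical estimator call
        poses.put(keypoints)               # consumed by the avatar renderer

if __name__ == "__main__":
    frames, poses = mp_proc.Queue(maxsize=2), mp_proc.Queue(maxsize=2)
    mp_proc.Process(target=capture, args=(frames,), daemon=True).start()
    mp_proc.Process(target=estimate, args=(frames, poses), daemon=True).start()
```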