Gesture Recognition


You can now control Chromebooks using head tilts and face gestures

PCWorld

If you have a Chromebook, here's an interesting development for you: Google is now making it possible to control ChromeOS using just your face, The Verge reports. Imagine being able to use head movements to move your mouse cursor around, then using facial expressions to click -- or perform other frequent tasks, like toggling dictation so you can speak to write. The feature, which was originally announced in December as Face Control, is aimed specifically at people with motor impairments who'd otherwise have trouble controlling a mouse cursor on screen. But it's also meant for students and educators, who have so far benefited from the many other Chromebook accessibility features already available. Face Control is currently rolling out to compatible Chromebooks.


ADL faces backlash for defending Elon Musk's raised-arm gesture

Al Jazeera

Washington, DC – After Elon Musk made an apparent Nazi salute at an inauguration rally for United States President Donald Trump, the Anti-Defamation League (ADL) rushed to defend the SpaceX founder. The self-described anti-Semitism watchdog and "leading anti-hate organization in the world" dismissed Musk's raised arm as "an awkward gesture in a moment of enthusiasm" in a social media post on Monday. Months earlier, however, Jonathan Greenblatt, the head of the staunchly pro-Israel ADL, compared the Palestinian keffiyeh to the Nazi swastika. Activists say the contrast between the ADL's hurried defence of Musk and its efforts to demonise Palestinians and their supporters shows that the group is more focused on silencing voices critical of Israel than it is on fighting anti-Semitism. "The ADL is being crystal clear about where it stands," said Beth Miller, political director at Jewish Voice for Peace (JVP).


Hands on with the ultralight Asus Zenbook A14 at CES 2025: MacBook Airs should be scared

Mashable

Reading an article about the Asus Zenbook A14 is doing a disservice to all parties involved. You really need to hold this thing yourself. The new 14-inch ultraportable laptop made a splash at CES 2025 in Las Vegas this week as the "world's lightest Copilot PC," earning an Innovation Award in the tech show's Sustainability & Energy/Power category and multiple "best of" nods (including one from us at the CNET Group). Made from an innovative material called "Ceraluminum" that's elegant and eco-friendly, the Zenbook A14 combines a flyweight frame with next-level Qualcomm power efficiency, a bright OLED display, and a gesture-controlled trackpad -- all for as low as $899.99. I'm old enough to remember when Steve Jobs slid the first Apple MacBook Air out of a brown envelope.


Neural Lab's AirTouch brings gesture control to Windows and Android devices with just a webcam

Engadget

Some of the best tech we see at CES feels pulled straight from sci-fi. Yesterday at CES 2025, I tested out Neural Lab's AirTouch technology, which lets you interact with a display using hand gestures alone, which is exactly what movies like Minority Report and Iron Man promised. Of course, plenty of companies have delivered on varying forms of gesture control. Microsoft's Kinect is an early example, while the Apple Watch's double tap feature and the Vision Pro's pinch gestures are just two of many current iterations. But I was impressed with how well AirTouch delivered and, unlike most gesture technology out there, it requires no special equipment -- just a standard webcam -- and works with a wide range of devices.
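As a rough illustration of webcam-only gesture input (this is not Neural Lab's implementation), a pinch "click" can be detected from ordinary video with the open-source MediaPipe Hands tracker; the 0.05 fingertip-distance threshold below is an arbitrary assumption.

```python
# Illustrative sketch only (not AirTouch): detect a "pinch" click from a
# plain webcam using the open-source MediaPipe Hands tracker.
import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(max_num_hands=1)
cap = cv2.VideoCapture(0)                     # standard webcam, no extra gear

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        lm = results.multi_hand_landmarks[0].landmark
        thumb, index = lm[4], lm[8]           # thumb tip, index fingertip
        dist = ((thumb.x - index.x) ** 2 + (thumb.y - index.y) ** 2) ** 0.5
        if dist < 0.05:                       # fingertips close => "click"
            print("pinch detected")
    cv2.imshow("frame", frame)
    if cv2.waitKey(1) & 0xFF == 27:           # Esc to quit
        break
cap.release()
```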


The 10 Coolest Things We've Seen So Far at CES 2025

WIRED

If you download Doublepoint's new WowMouse app, you can use finger gestures to control the cursor on your MacBook. Sit on the bed and hit Next Episode on your Mac screen without even touching the laptop--just use the Apple Watch on your wrist and hand gestures like a Jedi. Last year, the company started with a Wear OS app for the Pixel Watch and Samsung Galaxy smartwatches, but now it's the Apple Watch's turn. There are some caveats--the Apple Watch can't control the screen of an iPhone, whereas this is possible with a Wear OS watch and an Android phone, though that capability is supposedly on the roadmap. It's powered by the inertial measurement unit (IMU) sensor found in most smartwatches, and you can make a few gestures, like a double tap. Right now the main utility is to control the Mac screen in front of you, but the goal is to let it work with a variety of Bluetooth-enabled devices.
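As a hedged sketch of the underlying idea (not Doublepoint's algorithm), a double tap can be flagged from raw IMU accelerometer samples by looking for two acceleration spikes within a short interval; the sample rate, thresholds, and timing window below are assumptions.

```python
# Toy sketch of IMU-based gesture detection: flag a "double tap" when two
# acceleration spikes occur within a short time window.
import numpy as np

def detect_double_tap(accel, rate_hz=100, thresh=2.5, max_gap_s=0.4):
    """accel: (N, 3) accelerometer samples in g; returns True on double tap."""
    magnitude = np.linalg.norm(accel, axis=1)
    spikes = np.flatnonzero(magnitude > thresh)        # candidate tap samples
    if len(spikes) < 2:
        return False
    gaps = np.diff(spikes) / rate_hz                   # seconds between spikes
    # ignore near-simultaneous samples from the same tap (< 50 ms apart)
    return bool(np.any((gaps > 0.05) & (gaps < max_gap_s)))

# Toy trace: gravity on z plus noise, with two sharp spikes 0.2 s apart
trace = np.random.normal(0.0, 0.05, size=(200, 3))
trace[:, 2] += 1.0
trace[60] += 3.0
trace[80] += 3.0
print(detect_double_tap(trace))    # True
```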


ConvMixFormer- A Resource-efficient Convolution Mixer for Transformer-based Dynamic Hand Gesture Recognition

arXiv.org Artificial Intelligence

Transformer models have demonstrated remarkable success in many domains such as natural language processing (NLP) and computer vision. With the growing interest in transformer-based architectures, they are now utilized for gesture recognition, and we devise a novel ConvMixFormer architecture for dynamic hand gestures. Self-attention scales quadratically with sequence length, which makes transformer models computationally complex and heavy. We address this drawback by designing a resource-efficient model that replaces the self-attention in the transformer with a simple convolutional token mixer. The convolution-based mixer requires far fewer computations and parameters than quadratic self-attention, and it helps the model capture local spatial features that self-attention, which processes tokens as a flat sequence, struggles to capture. Further, an efficient gating mechanism is employed instead of the conventional feed-forward network to help the model control the flow of features between the stages of the proposed model. This design uses nearly half the learnable parameters of a vanilla transformer, which enables fast and efficient training. The proposed method is evaluated on the NVIDIA Dynamic Hand Gesture and Briareo datasets, where our model achieves state-of-the-art results on single and multimodal inputs. We also show the parameter efficiency of the proposed ConvMixFormer model compared to other methods. The source code is available at https://github.com/mallikagarg/ConvMixFormer.
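For illustration, here is a minimal PyTorch sketch of the general idea (not the authors' released code; see the GitHub link above for that): a transformer-style block whose token mixer is a depthwise 1D convolution and whose feed-forward network is replaced by a gated unit. The layer sizes, kernel size, and exact gating form are assumptions.

```python
# Sketch: transformer block with self-attention swapped for a depthwise-conv
# token mixer and the MLP swapped for a gated (GLU-style) channel mixer.
import torch
import torch.nn as nn

class ConvTokenMixer(nn.Module):
    """Mixes information across the sequence with a depthwise 1D conv,
    whose cost is linear in sequence length (vs. quadratic attention)."""
    def __init__(self, dim, kernel_size=3):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.dwconv = nn.Conv1d(dim, dim, kernel_size,
                                padding=kernel_size // 2, groups=dim)

    def forward(self, x):                       # x: (batch, seq_len, dim)
        y = self.norm(x).transpose(1, 2)        # (batch, dim, seq_len)
        y = self.dwconv(y).transpose(1, 2)
        return x + y                            # residual connection

class GatedFFN(nn.Module):
    """Gated channel-mixing unit used in place of the usual feed-forward MLP."""
    def __init__(self, dim, expansion=2):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.value = nn.Linear(dim, dim * expansion)
        self.gate = nn.Linear(dim, dim * expansion)
        self.proj = nn.Linear(dim * expansion, dim)

    def forward(self, x):
        y = self.norm(x)
        y = self.proj(torch.sigmoid(self.gate(y)) * self.value(y))
        return x + y

class ConvMixerBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.mixer = ConvTokenMixer(dim)
        self.ffn = GatedFFN(dim)

    def forward(self, x):
        return self.ffn(self.mixer(x))

# Example: a sequence of 16 frame embeddings of width 128
tokens = torch.randn(2, 16, 128)
print(ConvMixerBlock(128)(tokens).shape)        # torch.Size([2, 16, 128])
```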


Robust Dynamic Gesture Recognition at Ultra-Long Distances

arXiv.org Artificial Intelligence

Dynamic hand gestures play a crucial role in conveying nonverbal information for Human-Robot Interaction (HRI), eliminating the need for complex interfaces. Current models for dynamic gesture recognition suffer from limitations in effective recognition range, restricting their application to close proximity scenarios. In this letter, we present a novel approach to recognizing dynamic gestures in an ultra-range distance of up to 28 meters, enabling natural, directive communication for guiding robots in both indoor and outdoor environments. Our proposed SlowFast-Transformer (SFT) model effectively integrates the SlowFast architecture with Transformer layers to efficiently process and classify gesture sequences captured at ultra-range distances, overcoming challenges of low resolution and environmental noise. We further introduce a distance-weighted loss function shown to enhance learning and improve model robustness at varying distances. Our model demonstrates significant performance improvement over state-of-the-art gesture recognition frameworks, achieving a recognition accuracy of 95.1% on a diverse dataset with challenging ultra-range gestures. This enables robots to react appropriately to human commands from a far distance, providing an essential enhancement in HRI, especially in scenarios requiring seamless and natural interaction.
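As a hedged illustration of what a distance-weighted loss might look like (the paper's exact weighting function is not reproduced here), the sketch below scales per-sample cross-entropy by a weight that grows with the subject's distance, so far-range, low-resolution gestures contribute more to training.

```python
# Assumed linear distance weighting applied to per-sample cross-entropy.
import torch
import torch.nn.functional as F

def distance_weighted_ce(logits, targets, distances_m, max_dist=28.0):
    """logits: (N, C), targets: (N,), distances_m: (N,) distance in metres."""
    per_sample = F.cross_entropy(logits, targets, reduction="none")
    weights = 1.0 + distances_m / max_dist     # farther => larger weight
    return (weights * per_sample).mean()

logits = torch.randn(4, 6)                     # 4 clips, 6 gesture classes
targets = torch.tensor([0, 3, 5, 2])
distances = torch.tensor([2.0, 10.0, 20.0, 28.0])
print(distance_weighted_ce(logits, targets, distances))
```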


Machine Learning-based sEMG Signal Classification for Hand Gesture Recognition

arXiv.org Artificial Intelligence

EMG-based hand gesture recognition uses electromyographic (EMG) signals to interpret and classify hand movements by analyzing electrical activity generated by muscle contractions. It has wide applications in prosthesis control, rehabilitation training, and human-computer interaction. Using electrodes placed on the skin, the EMG sensor captures muscle signals, which are processed and filtered to reduce noise. Numerous feature extraction and machine learning algorithms have been proposed to extract and classify muscle signals to distinguish between various hand gestures. This paper aims to benchmark the performance of EMG-based hand gesture recognition using novel feature extraction methods, namely, fused time-domain descriptors, temporal-spatial descriptors, and wavelet transform-based features, combined with the state-of-the-art machine and deep learning models. Experimental investigations on the Grabmyo dataset demonstrate that the 1D Dilated CNN performed the best with an accuracy of 97% using fused time-domain descriptors such as power spectral moments, sparsity, irregularity factor and waveform length ratio. Similarly, on the FORS-EMG dataset, random forest performed the best with an accuracy of 94.95% using temporal-spatial descriptors (which include time domain features along with additional features such as coefficient of variation (COV), and Teager-Kaiser energy operator (TKEO)).
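The general pipeline (windowed time-domain descriptors fed to a classical classifier) can be sketched as follows; the features shown (RMS, waveform length, zero crossings) are generic stand-ins, not the exact fused descriptors benchmarked in the paper, and the data is synthetic rather than Grabmyo or FORS-EMG.

```python
# Generic sEMG pipeline sketch: time-domain features per window + classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def time_domain_features(window):
    """window: (channels, samples) of raw sEMG for one analysis window."""
    rms = np.sqrt(np.mean(window ** 2, axis=1))                 # signal power
    wl = np.sum(np.abs(np.diff(window, axis=1)), axis=1)        # waveform length
    zc = np.sum(np.diff(np.sign(window), axis=1) != 0, axis=1)  # zero crossings
    return np.concatenate([rms, wl, zc])

# Toy data: 200 windows, 8 electrode channels, 256 samples each, 5 gestures
rng = np.random.default_rng(0)
windows = rng.standard_normal((200, 8, 256))
labels = rng.integers(0, 5, size=200)

X = np.stack([time_domain_features(w) for w in windows])
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
print(clf.score(X, labels))
```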


Object Recognition in Human Computer Interaction:- A Comparative Analysis

arXiv.org Artificial Intelligence

Human-computer interaction (HCI) has been a widely researched area for many years, with continuous advancements in technology leading to the development of new techniques that change the way we interact with computers. With the recent advent of powerful computers, systems can recognize human actions and respond accordingly, further transforming that interaction. This paper provides a comparative analysis of various algorithms used for recognizing user faces and gestures in the context of computer vision and HCI, evaluating their performance in terms of accuracy, robustness, and efficiency, with the goal of improving the design and development of interactive systems that are more intuitive, efficient, and user-friendly.


Towards a GENEA Leaderboard -- an Extended, Living Benchmark for Evaluating and Advancing Conversational Motion Synthesis

arXiv.org Artificial Intelligence

Current evaluation practices in speech-driven gesture generation lack standardisation and focus on aspects that are easy to measure over aspects that actually matter. This leads to a situation where it is impossible to know what the state of the art is, or which method works better for which purpose, when comparing two publications. In this position paper, we review and give details on issues with existing gesture-generation evaluation, and present a novel proposal for remedying them. Specifically, we announce an upcoming living leaderboard to benchmark progress in conversational motion synthesis. Unlike earlier gesture-generation challenges, the leaderboard will be updated with large-scale user studies of new gesture-generation systems multiple times per year, and systems on the leaderboard can be submitted to any publication venue that their authors prefer. By evolving the leaderboard evaluation data and tasks over time, the effort can keep driving progress towards the most important end goals identified by the community. We actively seek community involvement across the entire evaluation pipeline: from data and tasks for the evaluation, via tooling, to the systems evaluated. In other words, our proposal will not only make it easier for researchers to perform good evaluations, but their collective input and contributions will also help drive the future of gesture-generation research.