Detecting In-Person Conversations in Noisy Real-World Environments with Smartwatch Audio and Motion Sensing
Zhang, Alice; Bertley, Callihan; Liang, Dawei; Thomaz, Edison
arXiv.org Artificial Intelligence
Social interactions play a crucial role in shaping human behavior, relationships, and societies. They encompass various forms of communication, such as verbal conversation, non-verbal gestures, facial expressions, and body language. In this work, we develop a novel computational approach to detect a foundational aspect of human social interaction, in-person verbal conversation, by leveraging audio and inertial data captured with a commodity smartwatch in acoustically challenging scenarios. To evaluate our approach, we conducted a lab study with 11 participants and a semi-naturalistic study with 24 participants. We analyzed machine learning and deep learning models with three different fusion methods, showing the advantages of fusing audio and inertial data to capture not only verbal cues but also non-verbal gestures in conversations. Furthermore, we performed a comprehensive set of evaluations across activities and sampling rates to demonstrate the benefits of multimodal sensing in specific contexts. Overall, our framework achieved a macro F1-score of 82.0 ± 3.0% when detecting conversations in the lab and 77.2 ± 1.8% in the semi-naturalistic setting.
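The results above are reported as macro F1-scores. As a minimal illustrative sketch (not the authors' evaluation code), macro F1 computes F1 = 2TP / (2TP + FP + FN) separately for each class and then takes the unweighted mean. As a result, a minority class such as "conversation" counts as much as the majority class. The labels "conv" and "other" and the per-window predictions below are hypothetical.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        # Count true positives, false positives, and false negatives for class c.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); define it as 0.0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical window-level labels: "conv" = conversation, "other" = no conversation.
y_true = ["conv", "conv", "conv", "other", "other", "other", "other", "other"]
y_pred = ["conv", "conv", "other", "other", "other", "other", "other", "conv"]

print(round(macro_f1(y_true, y_pred), 4))  # → 0.7333
```

Here "conv" scores F1 = 2/3 and "other" scores 0.8, so the macro average is about 0.733, even though plain accuracy is 6/8 = 0.75. Averaging per class rather than per window keeps a detector from looking good simply by predicting the more frequent non-conversation class.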
Jul-17-2025
- Country:
  - Europe (1.00)
  - North America > United States
    - Texas (0.28)
    - California (0.28)
    - New York > New York County
      - New York City (0.14)
- Genre:
  - Research Report > New Finding (0.86)
- Industry:
  - Media (1.00)
  - Leisure & Entertainment (1.00)
  - Health & Medicine
    - Therapeutic Area (0.93)
    - Consumer Health (0.67)
- Technology: