microphone

DreamCatcher: A Wearer-aware Sleep Event Dataset Based on Earables in Non-restrictive Environments

Neural Information Processing Systems

Widely available earbuds equipped with sensors (also known as earables) can be combined with a sleep event detection algorithm to offer a convenient alternative to laborious clinical tests for individuals suffering from sleep disorders. Although various solutions utilizing such devices have been proposed to detect sleep events, they ignore the fact that individuals often share sleeping spaces with roommates or partners. To address this issue, we introduce DreamCatcher, the first publicly available dataset for wearer-aware sleep event algorithm development on earables.
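As an illustration of what "wearer-aware" detection might look like, here is a minimal sketch, assuming PyTorch: a classifier that jointly predicts the sleep-event type and whether the event originates from the wearer. The class name WearerAwareClassifier, the two-head design, and all layer sizes are hypothetical and not part of the dataset's actual tooling.

```python
# Minimal sketch (assumptions: PyTorch; single-channel in-ear audio window;
# all names such as WearerAwareClassifier are hypothetical, not the dataset's API).
import torch
import torch.nn as nn

class WearerAwareClassifier(nn.Module):
    """Jointly predicts the sleep-event type and whether it came from the wearer."""
    def __init__(self, n_event_classes: int = 4):
        super().__init__()
        # Shared 1D-conv encoder over a fixed-length audio window.
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=9, stride=4), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=9, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        )
        self.event_head = nn.Linear(32, n_event_classes)  # e.g. snore, cough, ...
        self.wearer_head = nn.Linear(32, 1)               # wearer vs. bystander

    def forward(self, audio: torch.Tensor):
        # audio: (batch, 1, samples)
        z = self.encoder(audio)
        return self.event_head(z), self.wearer_head(z)

model = WearerAwareClassifier()
event_logits, wearer_logit = model(torch.randn(2, 1, 16000))  # 1 s at 16 kHz
```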



Sounding Bodies: Modeling 3D Spatial Sound of Humans Using Body Pose and Audio

Neural Information Processing Systems

The system consumes, as input, audio signals from headset microphones and body pose, and produces, as output, a 3D sound field surrounding the transmitter's body.
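To make the stated input/output contract concrete, here is a minimal interface sketch, assuming PyTorch; the pose encoding, the field parameterization, and every name (e.g., SoundFieldModel, n_field_channels) are illustrative assumptions, not the paper's architecture.

```python
# Minimal sketch of the input/output contract: headset-mic audio + body pose
# in, a multi-channel sound-field signal out (assumptions: PyTorch; all names
# and sizes are hypothetical, not the paper's model).
import torch
import torch.nn as nn

class SoundFieldModel(nn.Module):
    def __init__(self, n_mics: int = 7, n_joints: int = 25, n_field_channels: int = 16):
        super().__init__()
        self.audio_enc = nn.Conv1d(n_mics, 64, kernel_size=9, padding=4)
        self.pose_enc = nn.Linear(n_joints * 3, 64)  # 3D joint positions, flattened
        # Decode per-sample field channels (e.g., signals at target positions).
        self.decoder = nn.Conv1d(64, n_field_channels, kernel_size=9, padding=4)

    def forward(self, mic_audio: torch.Tensor, pose: torch.Tensor) -> torch.Tensor:
        # mic_audio: (batch, n_mics, samples); pose: (batch, n_joints * 3)
        a = self.audio_enc(mic_audio)          # (batch, 64, samples)
        p = self.pose_enc(pose).unsqueeze(-1)  # (batch, 64, 1), broadcast over time
        return self.decoder(a + p)             # (batch, n_field_channels, samples)

field = SoundFieldModel()(torch.randn(2, 7, 48000), torch.randn(2, 75))
```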



Mos Food unveils AI system for drive-thru orders

The Japan Times

A Mos Food Services employee places an order via a microphone at an artificial intelligence drive-thru facility, which was unveiled to members of the media in Yoshikawa City, Saitama Prefecture, on Wednesday. The Japanese hamburger chain aims to improve store management efficiency by automating part of customer interaction with conversational AI amid a serious labor shortage. The company plans to introduce the new AI system at multiple outlets in fiscal 2026, which begins in April. In a media demonstration held at a store in the city of Yoshikawa, Saitama Prefecture, a Mos Food employee acting as a customer spoke into a microphone to place a drive-thru order. The AI system took the order after making suggestions such as, "We recommend a limited-time avocado burger." Once the system is introduced, store employees will prepare food based on customer orders transmitted from the AI system.


The 5 coolest entertainment innovations of 2025

Popular Science

From a TV that creates color in a totally different way to room-aware surround sound. The smartphone era has driven a convergence in consumer electronics. Tons of devices we once relied on, such as small cameras, calculators, flashlights, and music players, have been rolled into our phones. Entertainment has experienced a similar move toward a small-screen singularity.


UNSSOR: Unsupervised Neural Speech Separation by Leveraging Over-determined Training Mixtures

Neural Information Processing Systems

In reverberant conditions with multiple concurrent speakers, each microphone acquires a mixture signal of the speakers at a different location. In over-determined conditions, where the microphones outnumber the speakers, we can narrow down the solutions to speaker images and realize unsupervised speech separation by leveraging each mixture signal as a constraint (i.e., the estimated speaker images at a microphone should add up to the mixture).
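The mixture constraint can be written down directly. Below is a minimal sketch in PyTorch of just that term; the function name mixture_constraint_loss is hypothetical, and the paper's full training objective may contain further terms beyond this reconstruction penalty.

```python
# Minimal sketch of the mixture constraint: the estimated speaker images at
# each microphone should add up to that microphone's observed mixture
# (assumptions: PyTorch; illustrative only, not UNSSOR's full objective).
import torch

def mixture_constraint_loss(est_images: torch.Tensor, mixtures: torch.Tensor) -> torch.Tensor:
    """
    est_images: (n_mics, n_speakers, samples) estimated speaker images per mic
    mixtures:   (n_mics, samples) observed mixture at each microphone
    Penalizes deviation of the summed images from each microphone's mixture.
    """
    recon = est_images.sum(dim=1)                 # (n_mics, samples)
    return torch.mean((recon - mixtures) ** 2)

# Toy check: if the estimated images are exact, the constraint is satisfied.
images = torch.randn(3, 2, 8000)                  # 3 mics, 2 speakers, 0.5 s @ 16 kHz
assert mixture_constraint_loss(images, images.sum(dim=1)).item() == 0.0
```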


TinyML for Speech Recognition

Barovic, Andrew, Moin, Armin

arXiv.org Artificial Intelligence

We train and deploy a quantized 1D convolutional neural network model to conduct speech recognition on a highly resource-constrained IoT edge device. This can be useful in various Internet of Things (IoT) applications, such as smart homes and ambient assisted living for the elderly and people with disabilities, to name a few examples. In this paper, we first create a new dataset with over one hour of audio data that enables our research and will be useful to future studies in this field. Second, we utilize the technologies provided by Edge Impulse to enhance our model's performance and achieve an accuracy of up to 97% on our dataset. For validation, we implement our prototype using the Arduino Nano 33 BLE Sense microcontroller board. This microcontroller board is specifically designed for IoT and AI applications, making it an ideal choice for our target use case scenarios. While most existing research focuses on a limited set of keywords, our model can process 23 different keywords, enabling complex commands. Natural Language Processing (NLP) and Speech Recognition are crucial domains in Artificial Intelligence (AI). While NLP deals with enabling computers to analyze, understand, reason on, and generate human language in textual form, speech recognition is concerned with human language in spoken form.
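As a rough illustration of the approach described (a quantized 1D CNN classifying 23 keywords), here is a minimal PyTorch sketch. The paper uses Edge Impulse tooling and an Arduino target, so the layer sizes, MFCC input shape, and quantization scheme below are assumptions, not the authors' configuration.

```python
# Minimal sketch of a 1D-CNN keyword spotter for 23 classes (assumptions:
# PyTorch; MFCC input shape and layer sizes are illustrative, not the paper's
# Edge Impulse model).
import torch
import torch.nn as nn

class KeywordSpotter(nn.Module):
    def __init__(self, n_mfcc: int = 13, n_keywords: int = 23):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_mfcc, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(16, n_keywords),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_mfcc, frames) MFCC features of a short audio clip
        return self.net(x)

model = KeywordSpotter()
# Post-training dynamic quantization of the linear layer to int8 weights,
# shrinking the model for a resource-constrained device.
qmodel = torch.ao.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
logits = qmodel(torch.randn(1, 13, 49))  # (1, 23) class logits
```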