
Technical Perspective: Computation Where the (inter)Action Is

Communications of the ACM

SoundWatch is a prototype system that detects audio events and displays descriptions of them to deaf and hard-of-hearing people via the screen of their smartwatch. Beyond the system itself, SoundWatch contributes a case study of the opportunities and challenges we might expect as computation continues to move closer to where the interaction happens. Access technology has long been a window into the future, and so we can learn a lot from prototypes like SoundWatch. As one example, speech recognition is now mainstream, but the people who have relied on it the longest are those who found it difficult to type otherwise. Mainstream user interfaces focus on a small set of modalities, whereas accessibility necessarily explores interactions beyond common ability assumptions.

Mobile Sound Recognition for the Deaf and Hard of Hearing Artificial Intelligence

Human perception of surrounding events depends strongly on audio cues; acoustic insulation can therefore seriously impact situational awareness. We present an exploratory study in the domain of assistive computing, eliciting requirements and presenting solutions to problems found in the development of an environmental sound recognition system that aims to assist deaf and hard-of-hearing people in the perception of sounds. To take advantage of smartphones' computational ubiquity, we propose a system that executes all processing on the device itself, from audio feature extraction to recognition and the visual presentation of results. Our application also presents the confidence level of the classification to the user. A test of the system conducted with deaf users provided important and inspiring feedback from participants.
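Such a fully on-device pipeline (feature extraction, classification, and a confidence value shown to the user) can be sketched as follows. This is a minimal illustrative example, not the paper's implementation: the band-energy features, the nearest-centroid classifier, and the labels are all placeholder assumptions.

```python
import numpy as np

def extract_features(audio, n_fft=512, hop=256, n_bands=16):
    """Frame the signal and compute band-averaged log spectral energies
    (a cheap stand-in for MFCC-style features on a mobile device)."""
    frames = []
    for start in range(0, len(audio) - n_fft + 1, hop):
        window = audio[start:start + n_fft] * np.hanning(n_fft)
        spectrum = np.abs(np.fft.rfft(window)) ** 2
        bands = np.array_split(spectrum, n_bands)
        frames.append([np.log(b.mean() + 1e-10) for b in bands])
    return np.mean(frames, axis=0)  # clip-level feature vector

def classify_with_confidence(features, centroids, labels):
    """Nearest-centroid classification; a softmax over negative distances
    gives the confidence level that would be displayed to the user."""
    dists = np.linalg.norm(centroids - features, axis=1)
    scores = np.exp(-dists) / np.exp(-dists).sum()
    best = int(np.argmax(scores))
    return labels[best], float(scores[best])
```

In a real deployment the centroids would be replaced by a trained model, but the structure (raw audio in, label plus confidence out, all computed locally) is the point being illustrated.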

A Machine Learning Approach for Delineating Similar Sound Symptoms of Respiratory Conditions on a Smartphone Artificial Intelligence

Clinical characterization and interpretation of respiratory sound symptoms have remained a challenge due to the similarities in the audio properties that manifest during auscultation in medical diagnosis. The misinterpretation and conflation of these sounds, coupled with the comorbidity of the associated ailments, particularly exercise-induced respiratory conditions, result in the under-diagnosis and under-treatment of these conditions. Though several studies have proposed computerized systems for objective classification and evaluation of these sounds, most of the algorithms run on desktop and backend systems. In this study, we leverage the improved computational and storage capabilities of modern smartphones to distinguish respiratory sound symptoms using machine learning algorithms, namely Random Forest (RF), Support Vector Machine (SVM), and k-Nearest Neighbour (k-NN). The appreciable performance of these classifiers on a mobile phone shows that the smartphone can serve as an alternative tool for the recognition and discrimination of respiratory symptoms in real-time scenarios. Further, the objective clinical data provided by the machine learning process could aid physicians in the screening and treatment of patients during ambulatory care, where specialized medical devices may not be readily available.
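The comparison of the three named classifiers can be sketched with scikit-learn. The feature vectors below are synthetic placeholders standing in for auscultation audio features (e.g. MFCCs), and the class names are illustrative assumptions, not the study's data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in features; labels mimic similar-sounding symptoms,
# e.g. 0=wheeze, 1=stridor, 2=crackle (hypothetical class names).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 13))
y = rng.integers(0, 3, size=300)
X[y == 1] += 1.5  # shift class means so the demo data is separable
X[y == 2] -= 1.5

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Train and score each of the three classifiers named in the abstract.
results = {}
for name, clf in [("RF", RandomForestClassifier(n_estimators=50, random_state=0)),
                  ("SVM", SVC()),
                  ("k-NN", KNeighborsClassifier(n_neighbors=5))]:
    results[name] = clf.fit(X_tr, y_tr).score(X_te, y_te)
```

All three models are light enough at inference time to run on a phone, which is the practical claim the abstract makes.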

Hearing your touch: A new acoustic side channel on smartphones Artificial Intelligence

We present the first acoustic side-channel attack that recovers what users type on the virtual keyboard of their touch-screen smartphone or tablet. When a user taps the screen with a finger, the tap generates a sound wave that propagates on the screen surface and in the air. We found that the device's microphone(s) can recover this wave and "hear" the finger's touch, and that the wave's distortions are characteristic of the tap's location on the screen. Hence, by recording audio through the built-in microphone(s), a malicious app can infer text as the user enters it on their device. We evaluate the effectiveness of the attack with 45 participants in a real-world environment on an Android tablet and an Android smartphone. For the tablet, we recover 61% of 200 4-digit PIN codes within 20 attempts, even if the model is not trained with the victim's data. For the smartphone, we recover 9 words of 7--13 letters within 50 attempts in a common side-channel attack benchmark. Our results suggest that it is not always sufficient to rely on isolation mechanisms such as TrustZone to protect user input. We propose and discuss hardware, operating-system, and application-level mechanisms to block this attack more effectively. Mobile devices may need a richer capability model, a more user-friendly notification system for sensor usage, and a more thorough evaluation of the information leaked by the underlying hardware.
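The core inference step, mapping tap-audio features to a key position, can be illustrated with a toy model. Everything here is simulated: the feature extraction, the 3x3 key layout, and the nearest-neighbour classifier are assumptions for illustration, not the paper's attack pipeline.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulated_tap(key, noise=0.3):
    """Return a toy feature vector whose shape depends on the (row, col)
    position tapped on a hypothetical 3x3 PIN pad (keys 0..8)."""
    row, col = divmod(key, 3)
    template = np.array([row, col, row * col], dtype=float)
    return template + rng.normal(scale=noise, size=3)

# "Training" taps an attacker might collect, labelled by key.
train_X = np.array([simulated_tap(k) for k in range(9) for _ in range(20)])
train_y = np.repeat(np.arange(9), 20)

def predict_key(tap):
    """Nearest-neighbour guess of which key produced an observed tap."""
    dists = np.linalg.norm(train_X - tap, axis=1)
    return int(train_y[np.argmin(dists)])
```

The point the toy makes is that once tap acoustics correlate with position at all, even a very simple classifier narrows a 4-digit PIN search dramatically, which is why the paper argues for capability and notification defenses rather than relying on input isolation alone.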

An Optimized Recurrent Unit for Ultra-Low-Power Keyword Spotting Machine Learning

There is growing interest in running neural networks on sensors, wearables, and internet-of-things (IoT) devices. However, the computational demands of neural networks make them difficult to deploy on resource-constrained edge devices. To address this challenge, our work introduces a new recurrent unit architecture that is specifically adapted for on-device, low-power acoustic event detection (AED). The proposed architecture is based on the gated recurrent unit (`GRU') but features optimizations that make it implementable on ultra-low-power micro-controllers such as the Arm Cortex M0+. Our new architecture, the Embedded Gated Recurrent Unit (eGRU), is demonstrated to be highly efficient and suitable for short-duration AED and keyword spotting tasks. A single eGRU cell is 60x faster and 10x smaller than a GRU cell. Despite its optimizations, eGRU compares well with GRU across tasks of varying complexity. The practicality of eGRU is investigated in a wearable acoustic event detection application: an eGRU model is implemented and tested on the Arm Cortex M0+-based Atmel ATSAMD21E18 processor. The embedded implementation compares favorably with a full-precision GRU running on a workstation, achieving a classification accuracy of 95.3%, only 2% below the full-precision GRU.
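A reduced recurrent cell in this spirit can be sketched as below. The specific simplifications shown, dropping the reset gate and replacing sigmoid/tanh with the exponential-free softsign, are common micro-controller-friendly choices; whether they match the eGRU's exact design is an assumption, since the abstract does not spell out the optimizations.

```python
import numpy as np

def softsign(x):
    # Cheap bounded activation: no exponentials, MCU-friendly.
    return x / (1.0 + np.abs(x))

class SimplifiedGRUCell:
    """Sketch of a reduced GRU cell: the reset gate is removed and the
    usual sigmoid/tanh activations are replaced with softsign variants,
    leaving two weight matrices instead of three."""
    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(n_in + n_hidden)
        self.Wz = rng.uniform(-s, s, (n_hidden, n_in + n_hidden))
        self.Wh = rng.uniform(-s, s, (n_hidden, n_in + n_hidden))

    def step(self, x, h):
        xh = np.concatenate([x, h])
        z = 0.5 * (softsign(self.Wz @ xh) + 1.0)  # update gate in (0, 1)
        h_cand = softsign(self.Wh @ xh)           # candidate state in (-1, 1)
        return (1.0 - z) * h + z * h_cand         # convex combination
```

Removing a gate cuts both parameters and multiply-accumulate operations by a third, and bounded exponential-free activations keep the state well-scaled for fixed-point arithmetic, which is what makes this class of cell viable on an M0+-class core.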