AAC with Automated Vocabulary from Photographs: Insights from School and Speech-Language Therapy Settings

Communications of the ACM

Traditional symbol-based AAC devices impose meta-linguistic and memory demands on individuals with complex communication needs and hinder conversation partners from stimulating symbolic language in meaningful moments. This work presents a prototype application that generates situation-specific communication boards formed by a combination of descriptive, narrative, and semantically related words and phrases inferred automatically from photographs. Through semi-structured interviews with AAC professionals, we investigate how this prototype was used to support communication and language learning in naturalistic school and therapy settings. We find that the immediacy of vocabulary reduces conversation partners' workload, opens up opportunities for AAC stimulation, and facilitates symbolic understanding and sentence construction. We contribute a nuanced understanding of how vocabularies generated automatically from photographs can support individuals with complex communication needs in using and learning symbolic AAC, offering insights into the design of automatic vocabulary generation methods and interfaces to better support various scenarios of use and goals.


Technical Perspective: Can AI Keep Accessible Communication in the Picture?

Communications of the ACM

Communication is a vital part of being--a means to effect change in the world, speak with loved ones, or simply get our needs and wants met. However, this can be challenging for people with communication impairments--a growing proportion of the population who might experience difficulties because of a stroke, autism, or sensory challenges. A growing field intends to use AI to augment language to support these users; a key tension, however, is whether this can be done without reducing autonomy. Here, I discuss a piece of recent research that explores how AI can use photos to support communication. People with communication impairments are underserved due to the nature of their disability.


Beyond Pairwise Correlations: Higher-Order Redundancies in Self-Supervised Representation Learning

Zollikofer, David, Egressy, Béni, Benzing, Frederik, Otth, Matthias, Wattenhofer, Roger

arXiv.org Artificial Intelligence

Several self-supervised learning (SSL) approaches have shown that redundancy reduction in the feature embedding space is an effective tool for representation learning. However, these methods consider a narrow notion of redundancy, focusing on pairwise correlations between features. To address this limitation, we formalize the notion of embedding space redundancy and introduce redundancy measures that capture more complex, higher-order dependencies. We mathematically analyze the relationships between these metrics, and empirically measure these redundancies in the embedding spaces of common SSL methods. Based on our findings, we propose Self Supervised Learning with Predictability Minimization (SSLPM) as a method for reducing redundancy in the embedding space. SSLPM combines an encoder network with a predictor engaging in a competitive game of reducing and exploiting dependencies respectively. We demonstrate that SSLPM is competitive with state-of-the-art methods and find that the best performing SSL methods exhibit low embedding space redundancy, suggesting that even methods without explicit redundancy reduction mechanisms perform redundancy reduction implicitly.
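
To make the gap between pairwise and higher-order redundancy concrete, the following is a minimal sketch and not the paper's method or its SSLPM implementation. It assumes a toy three-feature "embedding" in which the third feature is the product of two independent random signs; the names `pearson`, `pairwise_redundancy`, and `toy_embedding` are hypothetical helpers introduced only for this illustration. The pairwise measure used here (mean squared Pearson correlation over feature pairs) is one simple choice among many.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length feature columns."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def pairwise_redundancy(columns):
    """Mean squared correlation over all distinct pairs of features.

    This only sees pairwise (second-order) dependencies.
    """
    pairs = list(combinations(columns, 2))
    return sum(pearson(x, y) ** 2 for x, y in pairs) / len(pairs)


def toy_embedding(n, seed=0):
    """Hypothetical embedding: two independent random signs and their product."""
    rng = random.Random(seed)
    a = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    b = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    c = [x * y for x, y in zip(a, b)]  # fully determined by (a, b) jointly
    return [a, b, c]


columns = toy_embedding(20000)
a, b, c = columns

# Pairwise measure is close to zero: no single feature predicts another.
print("pairwise redundancy:", pairwise_redundancy(columns))

# Yet the third feature is perfectly predictable from the other two together.
print("c == a * b everywhere:", all(z == x * y for x, y, z in zip(a, b, c)))
```

In this toy case a purely pairwise criterion would report almost no redundancy, while a predictor that sees the other two features can recover the third exactly, which is the kind of dependency a predictability-based measure is meant to expose.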


Bridging the Social & Technical Divide in Augmentative and Alternative Communication (AAC) Applications for Autistic Adults

Martin, Lara J., Nagalakshmi, Malathy

arXiv.org Artificial Intelligence

Natural Language Processing (NLP) techniques are being used more frequently to improve high-tech Augmentative and Alternative Communication (AAC), but many of these techniques are integrated without the inclusion of the users' perspectives. As many of these tools are created with children in mind, autistic adults are often neglected in the design of AAC tools to begin with. We conducted in-depth interviews with 12 autistic adults to find the pain points of current AAC and determine what general technological advances they would find helpful. We found that in addition to technological issues, there are many societal issues as well. We found nine categories of themes from our interviews: input options, output options, selecting or adapting AAC for a good fit, when to start or swap AAC, benefits (of use), access (to AAC), stumbling blocks for continued use, social concerns, and lack of control. In this paper, we go through these nine categories in depth and then suggest possible guidelines for the NLP community, AAC application makers, and policy makers to improve AAC use for autistic adults.


Few-Shot Detection of Machine-Generated Text using Style Representations

Soto, Rafael Rivera, Koch, Kailin, Khan, Aleem, Chen, Barry, Bishop, Marcus, Andrews, Nicholas

arXiv.org Artificial Intelligence

The advent of instruction-tuned language models that convincingly mimic human writing poses a significant risk of abuse. For example, such models could be used for plagiarism, disinformation, spam, or phishing. However, such abuse may be counteracted with the ability to detect whether a piece of text was composed by a language model rather than a human. Some previous approaches to this problem have relied on supervised methods trained on corpora of confirmed human and machine-written documents. Unfortunately, model under-specification poses an unavoidable challenge for neural network-based detectors, making them brittle in the face of data shifts, such as the release of further language models producing still more fluent text than the models used to train the detectors. Other previous approaches require access to the models that may have generated a document in question at inference or detection time, which is often impractical. In light of these challenges, we pursue a fundamentally different approach not relying on samples from language models of concern at training time. Instead, we propose to leverage representations of writing style estimated from human-authored text. Indeed, we find that features effective at distinguishing among human authors are also effective at distinguishing human from machine authors, including state of the art large language models like Llama 2, ChatGPT, and GPT-4. Furthermore, given a handful of examples composed by each of several specific language models of interest, our approach affords the ability to predict which model generated a given document.
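
The few-shot attribution step described above can be pictured with a minimal sketch. This is an assumption-laden illustration, not the authors' implementation: it assumes each document has already been mapped to a fixed-length style vector by some style encoder (not shown, and not specified here), and it uses a nearest-centroid rule with cosine similarity, which is a swapped-in, generic few-shot classifier. The names `cosine`, `centroid`, and `predict_source`, and the two-dimensional vectors, are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def centroid(vectors):
    """Component-wise mean of a handful of style vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def predict_source(query, support):
    """Return the label whose few-shot centroid is most similar to the query.

    `support` maps a source label to a short list of style vectors
    (the "handful of examples" per source).
    """
    centroids = {label: centroid(vecs) for label, vecs in support.items()}
    return max(centroids, key=lambda label: cosine(query, centroids[label]))


# Made-up style vectors for two hypothetical sources.
support = {
    "model_a": [[1.0, 0.1], [0.9, 0.2]],
    "model_b": [[0.1, 1.0], [0.2, 0.8]],
}
print(predict_source([0.8, 0.3], support))
```

The design point this illustrates is that no detector is trained on machine-generated text: only a few labeled examples per source are needed at prediction time, and all of the modeling effort sits in the style representation itself.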


PicTalky: Augmentative and Alternative Communication Software for Language Developmental Disabilities

Park, Chanjun, Jang, Yoonna, Lee, Seolhwa, Seo, Jaehyung, Yang, Kisu, Lim, Heuiseok

arXiv.org Artificial Intelligence

Augmentative and alternative communication (AAC) is a practical means of communication for people with language disabilities. In this study, we propose PicTalky, which is an AI-based AAC system that helps children with language developmental disabilities to improve their communication skills and language comprehension abilities. PicTalky can process both text and pictograms more accurately by connecting a series of neural-based NLP modules. Moreover, we perform quantitative and qualitative analyses on the essential features of PicTalky. It is expected that those suffering from language problems will be able to express their intentions or desires more easily and improve their quality of life by using this service. We have made the models freely available alongside a demonstration of the Web interface. Furthermore, we implemented robotics AAC for the first time by applying PicTalky to the NAO robot.


A Transformer-based Audio Captioning Model with Keyword Estimation

Koizumi, Yuma, Masumura, Ryo, Nishida, Kyosuke, Yasuda, Masahiro, Saito, Shoichiro

arXiv.org Machine Learning

One of the problems with automated audio captioning (AAC) is the indeterminacy in word selection corresponding to the audio event/scene. Since one acoustic event/scene can be described with several words, it results in a combinatorial explosion of possible captions and difficulty in training. To solve this problem, we propose a Transformer-based audio-captioning model with keyword estimation called TRACKE. It simultaneously solves the word-selection indeterminacy problem with the main task of AAC while executing the sub-task of acoustic event detection/acoustic scene classification (i.e., keyword estimation). TRACKE estimates keywords, which comprise a word set corresponding to audio events/scenes in the input audio, and generates the caption while referring to the estimated keywords to reduce word-selection indeterminacy. Experimental results on a public AAC dataset indicate that TRACKE achieved state-of-the-art performance and successfully estimated both the caption and its keywords.


Engineer Spotlight: NXROBO's Dr. Tin Lun Lam on the Complexities of Designing Home Robots

#artificialintelligence

Home robots have been lurking on the horizon for years. BIG-i is NXROBO's most recent contribution to this industry, a home robot which can interact with family members and smart appliances. The robot uses a custom operating system designed by NXROBO to analyze voice commands and visual cues. The robot, which currently only speaks English and Mandarin, is programmed through if-this-then-that style voice commands. For example, parents can define different conditional situations like "If you see Tommy grabbing food, remind him to wash his hands".


Autonomous Agents Coordination: Action Languages meet CLP(FD) and Linda

Dovier, Agostino, Formisano, Andrea, Pontelli, Enrico

arXiv.org Artificial Intelligence

Representing and reasoning in multi-agent domains are two of the most active areas of multi-agent system (MAS) research. The literature in this area is extensive, and it provides a plethora of logics for representing and reasoning about various aspects of MAS domains, e.g., [20, 14, 24, 22, 12]. Many of the logics proposed in the literature have been designed to focus on particular aspects of the problem of modeling MAS, often justified by a specific application scenario. This makes them suitable for addressing specific subsets of the general features required to model real-world MAS domains. The task of generalizing some of these existing proposals to create a uniform and comprehensive framework for modeling several different aspects of MAS domains is an open problem. Although we do not dispute the possibility of extending several of these existing proposals in various directions, the task does not seem easy. Similarly, a variety of multi-agent programming platforms have been proposed, mostly in the style of multi-agent programming languages, such as Jason [3], ConGolog [9], 3APL [7], and GOAL [8], but with limited planning capabilities. Our effort in this paper is focused on the development of a novel action language for multi-agent systems.