AITopics | voice interaction

voice interaction

Cog-TiPRO: Iterative Prompt Refinement with LLMs to Detect Cognitive Decline via Longitudinal Voice Assistant Commands

Qi, Kristin, Zhu, Youxiang, Summerour, Caroline, Batsis, John A., Liang, Xiaohui

arXiv.org Artificial Intelligence | Sep-4-2025

Early detection of cognitive decline is crucial for enabling interventions that can slow neurodegenerative disease progression. Traditional diagnostic approaches rely on labor-intensive clinical assessments, which are impractical for frequent monitoring. Our pilot study investigates voice assistant systems (VAS) as non-invasive tools for detecting cognitive decline through longitudinal analysis of speech patterns in voice commands. Over an 18-month period, we collected voice commands from 35 older adults, with 15 participants providing daily at-home VAS interactions. To address the challenges of analyzing these short, unstructured and noisy commands, we propose Cog-TiPRO, a framework that combines (1) LLM-driven iterative prompt refinement for linguistic feature extraction, (2) HuBERT-based acoustic feature extraction, and (3) transformer-based temporal modeling. Using iTransformer, our approach achieves 73.80% accuracy and 72.67% F1-score in detecting MCI, outperforming its baseline by 27.13%. Through our LLM approach, we identify linguistic features that uniquely characterize everyday command usage patterns in individuals experiencing cognitive decline.

large language model, machine learning, natural language, (20 more...)

2505.17137

Country:

North America > United States > North Carolina (0.28)
North America > United States > Massachusetts (0.28)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Therapeutic Area > Neurology > Dementia (0.49)
Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)

Dukawalla: Voice Interfaces for Small Businesses in Africa

Ankrah, Elizabeth, Nyairo, Stephanie, Muchai, Mercy, Awori, Kagonya, Ochieng, Millicent, Kariuki, Mark, O'Neill, Jacki

arXiv.org Artificial Intelligence | May-9-2025

Small and medium-sized businesses (SMBs) often struggle with data-driven decision making due to a lack of advanced analytics tools, especially in African countries where they make up a majority of the workforce. Though many tools exist, they are not designed to fit into the ways of working of SMB workers who are mobile-first, have limited time to learn new workflows, and for whom social and business are tightly coupled. To address this, the Dukawalla prototype was created. This intelligent assistant bridges the gap between raw business data and actionable insights by leveraging voice interaction and the power of generative AI. Dukawalla provides an intuitive way for business owners to interact with their data, aiding in informed decision making. This paper examines Dukawalla's deployment across SMBs in Nairobi, focusing on their experiences using this voice-based assistant to streamline data collection and provide business insights.

artificial intelligence, dukawalla, natural language, (16 more...)

2505.0517

Country:

Africa > Kenya > Nairobi City County > Nairobi (0.26)
North America > United States > California > Orange County > Irvine (0.05)
Africa > Kenya > Nairobi Province (0.05)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.34)

Reinforcement Learning for Efficient Toxicity Detection in Competitive Online Video Games

Morrier, Jacob, Kocielnik, Rafal, Alvarez, R. Michael

arXiv.org Artificial Intelligence | Mar-26-2025

Online platforms take proactive measures to detect and address undesirable behavior, aiming to focus these resource-intensive efforts where such behavior is most prevalent. This article considers the problem of efficient sampling for toxicity detection in competitive online video games. To make optimal monitoring decisions, video game service operators need estimates of the likelihood of toxic behavior. If no model is available for these predictions, one must be estimated in real time. To close this gap, we propose a contextual bandit algorithm that makes monitoring decisions based on a small set of variables that, according to domain expertise, are associated with toxic behavior. This algorithm balances exploration and exploitation to optimize long-term outcomes and is deliberately designed for easy deployment in production. Using data from the popular first-person action game Call of Duty: Modern Warfare III, we show that our algorithm consistently outperforms baseline algorithms that rely solely on players' past behavior. This finding has substantive implications for the nature of toxicity. It also illustrates how domain expertise can be harnessed to help video game service operators identify and mitigate toxicity, ultimately fostering a safer and more enjoyable gaming experience.

machine learning, reinforcement learning, toxic behavior, (16 more...)

2503.20968

Country:

North America > United States > California > Los Angeles County > Pasadena (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Games (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.50)
Information Technology > Data Science > Data Mining > Big Data (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

MinMo: A Multimodal Large Language Model for Seamless Voice Interaction

Chen, Qian, Chen, Yafeng, Chen, Yanni, Chen, Mengzhe, Chen, Yingda, Deng, Chong, Du, Zhihao, Gao, Ruize, Gao, Changfeng, Gao, Zhifu, Li, Yabin, Lv, Xiang, Liu, Jiaqing, Luo, Haoneng, Ma, Bin, Ni, Chongjia, Shi, Xian, Tang, Jialong, Wang, Hui, Wang, Hao, Wang, Wen, Wang, Yuxuan, Xu, Yunlan, Yu, Fan, Yan, Zhijie, Yang, Yexin, Yang, Baosong, Yang, Xian, Yang, Guanrou, Zhao, Tianyu, Zhang, Qinglin, Zhang, Shiliang, Zhao, Nan, Zhang, Pei, Zhang, Chong, Zhou, Jinren

arXiv.org Artificial Intelligence | Jan-10-2025

Recent advancements in large language models (LLMs) and multimodal speech-text models have laid the groundwork for seamless voice interactions, enabling real-time, natural, and human-like conversations. Previous models for voice interactions are categorized as native and aligned. Native models integrate speech and text processing in one framework but struggle with issues like differing sequence lengths and insufficient pre-training. Aligned models maintain text LLM capabilities but are often limited by small datasets and a narrow focus on speech tasks. In this work, we introduce MinMo, a Multimodal Large Language Model with approximately 8B parameters for seamless voice interaction. We address the main limitations of prior aligned multimodal models. We train MinMo through multiple stages of speech-to-text alignment, text-to-speech alignment, speech-to-speech alignment, and duplex interaction alignment, on 1.4 million hours of diverse speech data and a broad range of speech tasks. After the multi-stage training, MinMo achieves state-of-the-art performance across various benchmarks for voice comprehension and generation while maintaining the capabilities of text LLMs, and also facilitates full-duplex conversation, that is, simultaneous two-way communication between the user and the system. Moreover, we propose a novel and simple voice decoder that outperforms prior models in voice generation. The enhanced instruction-following capabilities of MinMo support controlling speech generation based on user instructions, with various nuances including emotions, dialects, and speaking rates, and mimicking specific voices. For MinMo, the speech-to-text latency is approximately 100 ms, and the full-duplex latency is approximately 600 ms in theory and 800 ms in practice. The MinMo project web page is https://funaudiollm.github.io/minmo, and the code and models will be released soon.

large language model, minmo, natural language, (17 more...)

2501.06282

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Qualitative Approaches to Voice UX

Seaborn, Katie, Urakami, Jacqueline, Pennefather, Peter, Miyake, Norihisa P.

arXiv.org Artificial Intelligence | Apr-23-2024

Voice is a natural mode of expression offered by modern computer-based systems. Qualitative perspectives on voice-based user experiences (voice UX) offer rich descriptions of complex interactions that numbers alone cannot fully represent. We conducted a systematic review of the literature on qualitative approaches to voice UX, capturing the nature of this body of work in a systematic map and offering a qualitative synthesis of findings. We highlight the benefits of qualitative methods for voice UX research, identify opportunities for increasing rigour in methods and outcomes, and distill patterns of experience across a diversity of devices and modes of qualitative praxis.

interaction, new york, voice ux, (14 more...)

doi: 10.1145/3658666

2404.14736

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > New York > New York County > New York City (0.06)
(28 more...)

Genre:

Research Report > Experimental Study (1.00)
Overview (1.00)
Research Report > New Finding (0.67)
Research Report > Strength High (0.67)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
(4 more...)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Communications (1.00)
(8 more...)

Viia-hand: a Reach-and-grasp Restoration System Integrating Voice interaction, Computer vision and Auditory feedback for Blind Amputees

Peng, Chunhao, Yang, Dapeng, Cheng, Ming, Dai, Jinghui, Zhao, Deyu, Jiang, Li

arXiv.org Artificial Intelligence | Aug-13-2023

In prosthesis control, visual feedback plays a crucial role in helping amputees complete grasping tasks. However, for blind and visually impaired (BVI) amputees, the loss of both visual and grasping abilities makes the "easy" reach-and-grasp task a serious challenge. In this paper, we propose a novel multi-sensory prosthesis system that helps BVI amputees with sensing, navigation and grasp operations. It combines modules for voice interaction, environmental perception, grasp guidance, collaborative control, and auditory/tactile feedback. In particular, the voice interaction module receives user instructions and invokes other functional modules according to those instructions. The environmental perception and grasp guidance module obtains environmental information through computer vision and feeds that information back to the user through auditory feedback modules (voice prompts and spatial sound sources) and tactile feedback modules (vibration stimulation). The prosthesis collaborative control module obtains the context information of the grasp guidance process and, in conjunction with the user's control intention, coordinates the grasp gestures and wrist angles of the prosthesis to achieve a stable grasp of various objects. This paper details a prototype design (named viia-hand) and presents its preliminary experimental verification on healthy subjects completing specific reach-and-grasp tasks. Our results showed that, with the help of our new design, the subjects were able to achieve a precise reach and reliable grasp of the target objects in a relatively cluttered environment. Additionally, the system is extremely user-friendly, as users can quickly adapt to it with minimal training.

bvi amputee, experiment, prosthesis, (14 more...)

2308.06891

Country:

Asia > China > Heilongjiang Province > Harbin (0.05)
Europe > Finland > Uusimaa > Helsinki (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Health Care Technology (0.48)
Health & Medicine > Therapeutic Area > Ophthalmology/Optometry (0.47)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Rewriting the Script: Adapting Text Instructions for Voice Interaction

Hwang, Alyssa, Oza, Natasha, Callison-Burch, Chris, Head, Andrew

arXiv.org Artificial Intelligence | Jun-16-2023

Voice assistants have sharply risen in popularity in recent years, but their use has been limited mostly to simple applications like music, hands-free search, or control of internet-of-things devices. What would it take for voice assistants to guide people through more complex tasks? In our work, we study the limitations of the dominant approach voice assistants take to complex task guidance: reading aloud written instructions. Using recipes as an example, we observe twelve participants cook at home with a state-of-the-art voice assistant. We learn that the current approach leads to nine challenges, including obscuring the bigger picture, overwhelming users with too much information, and failing to communicate affordances. Instructions delivered by a voice assistant are especially difficult because they cannot be skimmed as easily as written instructions. Alexa in particular did not surface crucial details to the user or answer questions well. We draw on our observations to propose eight ways in which voice assistants can "rewrite the script" (summarizing, signposting, splitting, elaborating, volunteering, reordering, redistributing, and visualizing) to transform written sources into forms that are readily communicated through spoken conversation. We conclude with a vision of how modern advancements in natural language processing can be leveraged for intelligent agents to guide users effectively through complex tasks.

rewriting, text instruction, voice interaction

2306.09992

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language (0.87)

8 Top Chatbot Trends and Predictions to Know in 2023

#artificialintelligence | Jan-25-2023, 22:57:24 GMT

Chatbots are an integral part of corporate communications. The market is growing, and chatbots are now used in activities including banking, shopping, and travel booking. Earlier, phone calls and face-to-face meetings dominated the communications landscape. Later, mobile apps, online forms, email, and social media became the main means of communication. Businesses also build chatbots into their websites to help visitors navigate them.

artificial intelligence, customer, natural language, (14 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)

Reinforcement Learning based Voice Interaction to Clear Path for Robots in Elevator Environment

Ma, Wanli, Gao, Xinyi, Shi, Jianwei, Hu, Hao, Wang, Chaoyang, Liang, Yanxue, Karakus, Oktay

arXiv.org Artificial Intelligence | Dec-8-2022

Efficient use of elevator space is essential for a service robot, because it reduces the time spent waiting for the next elevator. To address this, we propose a hybrid approach that combines reinforcement learning (RL) with voice interaction for robot navigation when entering an elevator. Compared to traditional navigation methods such as Optimal Reciprocal Collision Avoidance (ORCA), RL gives robots a high exploration ability to find a new clear path into the elevator. The proposed method allows the robot to take an active clear-path action towards the elevator when a crowd of people stands at the entrance even though there is still plenty of space inside. This is done by embedding a clear-path action (voice prompt) into the RL framework, and the resulting navigation policy helps the robot finish tasks efficiently and safely. Our approach greatly improves the success rate and reward of entering the elevator compared to state-of-the-art navigation policies without an active clear-path operation.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2203.09844

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Asia > China > Zhejiang Province > Hangzhou (0.04)
Asia > China > Henan Province > Zhengzhou (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report (0.84)

Industry:

Health & Medicine (1.00)
Transportation (0.88)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Council Post: From Emotion To Empathy: Bringing Human Experience To Voice AI

#artificialintelligence | Sep-23-2022, 06:02:13 GMT

Raghu Ravinutala is CEO and co-founder of enterprise-grade conversational AI platform Yellow.ai. The last few years have seen increased adoption of voice technology, with the usage of voice assistants booming across the globe. Much of this is due to advancements in speech recognition technology, easy access to voice interfaces, and availability at the right time and the right place. Not only that, but Covid-19 has acted as a catalyst for businesses. Popularly referred to as the "fourth channel of sales," voice technology is changing how consumers interact with brands, as many prefer the immediacy and interpersonal nature of phone calls.

ai agent, customer, voice ai agent, (11 more...)

Industry: Health & Medicine (0.77)

Technology: Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.75)
