voice assistant
Amazon Alexa Is Now Available to Everyone. Here's How to Turn It Off (2026)
Alexa+ has been rolling out to everyone with a Prime membership, even if you didn't ask for it. Here's how to change it back. If Alexa is in your home, you might have been one of many users this month who were suddenly moved from the original Alexa to the new AI-powered Alexa+ voice assistant. Amazon announced in early January during CES that it would be rolling out the new assistant to all Alexa+ Early Access customers, and that turns out to also include all Prime members, even those who were never on the Early Access list. Alexa+ remains in Early Access, as it has been since launching in spring last year, which means the assistant isn't feature-complete and, for now, doesn't require you to pay the $20 monthly fee even if you don't have Prime.
- North America > United States > California > San Francisco County > San Francisco (0.04)
- Europe > Slovakia (0.04)
- Europe > Czechia (0.04)
Scarlett Johansson and Cate Blanchett back campaign accusing AI firms of theft
Johansson was dragged into the AI debate after OpenAI's voice assistant used her vocal likeness, prompting the actor to say she was 'angered' by the move. Scarlett Johansson, Cate Blanchett, REM and Jodi Picoult are among hundreds of Hollywood stars, musicians and authors backing a new campaign accusing AI companies of "theft" of their work. The "Stealing Isn't Innovation" drive launched on Thursday with the support of approximately 800 creative professionals and bands. It adds: "Artists, writers, and creators of all kinds are banding together with a simple message: Stealing our work is not innovation."
- North America > United States (0.33)
- Europe > United Kingdom (0.17)
- Europe > Ukraine (0.07)
- Oceania > Australia (0.05)
- Leisure & Entertainment > Sports (0.73)
- Media > Film (0.56)
- Government > Regional Government (0.53)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.79)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.64)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.52)
I Ditched Alexa and Upgraded My Smart Home
Here's how I cut down my family's reliance on Alexa. Until recently, my smart home setup was in chaos. After years of testing, buying, and upgrading to the latest smart home gadgets in an attempt to make my life easier, it became a bloated mess that was actually making it more complicated. My Alexa, Google Home, and Apple Home apps were awash with dead devices, duplicates, and automations that simply didn't work. My Hue Bridge, trying desperately to tie it all together, was creaking at the seams.
- Asia > Nepal (0.14)
- North America > United States > California (0.04)
- Europe > Slovakia (0.04)
- Europe > Czechia (0.04)
Gen AI in Automotive: Applications, Challenges, and Opportunities with a Case study on In-Vehicle Experience
Shinde, Chaitanya, Garikapati, Divya
Generative Artificial Intelligence is emerging as a transformative force in the automotive industry, enabling novel applications across vehicle design, manufacturing, autonomous driving, predictive maintenance, and in-vehicle user experience. This paper provides a comprehensive review of the current state of GenAI in automotive, highlighting enabling technologies such as Generative Adversarial Networks and Variational Autoencoders. Key opportunities include accelerating autonomous driving validation through synthetic data generation, optimizing component design, and enhancing human-machine interaction via personalized and adaptive interfaces. At the same time, the paper identifies significant technical, ethical, and safety challenges, including computational demands, bias, intellectual property concerns, and adversarial robustness, that must be addressed for responsible deployment. A case study on Mercedes-Benz's MBUX Virtual Assistant illustrates how GenAI-powered voice systems deliver more natural, proactive, and personalized in-car interactions compared to legacy rule-based assistants. Through this review and case study, the paper outlines both the promise and limitations of GenAI integration in the automotive sector and presents directions for future research and development aimed at achieving safer, more efficient, and user-centric mobility. Unlike prior reviews that focus solely on perception or manufacturing, this paper emphasizes generative AI in voice-based HMI, bridging safety and user experience perspectives.
- Transportation > Ground > Road (1.00)
- Information Technology > Security & Privacy (1.00)
- Automobiles & Trucks > Manufacturer (1.00)
- Government > Military (0.93)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)
MultiVox: A Benchmark for Evaluating Voice Assistants for Multimodal Interactions
Selvakumar, Ramaneswaran, Seth, Ashish, Anand, Nishit, Tyagi, Utkarsh, Kumar, Sonal, Ghosh, Sreyan, Manocha, Dinesh
The rapid progress of Large Language Models (LLMs) has empowered omni models to act as voice assistants capable of understanding spoken dialogues. These models can process multimodal inputs beyond text, such as speech and visual data, enabling more context-aware interactions. However, current benchmarks fall short in comprehensively evaluating how well these models generate context-aware responses, particularly when it comes to implicitly understanding fine-grained speech characteristics, such as pitch, emotion, timbre, and volume, or the environmental acoustic context, such as background sounds. Additionally, they inadequately assess the ability of models to align paralinguistic cues with complementary visual signals to inform their responses. To address these gaps, we introduce MultiVox, the first omni voice assistant benchmark designed to evaluate the ability of voice assistants to integrate spoken and visual cues, including paralinguistic speech features, for truly multimodal understanding. Specifically, MultiVox includes 1000 human-annotated and recorded speech dialogues that encompass diverse paralinguistic features and a range of visual cues such as images and videos. Our evaluation on 10 state-of-the-art models reveals that, although humans excel at these tasks, current models consistently struggle to produce contextually grounded responses.
- North America > United States > Maryland > Prince George's County > College Park (0.04)
- Asia > Singapore (0.04)
VoiceAssistant-Eval: Benchmarking AI Assistants across Listening, Speaking, and Viewing
Wang, Ke, Ren, Houxing, Lu, Zimu, Zhan, Mingjie, Li, Hongsheng
The growing capabilities of large language models and multimodal systems have spurred interest in voice-first AI assistants, yet existing benchmarks are inadequate for evaluating the full range of these systems' capabilities. We introduce VoiceAssistant-Eval, a comprehensive benchmark designed to assess AI assistants across listening, speaking, and viewing. VoiceAssistant-Eval comprises 10,497 curated examples spanning 13 task categories. These tasks include natural sounds, music, and spoken dialogue for listening; multi-turn dialogue, role-play imitation, and various scenarios for speaking; and highly heterogeneous images for viewing. To demonstrate its utility, we evaluate 21 open-source models and GPT-4o-Audio, measuring the quality of the response content and speech, as well as their consistency. The results reveal three key findings: (1) proprietary models do not universally outperform open-source models; (2) most models excel at speaking tasks but lag in audio understanding; and (3) well-designed smaller models can rival much larger ones. Notably, the mid-sized Step-Audio-2-mini (7B) achieves more than double the listening accuracy of LLaMA-Omni2-32B-Bilingual. However, challenges remain: multimodal (audio plus visual) input and role-play voice imitation tasks are difficult for current models, and significant gaps persist in robustness and safety alignment. VoiceAssistant-Eval identifies these gaps and establishes a rigorous framework for evaluating and guiding the development of next-generation AI assistants. Code and data will be released at https://mathllm.github.io/VoiceAssistantEval/.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Belgium (0.04)
- (6 more...)
- Research Report > New Finding (1.00)
- Personal > Interview (0.93)
- Research Report > Experimental Study (0.93)
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
- Health & Medicine (1.00)
- (3 more...)
- Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- (2 more...)
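A benchmark like VoiceAssistant-Eval, with 13 uneven task categories, implies per-category scoring so that large categories don't swamp small ones. The sketch below shows that aggregation pattern; the category names and results are invented for illustration and are not from the paper.

```python
from collections import defaultdict

def macro_accuracy(results):
    """Mean of per-category accuracies, so that categories with many
    examples don't dominate the headline number."""
    by_cat = defaultdict(list)
    for category, correct in results:
        by_cat[category].append(correct)
    per_cat = {c: sum(v) / len(v) for c, v in by_cat.items()}
    return per_cat, sum(per_cat.values()) / len(per_cat)

# Toy results: (task category, was the model's response judged correct?)
results = [
    ("listening/natural_sounds", True),
    ("listening/natural_sounds", False),
    ("speaking/role_play", True),
    ("speaking/role_play", True),
    ("viewing/images", False),
    ("viewing/images", True),
]
per_cat, macro = macro_accuracy(results)
print(per_cat["speaking/role_play"])  # 1.0
print(round(macro, 3))                # 0.667
```

Macro-averaging is one common convention for multi-category benchmarks; the paper's exact scoring (which also covers speech quality and consistency) is richer than this accuracy-only sketch.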
Speechless: Speech Instruction Training Without Speech for Low Resource Languages
Dao, Alan, Vu, Dinh Bach, Ha, Huy Hoang, Anh, Tuan Le Duc, Gopal, Shreyas, Yeo, Yue Heng, Low, Warren Keng Hoong, Chng, Eng Siong, Yip, Jia Qi
The rapid growth of voice assistants powered by large language models (LLMs) has highlighted a need for speech instruction data to train these systems. Despite the abundance of speech recognition data, there is a notable scarcity of speech instruction data, which is essential for fine-tuning models to understand and execute spoken commands. Generating high-quality synthetic speech requires a good text-to-speech (TTS) model, which may not be available for low-resource languages. Our novel approach addresses this challenge by halting synthesis at the semantic representation level, bypassing the need for TTS. We achieve this by aligning synthetic semantic representations with the pre-trained Whisper encoder, enabling an LLM to be fine-tuned on text instructions while maintaining the ability to understand spoken instructions during inference. This simplified training process is a promising approach to building voice assistants for low-resource languages.
- Asia > Singapore (0.04)
- Asia > Middle East > Jordan (0.04)
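The core idea in the Speechless abstract, mapping text-derived semantic vectors into a frozen speech encoder's space so no TTS is needed, can be sketched as learning a simple adapter on paired vectors. Everything below is a toy stand-in: both "encoders" are random matrices and the adapter is a closed-form least-squares fit, whereas the paper trains against the actual Whisper encoder.

```python
import numpy as np

rng = np.random.default_rng(0)
text_dim, speech_dim, n_pairs = 16, 32, 200

# Synthetic semantic representations derived from text instructions.
text_vecs = rng.normal(size=(n_pairs, text_dim))
# Unknown relation standing in for the frozen speech encoder's geometry.
true_map = rng.normal(size=(text_dim, speech_dim))
# Targets in "Whisper-encoder space" for the paired examples.
speech_vecs = text_vecs @ true_map

# Fit a linear adapter by least squares -- a one-layer stand-in for the
# trained alignment module.
adapter, *_ = np.linalg.lstsq(text_vecs, speech_vecs, rcond=None)

aligned = text_vecs @ adapter
print(np.allclose(aligned, speech_vecs, atol=1e-6))  # True
```

Once text vectors land in the speech encoder's space, the LLM fine-tuned on text instructions can consume real encoder outputs at inference time, which is the bypass the abstract describes.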
TextOnly: A Unified Function Portal for Text-Related Functions on Smartphones
Tu, Minghao, Yu, Chun, Shen, Xiyuan, Zheng, Zhi, Chen, Li, Shi, Yuanchun
Text boxes serve as portals to diverse functionalities in today's smartphone applications. However, when it comes to specific functionalities, users always need to navigate through multiple steps to access particular text boxes for input. We propose TextOnly, a unified function portal that enables users to access text-related functions from various applications by simply inputting text into a sole text box. For instance, entering a restaurant name could trigger a Google Maps search, while a greeting could initiate a conversation in WhatsApp. Despite their brevity, TextOnly maximizes the utilization of these raw text inputs, which contain rich information, to interpret user intentions effectively. TextOnly integrates large language models (LLMs) and a BERT model. The LLM consistently provides general knowledge, while the BERT model can continuously learn user-specific preferences and enable quicker predictions. Real-world user studies demonstrated TextOnly's effectiveness with a top-1 accuracy of 71.35%, and its ability to continuously improve both its accuracy and inference speed. Participants perceived TextOnly as having satisfactory usability and expressed a preference for TextOnly over manual execution. Compared with voice assistants, TextOnly supports a greater range of text-related functions and allows for more concise inputs.
- North America > United States > Washington > King County > Seattle (0.14)
- Asia > China > Beijing > Beijing (0.04)
- Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
- Europe > Netherlands > North Brabant > Eindhoven (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Questionnaire & Opinion Survey (1.00)
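The TextOnly architecture described above, a fast personalized predictor answering first with an LLM providing general knowledge as a fallback, follows a familiar two-tier routing pattern. The sketch below illustrates that pattern only: the rule table stands in for the learned BERT model, the keyword check stands in for an LLM call, and all function names are invented.

```python
# Fast tier: stand-in for a user-specific learned predictor (the paper
# uses a BERT model that adapts to the user's habits over time).
FAST_RULES = {
    "hi": "whatsapp.send_greeting",
    "hello": "whatsapp.send_greeting",
}

def llm_fallback(text: str) -> str:
    """Stand-in for a general-knowledge LLM call; here just a keyword check."""
    if any(w in text.lower() for w in ("restaurant", "cafe", "pizza")):
        return "maps.search_place"
    return "notes.save_text"

def route(text: str) -> str:
    """One text box in, one target function out."""
    key = text.strip().lower()
    return FAST_RULES.get(key) or llm_fallback(text)

print(route("hello"))                # whatsapp.send_greeting
print(route("Tony's Pizza Palace"))  # maps.search_place
```

The design rationale mirrors the abstract: the cheap tier handles habitual inputs quickly and improves with use, while the expensive general model only runs when the cheap tier has no answer.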
Major Philips Hue leak reveals 'Pro' hub with a killer feature
Philips Hue appears to be teeing up a new, more powerful hub that can turn Hue bulbs into motion sensors, according to leaked details and images that briefly appeared on Philips Hue's own website. The unannounced products, which have since been yanked from the "New on Hue" page, included the "faster" Hue Bridge Pro as well as a wired video doorbell, a refreshed and more efficient A19 bulb, permanent and globe-style versions of Hue's Festavia outdoor string lights, a gradient light strip, and the ability to control your Hue lights with the Sonos voice assistant. No pricing details were included in the leak, which was live on the Hue website for several hours Wednesday. The leaked products were initially spotted by users on Reddit. Reached by TechHive, a Philips Hue spokesperson declined to comment.
- Information Technology > Communications > Networks (0.39)
- Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.37)
Garmin Forerunner 570 review: running watch stumbles just short of greatness
Garmin's latest mid-range running and multisport watch has smartened up with a very bright OLED screen, a voice assistant and upgraded sensors. The Guardian's journalism is independent. We will earn a commission if you buy something through an affiliate link. The Forerunner 570 continues the revamp of the company's running watches, which have all gained more accurate GPS chips and improved heart rate monitors. The new model replaces the popular 265 and sits under the 970. It offers a similar look and feel to the top watch but with a few key features removed for a lower price.
- Information Technology > Hardware (0.54)
- Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.36)