dictation
Understanding Cross Task Generalization in Handwriting-Based Alzheimer's Screening via Vision Language Adaptation
Gong, Changqing, Qin, Huafeng, El-Yacoubi, Mounim A.
Alzheimer's disease (AD) is a prevalent neurodegenerative disorder for which early detection is critical. Handwriting, often disrupted in prodromal AD, provides a non-invasive and cost-effective window into subtle motor and cognitive decline. Existing handwriting-based AD studies, mostly relying on online trajectories and hand-crafted features, have not systematically examined how task type influences diagnostic performance and cross-task generalization. Meanwhile, large-scale vision-language models have demonstrated remarkable zero- and few-shot anomaly detection in natural images and strong adaptability across medical modalities such as chest X-ray and brain MRI. However, handwriting-based disease detection remains largely unexplored within this paradigm. To close this gap, we introduce a lightweight Cross-Layer Fusion Adapter (CLFA) framework that repurposes CLIP for handwriting-based AD screening. CLFA implants multi-level fusion adapters within the visual encoder to progressively align representations toward handwriting-specific medical cues, enabling prompt-free and efficient zero-shot inference. Using this framework, we systematically investigate cross-task generalization, training on a specific handwriting task and evaluating on unseen ones, to reveal which task types and writing patterns most effectively discriminate AD. Extensive analyses further highlight characteristic stroke patterns and task-level factors that contribute to early AD identification, offering both diagnostic insights and a benchmark for handwriting-based cognitive assessment.
- Asia > China > Chongqing Province > Chongqing (0.04)
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- Europe > France (0.04)
- Asia > South Korea (0.04)
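The abstract describes implanting adapters into a frozen CLIP visual encoder but gives no code. As a minimal sketch of the kind of residual bottleneck adapter typically used for this (the tiny width, rank, and zero-initialization are illustrative assumptions, not the authors' CLFA design):

```python
import random

random.seed(0)
D, R = 8, 2  # hypothetical feature width and bottleneck rank (tiny for clarity)

# Down-projection is randomly initialized; up-projection starts at zero so the
# adapter is exactly the identity map before any training.
W_down = [[random.gauss(0, 0.02) for _ in range(R)] for _ in range(D)]
W_up = [[0.0] * D for _ in range(R)]

def matvec(x, W):
    """Multiply row vector x by matrix W (lists of lists)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]

def bottleneck_adapter(x):
    """Down-project, ReLU, up-project, then add the residual so the frozen
    encoder's features pass through unchanged when the adapter is untrained."""
    h = [max(v, 0.0) for v in matvec(x, W_down)]
    out = matvec(h, W_up)
    return [xi + oi for xi, oi in zip(x, out)]

x = [random.gauss(0, 1) for _ in range(D)]
assert bottleneck_adapter(x) == x  # identity at initialization
```

In a multi-level ("cross-layer") setup, one such adapter would sit after each of several encoder blocks, with only the adapter weights trained.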
Empowering Healthcare Practitioners with Language Models: Structuring Speech Transcripts in Two Real-World Clinical Applications
Corbeil, Jean-Philippe, Abacha, Asma Ben, Michalopoulos, George, Swazinna, Phillip, Del-Agua, Miguel, Tremblay, Jerome, Daniel, Akila Jeeson, Bader, Cari, Cho, Yu-Cheng, Krishnan, Pooja, Bodenstab, Nathan, Lin, Thomas, Teng, Wenxuan, Beaulieu, Francois, Vozila, Paul
Large language models (LLMs) such as GPT-4o and o1 have demonstrated strong performance on clinical natural language processing (NLP) tasks across multiple medical benchmarks. Nonetheless, two high-impact NLP tasks, structured tabular reporting from nurse dictations and medical order extraction from doctor-patient consultations, remain underexplored due to data scarcity and sensitivity, despite active industry efforts. Practical solutions to these real-world clinical tasks can significantly reduce the documentation burden on healthcare providers, allowing greater focus on patient care. In this paper, we investigate these two challenging tasks using private and open-source clinical datasets, evaluating the performance of both open- and closed-weight LLMs and analyzing their respective strengths and limitations. Furthermore, we propose an agentic pipeline for generating realistic, non-sensitive nurse dictations, enabling structured extraction of clinical observations. To support further research in both areas, we release SYNUR and SIMORD, the first open-source datasets for nurse observation extraction and medical order extraction.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- (2 more...)
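The abstract describes the extraction task only at a high level. A toy rule-based stand-in for turning a nurse dictation into a tabular row might look as follows (the field names and patterns are invented for illustration and are not the SYNUR schema, which in practice would be filled by an LLM rather than regexes):

```python
import re

# Hypothetical flowsheet fields; real systems extract far richer observations.
PATTERNS = {
    "temperature_f": r"temp(?:erature)?\s+(?:is\s+)?(\d+(?:\.\d+)?)",
    "heart_rate": r"(?:heart rate|pulse)\s+(?:is\s+)?(\d+)",
    "bp": r"(?:blood pressure|bp)\s+(?:is\s+)?(\d+ over \d+)",
}

def extract_observations(dictation):
    """Scan a free-text dictation and return whichever fields were mentioned."""
    text = dictation.lower()
    row = {}
    for field, pat in PATTERNS.items():
        m = re.search(pat, text)
        if m:
            row[field] = m.group(1)
    return row

row = extract_observations(
    "Patient resting. Temperature is 99.1, pulse 88, blood pressure 120 over 80.")
print(row)
```

The appeal of an LLM over such rules is robustness to paraphrase ("temp's up around ninety-nine") that brittle patterns miss.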
Automatic Speech Recognition for Greek Medical Dictation
Georgilas, Vardis, Stafylakis, Themos
Medical dictation systems are essential tools in modern healthcare, enabling accurate and efficient conversion of speech into written medical documentation. The main objective of this paper is to create a domain-specific system for Greek medical speech transcription. The ultimate goal is to assist healthcare professionals by reducing the overload of manual documentation and improving workflow efficiency. Towards this goal, we develop a system that combines automatic speech recognition techniques with a text correction model, allowing better handling of domain-specific terminology and linguistic variations in Greek. Our approach leverages both acoustic and textual modeling to create more realistic and reliable transcriptions. We focus on adapting existing language and speech technologies to the Greek medical context, addressing challenges such as complex medical terminology and linguistic inconsistencies. Through domain-specific fine-tuning, our system achieves more accurate and coherent transcriptions, contributing to the development of practical language technologies for the Greek healthcare sector.
- Europe > Greece (0.05)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
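The paper's correction model is learned, but the basic idea of snapping noisy ASR tokens to an in-domain lexicon can be sketched with the standard library's fuzzy matcher (`difflib.get_close_matches` is a real stdlib API; the three-term lexicon and the cutoff are illustrative assumptions):

```python
import difflib

# Hypothetical mini-lexicon of Greek medical terms.
MEDICAL_TERMS = ["γαστρίτιδα", "υπέρταση", "ακτινογραφία"]

def correct(token, lexicon=MEDICAL_TERMS, cutoff=0.75):
    """Snap an ASR token to the closest in-domain term, if one is close enough."""
    match = difflib.get_close_matches(token, lexicon, n=1, cutoff=cutoff)
    return match[0] if match else token

def correct_transcript(text):
    return " ".join(correct(t) for t in text.split())

# A misrecognized final vowel gets repaired; ordinary words pass through.
print(correct_transcript("ο ασθενής έχει υπέρτασι"))
```

A trained correction model additionally uses sentence context, which pure string similarity cannot.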
Scalable Offline ASR for Command-Style Dictation in Courtrooms
Nethil, Kumarmanas, Mishra, Vaibhav, Anandan, Kriti, Manohar, Kavya
We propose an open-source framework for command-style dictation that addresses the gap between resource-intensive online systems and high-latency batch processing. Our approach uses Voice Activity Detection (VAD) to segment audio and transcribes these segments in parallel using Whisper models, enabling efficient multiplexing across audio streams. Unlike proprietary systems such as SuperWhisper, this framework is also compatible with most ASR architectures, including widely used CTC-based models. Our multiplexing technique maximizes compute utilization in real-world settings, as demonstrated by its deployment in around 15% of India's courtrooms. Evaluations on live data show consistent latency reduction as user concurrency increases, compared to sequential batch processing. The live demonstration will showcase our open-sourced implementation and allow attendees to interact with it in real time.
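The pipeline described in the abstract (VAD segmentation, then parallel transcription of the segments) can be sketched as below. The energy-based VAD and the stubbed `transcribe` call are placeholders for a real VAD and a Whisper or CTC model, and all thresholds are assumptions:

```python
from concurrent.futures import ThreadPoolExecutor

def vad_segments(audio, frame_len=160, threshold=0.5):
    """Toy energy-based VAD: return (start, end) sample ranges above threshold."""
    segments, start = [], None
    for i in range(0, len(audio), frame_len):
        frame = audio[i:i + frame_len]
        active = (sum(abs(s) for s in frame) / max(len(frame), 1)) > threshold
        if active and start is None:
            start = i
        elif not active and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(audio)))
    return segments

def transcribe(segment):
    # Placeholder for an ASR call (e.g. a Whisper model) on one segment.
    return f"<{len(segment)} samples>"

def transcribe_audio(audio, workers=4):
    """Segment with VAD, then transcribe segments concurrently and rejoin."""
    segs = vad_segments(audio)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        texts = list(pool.map(lambda se: transcribe(audio[se[0]:se[1]]), segs))
    return " ".join(texts)

# silence, speech, silence, speech
audio = [0.0] * 320 + [1.0] * 320 + [0.0] * 320 + [1.0] * 160
print(transcribe_audio(audio))  # two speech segments transcribed in parallel
```

Because each segment is an independent unit of work, segments from many users' audio can share one worker pool, which is the multiplexing idea the abstract refers to.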
StepWrite: Adaptive Planning for Speech-Driven Text Generation
Alaoui, Hamza El, Taheri, Atieh, Peng, Yi-Hao, Bigham, Jeffrey P.
People frequently use speech-to-text systems to compose short texts with voice. However, current voice-based interfaces struggle to support composing more detailed, contextually complex texts, especially in scenarios where users are on the move and cannot visually track progress. Longer-form communication, such as composing structured emails or thoughtful responses, requires persistent context tracking, structured guidance, and adaptability to evolving user intentions, capabilities that conventional dictation tools and voice assistants do not support. We introduce StepWrite, a large language model-driven, voice-based interaction system that augments human writing ability by enabling structured, hands-free and eyes-free composition of longer-form texts while on the move. StepWrite decomposes the writing process into manageable subtasks and sequentially guides users with contextually aware, non-visual audio prompts. StepWrite reduces cognitive load by offloading the context-tracking and adaptive-planning tasks to the models. Unlike baseline methods such as standard dictation features (e.g., Microsoft Word) and conversational voice assistants (e.g., ChatGPT Advanced Voice Mode), StepWrite dynamically adapts its prompts based on the evolving context and user intent, and provides coherent guidance without compromising user autonomy. An empirical evaluation with 25 participants engaging in mobile or stationary hands-occupied activities demonstrated that StepWrite significantly reduces cognitive load and improves usability and user satisfaction compared to baseline methods. Technical evaluations further confirmed StepWrite's capability in dynamic contextual prompt generation, accurate tone alignment, and effective fact-checking. This work highlights the potential of structured, context-aware voice interactions for enhancing hands-free and eyes-free communication in everyday multitasking scenarios.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Asia > South Korea > Busan > Busan (0.05)
- (4 more...)
- Research Report > New Finding (1.00)
- Questionnaire & Opinion Survey (0.93)
- Research Report > Experimental Study > Negative Result (0.45)
- Information Technology (1.00)
- Leisure & Entertainment (0.92)
- Health & Medicine > Consumer Health (0.46)
- (2 more...)
- Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- (2 more...)
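The subtask-decomposition loop the StepWrite abstract describes can be sketched as follows, with the LLM planner and the speech input/output replaced by stubs (the function names, prompts, and fixed three-step plan are invented for illustration; the real system plans adaptively):

```python
def plan_subtasks(goal):
    """Placeholder planner: a real system would ask an LLM to decompose the
    writing goal into subtasks and replan as the draft evolves."""
    return ["state the purpose", "add key details", "close with next steps"]

def step_write(goal, answer_fn):
    """Walk the user through one audio prompt per subtask, tracking context.

    answer_fn stands in for 'speak the prompt, record and transcribe a reply'.
    """
    context = {"goal": goal, "draft": []}
    for step in plan_subtasks(goal):
        prompt = f"Next, {step}. So far: {' '.join(context['draft']) or '(empty)'}"
        context["draft"].append(answer_fn(prompt))  # spoken reply, transcribed
    return " ".join(context["draft"])

draft = step_write("email my landlord about a leak",
                   lambda p: f"[reply to: {p.split('.')[0]}]")
print(draft)
```

The key design point is that the system, not the user, carries the draft-so-far into each prompt, which is what makes eyes-free composition tractable.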
Improved Long-Form Speech Recognition by Jointly Modeling the Primary and Non-primary Speakers
Arumugam, Guru Prakash, Chang, Shuo-yiin, Sainath, Tara N., Prabhavalkar, Rohit, Wang, Quan, Bijwadia, Shaan
ASR models often suffer from a long-form deletion problem, where the model predicts sequential blanks instead of words when transcribing a lengthy audio (on the order of minutes or hours). From the perspective of a user or downstream system consuming the ASR results, this behavior can be perceived as the model "being stuck", and can make the product hard to use. One of the culprits for long-form deletion is training-test data mismatch, which can happen even when the model is trained on diverse and large-scale data collected from multiple application domains. In this work, we introduce a novel technique to simultaneously model different groups of speakers in the audio along with the standard transcript tokens. Speakers are grouped as primary and non-primary, which connects the application domains and significantly alleviates the long-form deletion problem. This improved model neither needs additional training data nor incurs additional training or inference cost.
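The abstract does not specify the exact token scheme, but one plausible reading of "modeling speaker groups along with the standard transcript tokens" is interleaving role tokens into the target sequence, roughly like this (the tag names are a guess for illustration, not the paper's format):

```python
def add_speaker_tags(segments):
    """Build a training target that interleaves speaker-group tags with words.

    segments: list of (role, words) pairs, role in {'primary', 'nonprimary'}.
    """
    out = []
    for role, words in segments:
        out.append("<primary>" if role == "primary" else "<nonprimary>")
        out.extend(words.split())
    return out

tokens = add_speaker_tags([("primary", "set a timer"),
                           ("nonprimary", "dinner is ready"),
                           ("primary", "for ten minutes")])
print(tokens)
```

Training on such targets forces the model to account for non-primary speech explicitly instead of collapsing into blanks when it appears.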
Toward Interactive Dictation
Li, Belinda Z., Eisner, Jason, Pauls, Adam, Thomson, Sam
Voice dictation is an increasingly important text input modality. Existing systems that allow both dictation and editing-by-voice restrict their command language to flat templates invoked by trigger words. In this work, we study the feasibility of allowing users to interrupt their dictation with spoken editing commands in open-ended natural language. We introduce a new task and dataset, TERTiUS, to experiment with such systems. To support this flexibility in real-time, a system must incrementally segment and classify spans of speech as either dictation or command, and interpret the spans that are commands. We experiment with using large pre-trained language models to predict the edited text, or alternatively, to predict a small text-editing program. Experiments show a natural trade-off between model accuracy and latency: a smaller model achieves 30% end-state accuracy with 1.3 seconds of latency, while a larger model achieves 55% end-state accuracy with 7 seconds of latency.
- Asia > Singapore (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (9 more...)
- Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.70)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
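The incremental segment-and-classify loop that the TERTiUS abstract describes can be sketched as below; the keyword classifier is a toy stand-in for the pretrained language models the paper actually uses, and the trigger list is an assumption:

```python
def classify_span(span):
    """Toy classifier: a real system would use a pretrained LM, not keywords."""
    triggers = ("delete", "replace", "scratch", "change")
    return "command" if any(w in span.lower() for w in triggers) else "dictation"

def segment_stream(words):
    """Greedy incremental segmentation: flush a span when its label changes."""
    spans, current, label = [], [], None
    for w in words:
        new_label = classify_span(w)
        if label is None or new_label == label:
            current.append(w)
            label = new_label
        else:
            spans.append((label, " ".join(current)))
            current, label = [w], new_label
    if current:
        spans.append((label, " ".join(current)))
    return spans

print(segment_stream("meet me at noon delete that say three".split()))
```

The paper's accuracy/latency trade-off lives in `classify_span`: a larger model labels spans more accurately but each incremental decision takes longer.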
Microsoft brings AI-powered voice commands to Dictate in OneNote
Tech giant Microsoft is rolling out a new Dictate feature to OneNote that supports AI-powered voice commands to control dictation, such as deleting a word, formatting text, or undoing a recent step, reports Windows Central. The company said it plans to add more voice commands to Dictate over the coming months. "Now it is easy to break away from the keyboard and stay in the flow by using Dictate with AI-backed voice commands to add, format, edit, and organise your text," said Sofia Thomas, Product Manager on Microsoft's Office Voice Team. "Over the next few months, we will be adding new voice commands as well as some that are already available in other Office apps to OneNote," the company added. Dictate works with over 50 languages and provides an alternative way to input text within OneNote.
Machine learning, AI can help ease the trend of physician burnout
Photo: Dr. Steven Waldren, vice president and chief informatics officer at the American Academy of Family Physicians (right), and Dr. Kamel Sadek, director of informatics at Village Medical, speak at the HIMSS22 conference in Orlando.

ORLANDO, Fla. – Even before COVID-19 made the business of healthcare a nightmare for countless physicians and clinicians, burnout was a prevalent issue. And even the slow, still-ongoing emergence into normalcy hasn't been enough to ease this trend: clerical burdens, including clinical documentation, are a major contributor. But for primary care physicians in particular, a new class of technology, including AI-powered digital assistants, is improving their capacity and capability while reducing their administrative and cognitive burden. Dr. Waldren cited data showing that the average patient visit to a PCP takes about 18 minutes, and of that time, 27% is dedicated to face-to-face time with a patient.
New Windows 11 build tests Voice Access, Spotlight backgrounds
Microsoft issued a meaty Windows Insider build on Wednesday for the Dev Channel, testing one substantial improvement, Voice Access, along with a couple of personalization improvements that should be welcomed by Windows users. Technically, the new features offered in Build 22518 of the Dev Channel for Windows 11 are new, untested code, which might not even make it to the stable channel. Still, there's a good chance that at least Voice Access will come to market, as it leans on Microsoft's accessibility strengths. Microsoft's new build has also added a "Spotlight" feature that will provide fresh, updated desktop backgrounds, and it tweaked the Widgets feature to resemble Windows 10. Microsoft describes Voice Access as a new feature, one distinct from dictation, which has been in Windows for some time.