AITopics | transcription

transcription

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Melbourne psychiatrist refuses new patients who don't consent to AI note-taking

The Guardian, May-18-2026, 15:00:04 GMT

Digital rights experts have raised concerns about the security of the data recorded by AI in psychiatrists' sessions. A Melbourne psychiatrist has refused new patients unless they agree to allow her to use an AI scribe to transcribe the conversations in their sessions. AI-driven note-taking tools are becoming popular within the medical industry, with two in five general practitioners now using such scribes, according to the Royal Australian College of General Practitioners (RACGP). There have also been concerns about the security of the data, how it might be used by the AI companies, and the accuracy of the transcriptions.

artificial intelligence, psychiatrist, social media

The Guardian

Country: Oceania > Australia (0.45)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence (1.00)
Information Technology > Communications > Social Media (0.74)

M4Singer: A Multi-Style, Multi-Singer and Musical Score Provided Mandarin Singing Corpus

Neural Information Processing Systems, Apr-25-2026, 07:07:01 GMT

The lack of publicly available, high-quality, accurately labeled datasets has long been a major bottleneck for singing voice synthesis (SVS). To tackle this problem, we present M4Singer, a free-to-use Multi-style, Multi-singer Mandarin singing collection with elaborately annotated Musical scores, together with its benchmarks. Specifically, 1) we construct and release a large, high-quality Chinese singing voice corpus, recorded by 20 professional singers and covering 700 Chinese pop songs as well as all four SATB types (i.e., soprano, alto, tenor, and bass); 2) we make extensive efforts to manually compose the musical scores for each recorded song, which is necessary for studying prosody modeling in SVS; and 3) to facilitate the use and demonstrate the quality of M4Singer, we conduct four different benchmark experiments: score-based SVS, controllable singing voice (CSV), singing voice conversion (SVC), and automatic music transcription (AMT). Audio samples can be found at http://m4singer.github.io.

m4singer, machine learning, natural language

Neural Information Processing Systems

Country: Asia (0.28)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Unsupervised Learning of Spoken Language with Visual Context

David Harwath, Antonio Torralba, James Glass

Neural Information Processing Systems, Mar-23-2026, 14:48:15 GMT

Humans learn to speak before they can read or write, so why can't computers do the same? In this paper, we present a deep neural network model capable of rudimentary spoken language acquisition using untranscribed audio training data, whose only supervision comes in the form of contextually relevant visual images. We describe the collection of our data, comprising over 120,000 spoken audio captions for the Places image dataset, and evaluate our model on an image search and annotation task. We also provide some visualizations that suggest our model is learning to recognize meaningful words within the caption spectrograms.
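The abstract evaluates retrieval between spoken captions and images but does not spell out the training objective. As an assumption, the sketch below scores image/caption embedding pairs by dot product, trains with a bidirectional hinge (margin ranking) loss that treats the other items in a batch as impostors, and measures recall@k for image search (caption queries) and annotation (image queries). The function names and toy vectors are hypothetical; this is a generic pure-Python illustration, not the authors' network or exact loss.

```python
def dot(u, v):
    """Dot-product similarity between two embedding vectors."""
    return sum(a * b for a, b in zip(u, v))


def margin_ranking_loss(images, captions, margin=1.0):
    """Bidirectional hinge loss over a batch of matched (image, caption) pairs.

    images[i] and captions[i] embed the same pair; every other item in the
    batch is treated as an impostor.
    """
    n = len(images)
    loss = 0.0
    for i in range(n):
        pos = dot(images[i], captions[i])
        for j in range(n):
            if j == i:
                continue
            # Impostor caption for image i, then impostor image for caption i.
            loss += max(0.0, margin + dot(images[i], captions[j]) - pos)
            loss += max(0.0, margin + dot(images[j], captions[i]) - pos)
    return loss / n


def recall_at_k(queries, targets, k):
    """Fraction of queries whose matching target (same index) ranks in the top k."""
    hits = 0
    for i, q in enumerate(queries):
        scores = [dot(q, t) for t in targets]
        ranked = sorted(range(len(targets)), key=lambda j: scores[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(queries)


# Image search: spoken captions query images. Annotation: images query captions.
images = [[1.0, 0.0], [0.0, 1.0]]
captions = [[0.9, 0.1], [0.2, 0.8]]
print(recall_at_k(captions, images, k=1))  # image search
print(recall_at_k(images, captions, k=1))  # annotation
```

Using in-batch impostors keeps the sketch simple; a real system would compute embeddings with trained audio and image networks and would typically sample impostors over much larger batches.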

caption, machine learning, pattern recognition

Neural Information Processing Systems

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (0.35)

Optimal spectral transportation with application to music transcription

Rémi Flamary, Cédric Févotte, Nicolas Courty, Valentin Emiya

Neural Information Processing Systems, Mar-23-2026, 06:18:38 GMT

Many spectral unmixing methods rely on the non-negative decomposition of spectral data onto a dictionary of spectral templates.
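The sentence describes the conventional unmixing setup: a spectrum is explained as a non-negative combination of fixed spectral templates. A minimal sketch of that setup follows, assuming a squared Euclidean fit and Lee-Seung multiplicative updates for the activations with the dictionary held fixed. This is the standard decomposition only, not the optimal spectral transportation method named in the title; the function names, templates, and spectrum are hypothetical toy values.

```python
def decompose(v, W, n_iter=200, eps=1e-12):
    """Estimate activations h >= 0 such that v ≈ W h, with dictionary W held fixed.

    v: F spectral magnitudes (one frame); W: F x K nested list whose K columns
    are spectral templates. Lee-Seung multiplicative updates for the squared
    Euclidean error keep h non-negative when W and v are non-negative.
    """
    F, K = len(W), len(W[0])
    h = [1.0] * K
    # W^T v is constant across iterations, so compute it once.
    wtv = [sum(W[f][k] * v[f] for f in range(F)) for k in range(K)]
    for _ in range(n_iter):
        wh = [sum(W[f][k] * h[k] for k in range(K)) for f in range(F)]
        wtwh = [sum(W[f][k] * wh[f] for f in range(F)) for k in range(K)]
        h = [h[k] * wtv[k] / (wtwh[k] + eps) for k in range(K)]
    return h


def reconstruction_error(v, W, h):
    """Squared Euclidean distance between v and W h."""
    return sum(
        (v[f] - sum(W[f][k] * h[k] for k in range(len(h)))) ** 2
        for f in range(len(v))
    )


# Two toy templates over 4 frequency bins: a "note" with a weaker partial, and a lone partial.
W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.0], [0.0, 0.0]]
v = [2.0, 1.0, 1.0, 0.0]  # equals 2 * template 0 + 1 * template 1
h = decompose(v, W)
print(h, reconstruction_error(v, W, h))
```

In transcription, each template would typically correspond to a pitch, so large entries of h indicate which notes are active in the frame.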

artificial intelligence, frequency, machine learning

Neural Information Processing Systems

Country: Europe > France (0.28)

Industry:

Media > Music (0.68)
Leisure & Entertainment (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

REBORN: Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR

Neural Information Processing Systems, Mar-22-2026, 19:09:42 GMT

Unsupervised automatic speech recognition (ASR) aims to learn the mapping between the speech signal and its corresponding textual transcription without the supervision of paired speech-text data. A word or phoneme in the speech signal is represented by a segment of variable length with unknown boundaries, and this segmental structure makes learning the mapping between speech and text challenging, especially without paired data. In this paper, we propose REBORN, Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR. REBORN alternates between (1) training a segmentation model that predicts the boundaries of the segmental structures in speech signals and (2) training the phoneme prediction model, whose input is the segmental structure produced by the segmentation model, to predict a phoneme transcription. Since no supervised data is available for training the segmentation model, we use reinforcement learning to train it to favor segmentations that yield phoneme sequence predictions with lower perplexity. We conduct extensive experiments and find that, under the same setting, REBORN outperforms all prior unsupervised ASR models on LibriSpeech, TIMIT, and five non-English languages in Multilingual LibriSpeech. We comprehensively analyze why the boundaries learned by REBORN improve unsupervised ASR performance.
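The perplexity criterion can be illustrated with a small reward function. The sketch below assumes that frames inside a predicted segment are mean-pooled, that a phoneme predictor labels each pooled segment, and that a bigram phoneme language model supplies the perplexity; the reward is simply the negative perplexity. `toy_predictor` and `toy_lm` are hypothetical stand-ins, and REBORN's actual reward, policy-gradient update of the segmentation model, and alternating training of the phoneme predictor are not reproduced here.

```python
import math


def pool_segments(frames, boundaries):
    """Mean-pool frame features into variable-length segments.

    boundaries[t] == 1 marks frame t as the first frame of a new segment.
    """
    segments, current = [], []
    for frame, b in zip(frames, boundaries):
        if b == 1 and current:
            segments.append(current)
            current = []
        current.append(frame)
    if current:
        segments.append(current)
    dim = len(frames[0])
    return [[sum(fr[d] for fr in seg) / len(seg) for d in range(dim)] for seg in segments]


def perplexity(phonemes, bigram_logprob):
    """Per-token perplexity of a phoneme sequence under a bigram LM ("<s>" starts it)."""
    total, prev = 0.0, "<s>"
    for p in phonemes:
        total += bigram_logprob(prev, p)
        prev = p
    return math.exp(-total / len(phonemes))


def segmentation_reward(frames, boundaries, predict_phoneme, bigram_logprob):
    """Higher reward for segmentations whose predicted phonemes have lower perplexity."""
    segments = pool_segments(frames, boundaries)
    phonemes = [predict_phoneme(seg) for seg in segments]
    return -perplexity(phonemes, bigram_logprob)


# Hypothetical stand-ins for the phoneme predictor and the phoneme language model.
def toy_predictor(seg):
    return "a" if seg[0] < 0.5 else "b"


def toy_lm(prev, cur):
    likely = {("<s>", "a"), ("a", "b")}
    return math.log(0.9) if (prev, cur) in likely else math.log(0.1)


frames = [[0.0], [0.5], [1.0], [1.5]]
print(segmentation_reward(frames, [1, 0, 1, 0], toy_predictor, toy_lm))  # two segments
print(segmentation_reward(frames, [1, 1, 1, 1], toy_predictor, toy_lm))  # one per frame
```

In an RL setup, a reward like this would weight the log-probability of the sampled boundaries when updating the segmentation policy.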

artificial intelligence, proceedings, speech recognition

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.96)

e99ed1162e984a5f08cb57ecde2d2231-Paper-Conference.pdf

Neural Information Processing Systems, Feb-18-2026, 13:31:28 GMT

machine learning, natural language, segmentation model

Neural Information Processing Systems

Country:

Asia > Taiwan (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Model Details

Neural Information Processing Systems, Feb-18-2026, 04:22:50 GMT

We decreased the confidence threshold to 0.1 to increase article and headline … The following specifications were used: { resolution: 256, learning rate: 2e-3 }. This limit is binding for common words, e.g., "the". The recognizer is trained using the Supervised Contrastive ("SupCon") loss function [7], a gener… In particular, we work with the "outside" SupCon loss formulation. We use a MobileNetV3 (Small) encoder pre-trained on ImageNet1k sourced from the timm [19] … We use 0.1 as the temperature … for Center Cropping, to avoid destroying too much information. … C (Small) model that is developed in [2] for character recognition. If multiple article bounding boxes satisfy these rules for a given headline, then we take the highest.
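The "outside" SupCon formulation places the 1/|P(i)| average over positives outside the logarithm, as in the supervised contrastive loss of Khosla et al. (2020). A minimal pure-Python sketch follows. The default temperature of 0.1 assumes the fragment "We use 0.1 as the temperature" refers to this loss; skipping anchors that have no positives and averaging over the remaining anchors are also assumptions, and the toy embeddings stand in for encoder outputs.

```python
import math


def normalize(v):
    """Project an embedding onto the unit sphere (v must be non-zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supcon_out_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss, "outside" variant (1/|P(i)| outside the log).

    For each anchor i, P(i) holds the other samples with the same label, and
    the softmax denominator sums over every sample except i. Anchors without
    any positive are skipped; the result is averaged over the remaining anchors.
    """
    z = [normalize(e) for e in embeddings]
    n = len(z)
    total, used = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue
        sims = [sum(a * b for a, b in zip(z[i], z[j])) / temperature for j in range(n)]
        others = [sims[j] for j in range(n) if j != i]
        m = max(others)  # log-sum-exp shift for numerical stability
        log_denom = m + math.log(sum(math.exp(s - m) for s in others))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        used += 1
    return total / used if used else 0.0


emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(supcon_out_loss(emb, [0, 0, 1, 1]))  # labels agree with the geometry
print(supcon_out_loss(emb, [0, 1, 0, 1]))  # labels disagree with the geometry
```

A small temperature such as 0.1 sharpens the softmax, so the loss is dominated by the hardest negatives; with labels that contradict the geometry, the second call returns a much larger value than the first.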

artificial intelligence, machine learning, natural language

Neural Information Processing Systems

Country:

North America > United States (0.14)
Europe > Netherlands > South Holland > Leiden (0.04)

Industry:

Law (1.00)
Information Technology (1.00)
Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)

ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation

Neural Information Processing Systems, Feb-16-2026, 17:10:54 GMT

We present ComSL, a speech-language model built atop a composite architecture of public pretrained speech-only and language-only models and optimized data-efficiently for spoken language tasks.

artificial intelligence, arxiv preprint arxiv, natural language

Neural Information Processing Systems

Country:

Asia > China > Shanghai > Shanghai (0.05)
Asia > China > Beijing > Beijing (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

A file format used in the

Neural Information Processing Systems, Feb-15-2026, 15:14:35 GMT

The keywords were extracted using the procedure described in Section C. The restricted part of the Muharaf dataset has 428 images distributed under a proprietary license.

artificial intelligence, machine learning, natural language

Neural Information Processing Systems

Industry:

Law (0.97)
Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.95)
Information Technology > Artificial Intelligence > Machine Learning (0.68)

Muharaf: Manuscripts of Handwritten Arabic Dataset for Cursive Text Recognition

Neural Information Processing Systems, Feb-15-2026, 15:14:33 GMT

We present the Manuscripts of Handwritten Arabic (Muharaf) dataset, which is a machine learning dataset consisting of more than 1,600 historic handwritten page images transcribed by experts in archival Arabic.

machine learning, natural language, pattern recognition

Neural Information Processing Systems

Country: