AITopics | long-form recording

BabyHuBERT: Multilingual Self-Supervised Learning for Segmenting Speakers in Child-Centered Long-Form Recordings

Charlot, Théo, Kunze, Tarek, Poli, Maxime, Cristia, Alejandrina, Dupoux, Emmanuel, Lavechin, Marvin

arXiv.org Artificial Intelligence, Sep-19-2025

Child-centered long-form recordings are essential for studying early language development, but existing speech models trained on clean adult data perform poorly due to acoustic and linguistic differences. We introduce BabyHuBERT, the first self-supervised speech representation model trained on 13,000 hours of multilingual child-centered long-form recordings spanning over 40 languages. We evaluate BabyHuBERT on speaker segmentation, identifying when target children speak versus female adults, male adults, or other children -- a fundamental preprocessing step for analyzing naturalistic language experiences. BabyHuBERT achieves F1-scores from 52.1% to 74.4% across six diverse datasets, consistently outperforming W2V2-LL4300 (trained on English long-forms) and standard HuBERT (trained on clean adult speech). Notable improvements include 13.2 absolute F1 points over HuBERT on Vanuatu and 15.9 points on Solomon Islands corpora, demonstrating effectiveness on underrepresented languages. By sharing code and models, BabyHuBERT serves as a foundation model for child speech research, enabling fine-tuning on diverse downstream tasks.

artificial intelligence, long-form recording, machine learning, (18 more...)

arXiv: 2509.15001

Country:

Europe (0.69)
North America > United States (0.29)
Oceania > Solomon Islands (0.25)
(2 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.41)

Fifteen Years of Child-Centered Long-Form Recordings: Promises, Resources, and Remaining Challenges to Validity

Peurey, Loann, Lavechin, Marvin, Kunze, Tarek, Khentout, Manel, Gautheron, Lucas, Dupoux, Emmanuel, Cristia, Alejandrina

arXiv.org Artificial Intelligence, Sep-3-2025

Audio-recordings collected with a child-worn device are a fundamental tool in child language research. Long-form recordings collected over whole days promise to capture children's input and production with minimal observer bias, and therefore high validity. The sheer volume of resulting data necessitates automated analysis to extract relevant metrics for researchers and clinicians. This paper summarizes collective knowledge on this technique, providing entry points to existing resources. We also highlight various sources of error that threaten the accuracy of automated annotations and the interpretation of resulting metrics. To address this, we propose potential troubleshooting metrics to help users assess data quality. While a fully automated quality control system is not feasible, we outline practical strategies for researchers to improve data collection and contextualize their analyses.

artificial intelligence, data mining, dataset, (17 more...)

doi: 10.21437/Interspeech.2025-1987

arXiv: 2506.11075

Country:

Europe (0.46)
North America > United States (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (0.66)

Technology:

Information Technology > Artificial Intelligence (1.00)
Information Technology > Data Science > Data Mining (0.35)

Challenges in Automated Processing of Speech from Child Wearables: The Case of Voice Type Classifier

Kunze, Tarek, Métais, Marianne, Titeux, Hadrien, Elbert, Lucas, Coffey, Joseph, Dupoux, Emmanuel, Cristia, Alejandrina, Lavechin, Marvin

arXiv.org Artificial Intelligence, Sep-3-2025

Recordings gathered with child-worn devices promised to revolutionize both fundamental and applied speech sciences by allowing the effortless capture of children's naturalistic speech environment and language production. This promise hinges on speech technologies that can transform the sheer mounds of data thus collected into usable information. This paper demonstrates several obstacles blocking progress by summarizing three years' worth of experiments aimed at improving one fundamental task: Voice Type Classification. Our experiments suggest that improvements in representation features, architecture, and parameter search contribute to only marginal gains in performance. More progress is made by focusing on data relevance and quantity, which highlights the importance of collecting data with appropriate permissions to allow sharing.

artificial intelligence, machine learning, whisper-vtc, (20 more...)

doi: 10.21437/Interspeech.2025-1962

arXiv: 2506.11074

Country: North America (0.28)

Genre: Research Report > Experimental Study (0.48)

Industry: Health & Medicine (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)
Information Technology > Artificial Intelligence > Speech (0.68)
