
Collaborating Authors: Calapodescu, Ioan


Speech Foundation Models and Crowdsourcing for Efficient, High-Quality Data Collection

arXiv.org Artificial Intelligence

As in any data-intensive domain, collecting high-quality datasets is a fundamental and costly prerequisite for the development of speech-processing applications. Traditional methods heavily rely on a human workforce, whose costs, as data collection scales, are hard to sustain. In the quest for scalable solutions to this problem, crowdsourcing emerged as a viable option that also enables the coverage of diverse populations (Cefkin et al., 2014; Poesio et al., 2017). Due to the variable quality of crowd-sourced data, validation methods that discard low-quality contributions are essential to build reliable datasets (Negri et al., 2011; Sabou et al., 2014; Chittilappilly et al., 2016). This need is exacerbated in the collection of speech-text pairs, where [...]

To fill this gap, this paper explores the use of speech foundation models (SFMs) to automate the validation of crowd-sourced speech data. To this aim, we investigate the employment of off-the-shelf SFMs such as Whisper and SeamlessM4T (Radford et al., 2022; Communication et al., 2023), along with machine translation (MT) models and grapheme-to-phoneme (G2P) conversion. Through experiments on French, German, and Korean data, we test the integration of SFMs and crowdsourcing to reduce validation costs while preserving final data quality. Our results show that leveraging SFMs yields a cost reduction of over 40% while maintaining high data quality, significantly improving the efficiency and scalability of crowd-sourced speech data collection.
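The abstract describes the validation idea rather than code, but the core mechanism (checking a crowd-sourced recording against its prompt with an off-the-shelf SFM) can be sketched as follows. This is a minimal illustration assuming Whisper via Hugging Face transformers and jiwer for word error rate; the checkpoint name, audio path, and 0.3 rejection threshold are illustrative choices, not values from the paper.

```python
# Sketch: validate a crowd-sourced recording against its prompt with an
# off-the-shelf SFM (Whisper through the transformers ASR pipeline) and a
# word-error-rate check. Checkpoint, file name, and threshold are
# illustrative placeholders, not the paper's settings.
from transformers import pipeline
import jiwer

asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

def validate(audio_path: str, prompt: str, max_wer: float = 0.3) -> bool:
    """Accept the clip only if Whisper's transcript is close to the prompt."""
    hypothesis = asr(audio_path)["text"]
    error_rate = jiwer.wer(prompt.lower(), hypothesis.lower())
    return error_rate <= max_wer

if validate("clip_0001.wav", "le chat dort sur le canapé"):
    print("accepted")
else:
    print("rejected: route to human validation")
```

In a full pipeline of the kind the paper describes, such a filter would presumably sit alongside the MT and G2P checks mentioned above, with borderline clips routed to human validators rather than discarded outright.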


mHuBERT-147: A Compact Multilingual HuBERT Model

arXiv.org Artificial Intelligence

We present mHuBERT-147, the first general-purpose massively multilingual HuBERT speech representation model trained on 90K hours of clean, open-license data. To scale up the multi-iteration HuBERT approach, we use faiss-based clustering, achieving 5.2x faster label assignment than the original method. We also apply a new multilingual batching up-sampling strategy, leveraging both language and dataset diversity. After 3 training iterations, our compact 95M parameter mHuBERT-147 outperforms larger models trained on substantially more data. We rank second and first on the ML-SUPERB 10min and 1h leaderboards, with SOTA scores for 3 tasks. Across ASR/LID tasks, our model consistently surpasses XLS-R (300M params; 436K hours) and demonstrates strong competitiveness against the much larger MMS (1B params; 491K hours). Our findings indicate that mHuBERT-147 is a promising model for multilingual speech tasks, offering an unprecedented balance between high performance and parameter efficiency.
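The faiss-based label assignment mentioned above can be illustrated with a short sketch: cluster frame-level features with faiss k-means, then assign every frame the id of its nearest centroid as its discrete training target. Feature dimensionality, cluster count, frame count, and iteration count below are placeholders, not the paper's actual configuration.

```python
# Sketch: faiss k-means over frame-level speech features, in the spirit of
# HuBERT-style pseudo-label assignment. All sizes are illustrative.
import numpy as np
import faiss

d, k = 768, 500                       # feature dim, number of clusters
feats = np.random.rand(10_000, d).astype("float32")  # stand-in for real features

kmeans = faiss.Kmeans(d, k, niter=20, verbose=True)
kmeans.train(feats)

# Label assignment: nearest centroid for every frame.
_, labels = kmeans.index.search(feats, 1)
labels = labels.ravel()               # one discrete unit id per frame
```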


An Adapter-Based Unified Model for Multiple Spoken Language Processing Tasks

arXiv.org Artificial Intelligence

Self-supervised learning models have revolutionized the field of speech processing. However, the process of fine-tuning these models on downstream tasks requires substantial computational resources, particularly when dealing with multiple speech-processing tasks. In this paper, we explore the potential of adapter-based fine-tuning in developing a unified model capable of effectively handling multiple spoken language processing tasks. The tasks we investigate are Automatic Speech Recognition, Phoneme Recognition, Intent Classification, Slot Filling, and Spoken Emotion Recognition. We validate our approach through a series of experiments on the SUPERB benchmark, and our results indicate that adapter-based fine-tuning enables a single encoder-decoder model to perform multiple speech processing tasks with an average improvement of 18.4% across the five target tasks while staying efficient in terms of parameter updates.
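As a rough illustration of adapter-based fine-tuning (not the paper's exact architecture), the sketch below shows a standard bottleneck adapter inserted into a frozen backbone, so that only a small fraction of parameters is updated per task. Hidden and bottleneck sizes are illustrative assumptions.

```python
# Sketch: a bottleneck adapter that can be dropped after a frozen
# transformer sub-layer; only these small layers receive gradient updates.
# Dimensions are illustrative, not the paper's configuration.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, hidden_dim: int = 768, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the frozen backbone's behaviour recoverable.
        return x + self.up(self.act(self.down(x)))

# During fine-tuning, the pre-trained backbone stays frozen:
#   for p in backbone.parameters():
#       p.requires_grad = False
```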


A Textless Metric for Speech-to-Speech Comparison

arXiv.org Artificial Intelligence

In this paper, we introduce a new and simple method for comparing speech utterances without relying on text transcripts. Our speech-to-speech comparison metric utilizes state-of-the-art speech2unit encoders like HuBERT to convert speech utterances into discrete acoustic units. We then propose a simple and easily replicable neural architecture that learns a speech-based metric that closely corresponds to its text-based counterpart. This textless metric has numerous potential applications, including evaluating speech-to-speech translation for oral languages or for languages without dependable ASR systems, and avoiding the need for ASR transcription altogether. This paper also shows that, for speech-to-speech translation evaluation, ASR-BLEU (which consists of automatically transcribing both the speech hypothesis and the reference and computing sentence-level BLEU between the transcripts) is a poor proxy for text-BLEU, even when the ASR system is strong.
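A minimal sketch of the speech2unit step the abstract describes, assuming a HuBERT checkpoint from Hugging Face and a precomputed k-means codebook; the checkpoint name and centroid file are illustrative, and the learned metric on top of the unit sequences is not shown.

```python
# Sketch: speech2unit conversion -- encode audio with HuBERT, map each frame
# to its nearest k-means centroid, and collapse consecutive repeats into a
# discrete unit sequence. Checkpoint and codebook file are placeholders.
import numpy as np
import torch
from transformers import Wav2Vec2FeatureExtractor, HubertModel

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/hubert-base-ls960")
hubert = HubertModel.from_pretrained("facebook/hubert-base-ls960").eval()
centroids = np.load("kmeans_centroids.npy")  # (k, d), assumed precomputed

def speech_to_units(waveform: np.ndarray, sr: int = 16_000) -> list[int]:
    inputs = extractor(waveform, sampling_rate=sr, return_tensors="pt")
    with torch.no_grad():
        feats = hubert(**inputs).last_hidden_state[0].numpy()  # (T, d)
    # Nearest-centroid assignment per frame.
    dists = ((feats[:, None, :] - centroids[None]) ** 2).sum(-1)
    units = dists.argmin(-1)
    # Deduplicate consecutive units to get a compact sequence.
    return [int(u) for i, u in enumerate(units) if i == 0 or u != units[i - 1]]
```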


NAVER LABS Europe's Multilingual Speech Translation Systems for the IWSLT 2023 Low-Resource Track

arXiv.org Artificial Intelligence

This paper presents NAVER LABS Europe's systems for Tamasheq-French and Quechua-Spanish speech translation in the IWSLT 2023 Low-Resource track. Our work attempts to maximize translation quality in low-resource settings using multilingual parameter-efficient solutions that leverage strong pre-trained models. Our primary submission for Tamasheq outperforms the previous state of the art by 7.5 BLEU points on the IWSLT 2022 test set, and achieves 23.6 BLEU on this year's test set, outperforming the second best participant by 7.7 points. For Quechua, we also rank first and achieve 17.7 BLEU, despite having only two hours of translation data. Finally, we show that our proposed multilingual architecture is also competitive for high-resource languages, outperforming the best unconstrained submission to the IWSLT 2021 Multilingual track, despite using much less training data and compute.
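For reference, the BLEU figures reported above correspond to standard corpus-level scoring, e.g. with sacrebleu; the sketch below assumes plain-text hypothesis and reference files, whose names are placeholders.

```python
# Sketch: corpus-level BLEU scoring as commonly used for IWSLT evaluation.
# File names are placeholders, not the shared task's actual artifacts.
import sacrebleu

with open("hypotheses.txt") as f:
    hyps = [line.strip() for line in f]
with open("references.txt") as f:
    refs = [line.strip() for line in f]

bleu = sacrebleu.corpus_bleu(hyps, [refs])  # single reference stream
print(f"BLEU = {bleu.score:.1f}")
```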