Collaborating Authors

Moumen, Adel


Text-Speech Language Models with Improved Cross-Modal Transfer by Aligning Abstraction Levels

arXiv.org Artificial Intelligence

Text-Speech Language Models (TSLMs) -- language models trained to jointly process and generate text and speech -- aim to enable cross-modal knowledge transfer to overcome the scaling limitations of unimodal speech LMs. The predominant approach to TSLM training expands the vocabulary of a pre-trained text LM by appending new embeddings and linear projections for speech, followed by fine-tuning on speech data. We hypothesize that this method limits cross-modal transfer by neglecting feature compositionality, preventing text-learned functions from being fully leveraged at appropriate abstraction levels. To address this, we propose augmenting vocabulary expansion with modules that better align abstraction levels across layers. Our models, SmolTolk, rival or surpass state-of-the-art TSLMs trained with orders of magnitude more compute. Representation analyses and improved multimodal performance suggest our method enhances cross-modal transfer.
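The vocabulary-expansion recipe the abstract describes (and critiques) can be sketched in a few lines. This is a minimal, framework-free illustration under assumed names and initialization; it is not SmolTolk's implementation, which additionally adds the abstraction-aligning modules:

```python
import random

def expand_for_speech(token_embeddings, num_speech_tokens, init_std=0.02):
    """Append freshly initialized embedding rows for new speech tokens.

    token_embeddings: list of per-token vectors from a pre-trained text LM.
    The pre-trained text rows are left untouched; only new rows are appended,
    mirroring the common vocabulary-expansion recipe (names and init_std are
    illustrative assumptions, not values from the paper).
    """
    dim = len(token_embeddings[0])
    new_rows = [[random.gauss(0.0, init_std) for _ in range(dim)]
                for _ in range(num_speech_tokens)]
    return token_embeddings + new_rows
```

In practice the output projection (LM head) is expanded the same way, and fine-tuning on speech data then trains the new rows jointly with the rest of the model.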


Open-Source Conversational AI with SpeechBrain 1.0

arXiv.org Artificial Intelligence

SpeechBrain is an open-source Conversational AI toolkit based on PyTorch, focused particularly on speech processing tasks such as speech recognition, speech enhancement, speaker recognition, text-to-speech, and much more. It promotes transparency and replicability by releasing both the pre-trained models and the complete "recipes" of code and algorithms required for training them. This paper presents SpeechBrain 1.0, a significant milestone in the evolution of the toolkit, which now has over 200 recipes for speech, audio, and language processing tasks, and more than 100 models available on Hugging Face. SpeechBrain 1.0 introduces new technologies to support diverse learning modalities, Large Language Model (LLM) integration, and advanced decoding strategies, along with novel models, tasks, and modalities. It also includes a new benchmark repository, offering researchers a unified platform for evaluating models across diverse tasks.


Zero-Shot End-To-End Spoken Question Answering In Medical Domain

arXiv.org Artificial Intelligence

In the rapidly evolving landscape of spoken question-answering (SQA), the integration of large language models (LLMs) has emerged as a transformative development. Conventional approaches often entail the use of separate models for question audio transcription and answer selection, resulting in significant resource utilization and error accumulation. To tackle these challenges, we explore the effectiveness of end-to-end (E2E) methodologies for SQA in the medical domain. Our study introduces a novel zero-shot SQA approach and compares it to traditional cascade systems. Through a comprehensive evaluation conducted on a new open benchmark of 8 medical tasks and 48 hours of synthetic audio, we demonstrate that our approach requires up to 14.7 times fewer resources than a combined 1.3B-parameter LLM with a 1.55B-parameter ASR model while improving average accuracy by 0.5%. These findings underscore the potential of E2E methodologies for SQA in resource-constrained contexts.


Stabilising and accelerating light gated recurrent units for automatic speech recognition

arXiv.org Artificial Intelligence

The light gated recurrent unit (Li-GRU) is well known for achieving impressive results in automatic speech recognition (ASR) tasks while being lighter and faster to train than a standard gated recurrent unit (GRU). However, the unbounded nature of the rectified linear unit on its candidate recurrent gate induces a severe exploding-gradient phenomenon that disrupts training and prevents it from being applied to popular datasets. In this paper, we theoretically and empirically derive the necessary conditions for its stability, as well as engineering mechanisms that speed up its training time by a factor of five, hence introducing a novel version of this architecture.

The choice of the recurrent unit is of crucial interest to achieve state-of-the-art word error rates. For instance, the Li-GRU [8] network has been designed to carefully address the task of ASR. A Li-GRU is a compact single-gate recurrent unit derived from the GRU which reduces the per-epoch training time by 30% over a standard GRU while also improving ASR accuracy. Nevertheless, and despite a clear interest from the community, two major issues prevent a stronger adoption of the Li-GRU: (1) it highly suffers from exploding gradients, as the gate is unbounded; and (2) no optimized implementation exists, leading to much larger training times than more complex alternatives such as LSTM neural networks.
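A rough sketch of one Li-GRU step may clarify where the instability comes from: a single update gate is bounded by a sigmoid, but the ReLU candidate state is unbounded, so recurrent activations (and their gradients) can grow without limit. This is a simplified pure-Python illustration (weight layout and function names are assumptions; the original also uses batch normalization, omitted here), not the paper's optimized implementation:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    return max(0.0, x)

def matvec(W, v):
    # W: list of rows; v: vector
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def ligru_cell(x, h_prev, Wz, Uz, Wh, Uh):
    """One Li-GRU step: a single update gate z and a ReLU candidate c.

    z_t = sigmoid(Wz x_t + Uz h_{t-1})          # bounded in (0, 1)
    c_t = relu(Wh x_t + Uh h_{t-1})             # unbounded: exploding-gradient source
    h_t = z_t * h_{t-1} + (1 - z_t) * c_t
    """
    z = [sigmoid(a + b) for a, b in zip(matvec(Wz, x), matvec(Uz, h_prev))]
    c = [relu(a + b) for a, b in zip(matvec(Wh, x), matvec(Uh, h_prev))]
    return [zi * hi + (1.0 - zi) * ci for zi, hi, ci in zip(z, h_prev, c)]
```

Compared with a standard GRU, the reset gate is dropped and the tanh candidate is replaced by ReLU, which is what makes the unit lighter but also what leaves the recurrence unbounded.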