AITopics | Portet, Francois

Collaborating Authors

Portet, Francois

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech

Parcollet, Titouan, Nguyen, Ha, Evain, Solene, Boito, Marcely Zanon, Pupier, Adrien, Mdhaffar, Salima, Le, Hang, Alisamir, Sina, Tomashenko, Natalia, Dinarelli, Marco, Zhang, Shucong, Allauzen, Alexandre, Coavoux, Maximin, Esteve, Yannick, Rouvier, Mickael, Goulian, Jerome, Lecouteux, Benjamin, Portet, Francois, Rossato, Solange, Ringeval, Fabien, Schwab, Didier, Besacier, Laurent

arXiv.org Artificial IntelligenceSep-11-2023

Self-supervised learning (SSL) is at the origin of unprecedented improvements in many different domains including computer vision and natural language processing. Speech processing drastically benefitted from SSL as most of the current domain-related tasks are now being approached with pre-trained models. This work introduces LeBenchmark 2.0 an open-source framework for assessing and building SSL-equipped French speech technologies. It includes documented, large-scale and heterogeneous corpora with up to 14,000 hours of heterogeneous speech, ten pre-trained SSL wav2vec 2.0 models containing from 26 million to one billion learnable parameters shared with the community, and an evaluation protocol made of six downstream tasks to complement existing benchmarks. LeBenchmark 2.0 also presents unique perspectives on pre-trained SSL models for speech with the investigation of frozen versus fine-tuned downstream models, task-agnostic versus task-specific pre-trained models as well as a discussion on the carbon footprint of large-scale model training.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2309.05472

Country:

Asia (1.00)
Europe > France (0.46)
North America > United States (0.46)
(2 more...)

Genre: Research Report > Experimental Study (0.45)

Industry:

Media (1.00)
Health & Medicine (0.93)
Energy > Oil & Gas (0.67)
Energy > Power Industry (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Add feedback

Dynamic Time-Alignment of Dimensional Annotations of Emotion using Recurrent Neural Networks

Alisamir, Sina, Ringeval, Fabien, Portet, Francois

arXiv.org Artificial IntelligenceSep-21-2022

Most automatic emotion recognition systems exploit time-continuous annotations of emotion to provide fine-grained descriptions of spontaneous expressions as observed in real-life interactions. As emotion is rather subjective, its annotation is usually performed by several annotators who provide a trace for a given dimension, i.e. a time-continuous series describing a dimension such as arousal or valence. However, annotations of the same expression are rarely consistent between annotators, either in time or in value, which adds bias and delay in the trace that is used to learn predictive models of emotion. We therefore propose a method that can dynamically compensate inconsistencies across annotations and synchronise the traces with the corresponding acoustic features using Recurrent Neural Networks. Experimental evaluations were carried on several emotion data sets that include Chinese, French, German, and Hungarian participants who interacted remotely in either noise-free conditions or in-the-wild. The results show that our method can significantly increase inter-annotator agreement, as well as correlation between traces and audio features, for both arousal and valence. In addition, improvements are obtained in the automatic prediction of these dimensions using simple light-weight models, especially for valence in noise-free conditions, and arousal for recordings captured in-the-wild.

annotation, artificial intelligence, machine learning, (13 more...)

arXiv.org Artificial Intelligence

2209.10223

Country:

Europe (1.00)
Asia (0.93)
North America > United States (0.69)

Genre: Research Report > New Finding (0.67)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Emotion (1.00)

Add feedback

LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech

Evain, Solene, Nguyen, Ha, Le, Hang, Boito, Marcely Zanon, Mdhaffar, Salima, Alisamir, Sina, Tong, Ziyi, Tomashenko, Natalia, Dinarelli, Marco, Parcollet, Titouan, Allauzen, Alexandre, Esteve, Yannick, Lecouteux, Benjamin, Portet, Francois, Rossato, Solange, Ringeval, Fabien, Schwab, Didier, Besacier, Laurent

arXiv.org Artificial IntelligenceJun-10-2021

Self-Supervised Learning (SSL) using huge unlabeled data has been successfully explored for image and natural language processing. Recent works also investigated SSL from speech. They were notably successful to improve performance on downstream tasks such as automatic speech recognition (ASR). While these works suggest it is possible to reduce dependence on labeled data for building efficient speech systems, their evaluation was mostly made on ASR and using multiple and heterogeneous experimental settings (most of them for English). This questions the objective comparison of SSL approaches and the evaluation of their impact on building speech systems. In this paper, we propose LeBenchmark: a reproducible framework for assessing SSL from speech. It not only includes ASR (high and low resource) tasks but also spoken language understanding, speech translation and emotion recognition. We also focus on speech technologies in a language different than English: French. SSL models of different sizes are trained from carefully sourced and documented datasets. Experiments show that SSL is beneficial for most but not all tasks which confirms the need for exhaustive and reliable benchmarks to evaluate its real impact. LeBenchmark is shared with the scientific community for reproducible research in SSL from speech.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.21437/Interspeech.2021-556

2104.11462

Country: Europe > France (0.30)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.98)

Add feedback