AITopics | Rouvier, Mickael

Rouvier, Mickael

LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech

Parcollet, Titouan, Nguyen, Ha, Evain, Solene, Boito, Marcely Zanon, Pupier, Adrien, Mdhaffar, Salima, Le, Hang, Alisamir, Sina, Tomashenko, Natalia, Dinarelli, Marco, Zhang, Shucong, Allauzen, Alexandre, Coavoux, Maximin, Esteve, Yannick, Rouvier, Mickael, Goulian, Jerome, Lecouteux, Benjamin, Portet, Francois, Rossato, Solange, Ringeval, Fabien, Schwab, Didier, Besacier, Laurent

arXiv.org Artificial Intelligence, Sep-11-2023

Self-supervised learning (SSL) is at the origin of unprecedented improvements in many different domains, including computer vision and natural language processing. Speech processing has benefited drastically from SSL, as most current tasks in the domain are now approached with pre-trained models. This work introduces LeBenchmark 2.0, an open-source framework for assessing and building SSL-equipped French speech technologies. It includes documented, large-scale corpora with up to 14,000 hours of heterogeneous speech; ten pre-trained SSL wav2vec 2.0 models, containing from 26 million to one billion learnable parameters, shared with the community; and an evaluation protocol made of six downstream tasks to complement existing benchmarks. LeBenchmark 2.0 also presents unique perspectives on pre-trained SSL models for speech, with an investigation of frozen versus fine-tuned downstream models and of task-agnostic versus task-specific pre-trained models, as well as a discussion of the carbon footprint of large-scale model training.

artificial intelligence, machine learning, natural language, (19 more...)

2309.05472

Country:

Asia (1.00)
Europe > France (0.46)
North America > United States (0.46)

Genre: Research Report > Experimental Study (0.45)

Industry:

Media (1.00)
Health & Medicine (0.93)
Energy > Oil & Gas (0.67)
Energy > Power Industry (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

A Zero-shot and Few-shot Study of Instruction-Finetuned Large Language Models Applied to Clinical and Biomedical Tasks

Labrak, Yanis, Rouvier, Mickael, Dufour, Richard

arXiv.org Artificial Intelligence, Jul-22-2023

We evaluate four state-of-the-art instruction-tuned large language models (LLMs) -- ChatGPT, Flan-T5 UL2, Tk-Instruct, and Alpaca -- on a set of 13 real-world clinical and biomedical natural language processing (NLP) tasks in English, such as named-entity recognition (NER), question-answering (QA), and relation extraction (RE). Our overall results demonstrate that the evaluated LLMs begin to approach the performance of state-of-the-art models in zero- and few-shot scenarios for most tasks, and do particularly well on the QA task, even though they have never seen examples from these tasks before. However, we observed that performance on the classification and RE tasks falls below what can be achieved with a model trained specifically for the medical field, such as PubMedBERT. Finally, we noted that no LLM outperforms all the others on all the studied tasks, with some models being better suited to certain tasks than others.
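To make the zero-shot versus few-shot distinction in the abstract concrete, here is a minimal sketch of how the two kinds of prompt might be assembled. The function name `build_prompt`, the `Input:`/`Output:` template, and the example sentences are all assumptions made for illustration; they are not the prompts used in the paper.

```python
from typing import Iterable, Tuple


def build_prompt(instruction: str, query: str,
                 examples: Iterable[Tuple[str, str]] = ()) -> str:
    """Assemble an instruction prompt (hypothetical template).

    With no examples the result is a zero-shot prompt; with k solved
    (input, output) pairs it becomes a k-shot prompt.
    """
    parts = [instruction.strip()]
    for text, label in examples:
        # Each demonstration shows the model one solved input/output pair.
        parts.append(f"Input: {text}\nOutput: {label}")
    # The final block leaves the output empty for the model to complete.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    instruction = "List the drug names mentioned in the sentence."
    # Made-up demonstrations, not taken from any of the evaluated datasets.
    demos = [
        ("The patient was given aspirin.", "aspirin"),
        ("Metformin was discontinued.", "Metformin"),
    ]
    print(build_prompt(instruction, "She takes ibuprofen daily."))
    print("-----")
    print(build_prompt(instruction, "She takes ibuprofen daily.", demos))
```

In this sketch the only difference between the two settings is the number of demonstrations placed before the query; the model itself is not updated in either case.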

artificial intelligence, machine learning, natural language, (17 more...)

2307.12114

Country:

Europe (1.00)
Asia > Middle East (0.28)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Education > Curriculum > Subject-Specific Education (0.54)
Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

DrBERT: A Robust Pre-trained Model in French for Biomedical and Clinical domains

Labrak, Yanis, Bazoge, Adrien, Dufour, Richard, Rouvier, Mickael, Morin, Emmanuel, Daille, Béatrice, Gourraud, Pierre-Antoine

arXiv.org Artificial Intelligence, May-4-2023

In recent years, pre-trained language models (PLMs) have achieved the best performance on a wide range of natural language processing (NLP) tasks. While the first models were trained on general-domain data, specialized ones have emerged to handle specific domains more effectively. In this paper, we propose an original study of PLMs in the medical domain for the French language. We compare, for the first time, the performance of PLMs trained on both public data from the web and private data from healthcare establishments. We also evaluate different learning strategies on a set of biomedical tasks. In particular, we show that we can take advantage of already existing biomedical PLMs in a foreign language by further pre-training them on our targeted data. Finally, we release the first specialized PLMs for the biomedical field in French, called DrBERT, as well as the largest corpus of medical data under a free license on which these models are trained.

artificial intelligence, machine learning, natural language, (19 more...)

2304.00958

Country: Europe > France (0.29)

Genre: Research Report > New Finding (0.67)

Industry:

Information Technology > Security & Privacy (0.88)
Health & Medicine > Pharmaceuticals & Biotechnology (0.68)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.48)
Health & Medicine > Health Care Technology > Medical Record (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.47)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.46)

FrenchMedMCQA: A French Multiple-Choice Question Answering Dataset for Medical domain

Labrak, Yanis, Bazoge, Adrien, Dufour, Richard, Rouvier, Mickael, Morin, Emmanuel, Daille, Béatrice, Gourraud, Pierre-Antoine

arXiv.org Artificial Intelligence, Apr-9-2023

This paper introduces FrenchMedMCQA, the first publicly available Multiple-Choice Question Answering (MCQA) dataset in French for the medical domain. It is composed of 3,105 questions taken from real exams of the French medical specialization diploma in pharmacy, mixing single and multiple answers. Each instance of the dataset contains an identifier, a question, five possible answers and their manual correction(s). We also propose the first baseline models to automatically process this MCQA task, in order to report on current performance and to highlight the difficulty of the task. A detailed analysis of the results showed that it is necessary to have representations adapted to the medical domain or to the MCQA task: in our case, English specialized models yielded better results than generic French ones, even though FrenchMedMCQA is in French. Corpus, models and tools are available online.
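The instance layout described in the abstract (an identifier, a question, five possible answers, and one or more corrections) can be sketched as a small data structure. The class name `McqaInstance`, the field names, the option keys "a" to "e", and the exact-match scoring rule are assumptions for illustration; the released dataset may use a different schema and different metrics.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class McqaInstance:
    """One multiple-choice question (hypothetical schema)."""
    identifier: str
    question: str
    answers: dict[str, str]   # five options; the keys "a".."e" are assumed
    correct: frozenset[str]   # keys of the one or more correct options

    def __post_init__(self) -> None:
        if len(self.answers) != 5:
            raise ValueError("expected exactly five answer options")
        if not self.correct or not self.correct <= set(self.answers):
            raise ValueError("correct must be a non-empty subset of the options")


def exact_match(instance: McqaInstance, predicted: set[str]) -> bool:
    """True only if the predicted option set equals the full correct set."""
    return frozenset(predicted) == instance.correct


if __name__ == "__main__":
    # Placeholder content, not an actual question from the dataset.
    item = McqaInstance(
        identifier="example-0",
        question="Placeholder question text",
        answers={key: f"Option {key.upper()}" for key in "abcde"},
        correct=frozenset({"b", "d"}),
    )
    print(exact_match(item, {"b", "d"}))  # both correct options selected
    print(exact_match(item, {"b"}))       # only one of two selected
```

Storing the corrections as a set keeps single-answer and multiple-answer questions in one representation, so a question with a single correct option is simply a set of size one.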

artificial intelligence, natural language, question answering, (16 more...)

2304.04280

Country:

Europe (1.00)
North America > United States (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.63)
