AITopics | Poli, Maxime

Collaborating Authors

Poli, Maxime

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Improving Spoken Language Modeling with Phoneme Classification: A Simple Fine-tuning Approach

Poli, Maxime, Chemla, Emmanuel, Dupoux, Emmanuel

arXiv.org Artificial IntelligenceOct-30-2024

Recent progress in Spoken Language Modeling has shown that learning language directly from speech is feasible. Generating speech through a pipeline that operates at the text level typically loses nuances, intonations, and non-verbal vocalizations. Modeling directly from speech opens up the path to more natural and expressive systems. On the other hand, speech-only systems require up to three orders of magnitude more data to catch up to their text-based counterparts in terms of their semantic abilities. We show that fine-tuning speech representation models on phoneme classification leads to more context-invariant representations, and language models trained on these units achieve comparable lexical comprehension to ones trained on hundred times more data.

artificial intelligence, machine learning, natural language, (14 more...)

arXiv.org Artificial Intelligence

2410.00025

Country:

North America > United States (0.46)
North America > Mexico > Mexico City (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Add feedback

Introducing topography in convolutional neural networks

Poli, Maxime, Dupoux, Emmanuel, Riad, Rachid

arXiv.org Artificial IntelligenceOct-28-2022

Parts of the brain that carry sensory tasks are organized topographically: nearby neurons are responsive to the same properties of input signals. Thus, in this work, inspired by the neuroscience literature, we proposed a new topographic inductive bias in Convolutional Neural Networks (CNNs). To achieve this, we introduced a new topographic loss and an efficient implementation to topographically organize each convolutional layer of any CNN. We benchmarked our new method on 4 datasets and 3 models in vision and audio tasks and showed equivalent performance to all benchmarks. Besides, we also showcased the generalizability of our topographic loss with how it can be used with different topographic organizations in CNNs. Finally, we demonstrated that adding the topographic inductive bias made CNNs more resistant to pruning. Our approach provides a new avenue to obtain models that are more memory efficient while maintaining better accuracy.

artificial intelligence, deep learning, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2211.13152

Genre: Research Report (0.83)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

Add feedback

Shennong: a Python toolbox for audio speech features extraction

Bernard, Mathieu, Poli, Maxime, Karadayi, Julien, Dupoux, Emmanuel

arXiv.org Artificial IntelligenceDec-10-2021

We introduce Shennong, a Python toolbox and command-line utility for speech features extraction. It implements a wide range of well-established state of art algorithms including spectro-temporal filters such as Mel-Frequency Cepstral Filterbanks or Predictive Linear Filters, pre-trained neural networks, pitch estimators as well as speaker normalization methods and post-processing algorithms. Shennong is an open source, easy-to-use, reliable and extensible framework. The use of Python makes the integration to others speech modeling and machine learning tools easy. It aims to replace or complement several heterogeneous software, such as Kaldi or Praat. After describing the Shennong software architecture, its core components and implemented algorithms, this paper illustrates its use on three applications: a comparison of speech features performances on a phones discrimination task, an analysis of a Vocal Tract Length Normalization model as a function of the speech duration used for training and a comparison of pitch estimation algorithms under various noise conditions.

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.3758/s13428-022-02029-6

2112.05555

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.84)
Information Technology > Data Science > Data Mining > Feature Extraction (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback