
 Tahon, Marie


Predefined Prototypes for Intra-Class Separation and Disentanglement

arXiv.org Artificial Intelligence

Prototypical Learning is based on the idea that there is a point (which we call prototype) around which the embeddings of a class are clustered. It has shown promising results in scenarios with little labeled data or to design explainable models. Typically, prototypes are either defined as the average of the embeddings of a class or are designed to be trainable. In this work, we propose to predefine prototypes following human-specified criteria, which simplifies the training pipeline and brings different advantages. It is possible to associate some concrete dimensions of these representations with concrete human-understandable features, so that a change of a feature produces changes in only a few dimensions of the space. This has some advantages such as (i) having more control over data creation in generative models [8], or (ii) providing the ability to explain and interpret model predictions [9]. In this paper we propose a modification of prototypical systems that preserves their default advantages and, in addition, allows solving the two problems presented.
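
A minimal sketch of the general idea, assuming a distance-based classifier over fixed, human-specified prototype vectors (the layout and dimensions below are illustrative, not the paper's configuration):

```python
# Classify embeddings by their distance to predefined, non-trainable prototypes
# instead of learned class means or trainable prototype parameters.
import torch
import torch.nn.functional as F

def prototype_logits(embeddings: torch.Tensor, prototypes: torch.Tensor) -> torch.Tensor:
    """Return logits as negative squared Euclidean distances to each prototype.

    embeddings: (batch, dim) encoder outputs
    prototypes: (num_classes, dim) predefined, fixed prototype vectors
    """
    dists = torch.cdist(embeddings, prototypes, p=2) ** 2  # (batch, num_classes)
    return -dists  # closer prototype -> larger logit

# Example: 3 classes in a 4-D space, prototypes placed by hand so that each class
# occupies one axis (a human-specified, interpretable layout).
prototypes = torch.tensor([[1., 0., 0., 0.],
                           [0., 1., 0., 0.],
                           [0., 0., 1., 0.]])
embeddings = torch.randn(8, 4)
labels = torch.randint(0, 3, (8,))
loss = F.cross_entropy(prototype_logits(embeddings, prototypes), labels)
```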


A Semi-Automatic Approach to Create Large Gender- and Age-Balanced Speaker Corpora: Usefulness of Speaker Diarization & Identification

arXiv.org Artificial Intelligence

This paper presents a semi-automatic approach to create a diachronic corpus of voices balanced for speaker age, gender, and recording period, according to 32 categories (2 genders, 4 age ranges and 4 recording periods). Corpora were selected at the French National Institute of Audiovisual (INA) to obtain at least 30 speakers per category (a total of 960 speakers; only 874 have been found so far). For each speaker, speech excerpts were extracted from audiovisual documents using an automatic pipeline consisting of speech detection, background music and overlapped speech removal, and speaker diarization, used to present clean speaker segments to human annotators identifying target speakers. This pipeline proved highly effective, cutting down manual processing by a factor of ten. An evaluation of the quality of the automatic processing and of the final output is provided. It shows that the automatic processing is comparable to state-of-the-art pipelines, and that the output provides high-quality speech for most of the selected excerpts. This method shows promise for creating large corpora of known target speakers.
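
A structural sketch of this kind of pipeline is given below; the helper names (detect_speech, remove_music_and_overlap, diarize) are hypothetical placeholders standing in for the automatic tools, not the actual INA processing chain:

```python
# Chain the automatic steps so that annotators only confirm which diarized
# speaker is the target one, instead of listening to whole documents.
from dataclasses import dataclass

@dataclass
class Segment:
    start: float   # seconds
    end: float     # seconds
    speaker: str   # diarization label, e.g. "spk_03"

def detect_speech(audio_path: str) -> list[tuple[float, float]]:
    """Hypothetical speech activity detector returning (start, end) regions."""
    raise NotImplementedError

def remove_music_and_overlap(audio_path: str, regions) -> list[tuple[float, float]]:
    """Hypothetical filter dropping regions with background music or overlapped speech."""
    raise NotImplementedError

def diarize(audio_path: str, regions) -> list[Segment]:
    """Hypothetical speaker diarization returning speaker-homogeneous segments."""
    raise NotImplementedError

def candidate_segments(audio_path: str) -> list[Segment]:
    """Automatic part of the pipeline; its output is handed to human annotators."""
    speech = detect_speech(audio_path)
    clean = remove_music_and_overlap(audio_path, speech)
    return diarize(audio_path, clean)
```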


Unsupervised Multiple Domain Translation through Controlled Disentanglement in Variational Autoencoder

arXiv.org Artificial Intelligence

Unsupervised Multiple Domain Translation is the task of transforming data from one domain to other domains without having paired data to train the systems. Typically, methods based on Generative Adversarial Networks (GANs) are used to address this task. However, our proposal relies exclusively on a modified version of a Variational Autoencoder. This modification consists of the use of two latent variables disentangled in a controlled way by design. One of these latent variables is imposed to depend exclusively on the domain, while the other one must depend on the rest of the variability factors of the data. Additionally, the conditions imposed over the domain latent variable allow for better control and understanding of the latent space. We empirically demonstrate that our approach works on different vision datasets, improving on the performance of other well-known methods. Finally, we show that, indeed, one of the latent variables stores all the information related to the domain while the other one hardly contains any domain information.
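
A minimal sketch of the general architecture, assuming a VAE whose latent code is split into a domain part and a content part, with an auxiliary classifier forcing the domain part to carry the domain label (dimensions and losses are illustrative, not the paper's exact model):

```python
import torch
import torch.nn as nn

class TwoLatentVAE(nn.Module):
    def __init__(self, x_dim=784, z_dom=2, z_cnt=16, n_domains=3):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, z_dom + z_cnt)
        self.logvar = nn.Linear(256, z_dom + z_cnt)
        self.dec = nn.Sequential(nn.Linear(z_dom + z_cnt, 256), nn.ReLU(),
                                 nn.Linear(256, x_dim))
        self.dom_clf = nn.Linear(z_dom, n_domains)  # pushes z_dom to encode the domain
        self.z_dom = z_dom

    def forward(self, x, domain):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization
        x_hat = self.dec(z)
        rec = nn.functional.mse_loss(x_hat, x)
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()
        dom = nn.functional.cross_entropy(self.dom_clf(z[:, :self.z_dom]), domain)
        # Translation at test time = swap the domain slice of z before decoding.
        return rec + kl + dom

loss = TwoLatentVAE()(torch.rand(4, 784), torch.randint(0, 3, (4,)))
```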


An Explainable Proxy Model for Multilabel Audio Segmentation

arXiv.org Artificial Intelligence

Audio signal segmentation is a key task for automatic audio indexing. It consists of detecting the boundaries of class-homogeneous segments in the signal. In many applications, explainable AI is a vital process for transparency of decision-making with machine learning. In this paper, we propose an explainable multilabel segmentation model that solves speech activity detection (SAD), music detection (MD), noise detection (ND), and overlapped speech detection (OSD) simultaneously. This proxy uses non-negative matrix factorization (NMF) to map the embedding used for the segmentation to the frequency domain. Experiments conducted on two datasets show performance similar to the pre-trained black-box model while offering strong explainability features. Specifically, the frequency bins used for the decision can be easily identified at both the segment level (local explanations) and the global level (class prototypes).
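
A small sketch of the underlying NMF step, assuming scikit-learn's NMF applied to a magnitude spectrogram; the shapes and the saliency readout are illustrative, not the released proxy model:

```python
# Decompose a magnitude spectrogram as X ~ W @ A and read off which frequency
# bins (rows of W) dominate the reconstruction of a given frame.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
X = rng.random((257, 400))            # |spectrogram|: 257 frequency bins x 400 frames

nmf = NMF(n_components=16, init="nndsvd", max_iter=400)
A = nmf.fit_transform(X.T)            # (400, 16) per-frame component activations
W = nmf.components_.T                 # (257, 16) spectral templates (freq x component)

frame = 100
saliency = W @ A[frame]               # (257,) contribution of each frequency bin
top_bins = np.argsort(saliency)[-10:] # most influential bins for this frame (local explanation)
```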


Acoustic and linguistic representations for speech continuous emotion recognition in call center conversations

arXiv.org Artificial Intelligence

The goal of our research is to automatically retrieve satisfaction and frustration in real-life call-center conversations. This study focuses on an industrial application in which customer satisfaction is continuously tracked in order to improve customer services. To compensate for the lack of large annotated emotional databases, we explore the use of pre-trained speech representations as a form of transfer learning towards the AlloSat corpus. Moreover, several studies have pointed out that emotion can be detected not only in speech but also in facial traits, in biological responses or in textual information. In the context of telephone conversations, we can break down the audio information into acoustic and linguistic streams by using the speech signal and its transcription. Our experiments confirm the large gain in performance obtained with the use of pre-trained features. Surprisingly, we found that the linguistic content is clearly the major contributor to the prediction of satisfaction and best generalizes to unseen data. Our experiments conclude that there is a definitive advantage to using CamemBERT representations; however, the benefit of fusing the acoustic and linguistic modalities is not as obvious. With models learnt on individual annotations, we found that fusion approaches are more robust to the subjectivity of the annotation task. This study also tackles the problem of performance variability and intends to estimate this variability from different views: weight initialization, confidence intervals and annotation subjectivity. A deep analysis of the linguistic content investigates interpretable factors able to explain the high contribution of the linguistic modality to this task.
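
An illustrative late-fusion sketch, assuming pre-extracted utterance-level acoustic and linguistic embeddings (e.g. wav2vec-style and CamemBERT-style vectors); the dimensions and regressor head are assumptions, not the paper's architecture:

```python
# Concatenate the two modality embeddings and regress a continuous satisfaction score.
import torch
import torch.nn as nn

class LateFusionRegressor(nn.Module):
    def __init__(self, acoustic_dim=768, linguistic_dim=768):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(acoustic_dim + linguistic_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 1),
            nn.Tanh(),               # continuous score bounded in [-1, 1]
        )

    def forward(self, acoustic_emb, linguistic_emb):
        return self.head(torch.cat([acoustic_emb, linguistic_emb], dim=-1))

# Usage with dummy pre-extracted embeddings:
model = LateFusionRegressor()
score = model(torch.randn(4, 768), torch.randn(4, 768))  # (4, 1)
```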


Joint speech and overlap detection: a benchmark over multiple audio setup and speech domains

arXiv.org Artificial Intelligence

Voice activity detection and overlapped speech detection (VAD and OSD) are key pre-processing tasks for speaker diarization. The final segmentation performance highly relies on the robustness of these sub-tasks. Recent studies have shown that VAD and OSD can be trained jointly using a multi-class classification model. However, these works are often restricted to a specific speech domain, lacking information about the generalization capacities of the systems. This paper proposes a complete and new benchmark of different VAD and OSD models, on multiple audio setups (single/multi-channel) and speech domains (e.g. ...). We propose two 2-class VAD and OSD systems and a 3-class VAD+OSD system for mono- and multi-channel signals. We evaluate how beneficial the 3-class approach is in comparison to the use of two independent VAD and OSD models in terms of F1-score and training resources. Each system is trained and evaluated on four different datasets covering various speech domains, including both single- and multiple-microphone scenarios. To the best of our knowledge, no benchmark has been conducted on these approaches across various speech domains and recording setups.
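
A sketch of the 3-class framing, assuming frame-level labels {non-speech, single speaker, overlap} from which both VAD and OSD decisions are derived; the GRU classifier and thresholds below are illustrative, not the benchmarked models:

```python
# One frame-level classifier serves both tasks: VAD = speech or overlap present,
# OSD = overlap present.
import torch
import torch.nn as nn

NON_SPEECH, SPEECH, OVERLAP = 0, 1, 2

class FrameClassifier(nn.Module):
    def __init__(self, feat_dim=80, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, 3)

    def forward(self, feats):                 # feats: (batch, frames, feat_dim)
        h, _ = self.rnn(feats)
        return self.out(h)                    # (batch, frames, 3) logits

def decisions(logits):
    p = logits.softmax(dim=-1)
    vad = p[..., SPEECH] + p[..., OVERLAP] > 0.5   # at least one active speaker
    osd = p[..., OVERLAP] > 0.5                    # at least two simultaneous speakers
    return vad, osd

vad, osd = decisions(FrameClassifier()(torch.randn(2, 200, 80)))
```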


Evaluation of Speaker Anonymization on Emotional Speech

arXiv.org Artificial Intelligence

Speech data carries a range of personal information, such as the speaker's identity and emotional state. These attributes can be used for malicious purposes. With the development of virtual assistants, a new generation of privacy threats has emerged. Current studies have addressed the topic of preserving speech privacy. One of them, the VoicePrivacy initiative, aims to promote the development of privacy preservation tools for speech technology. The task selected for the VoicePrivacy 2020 Challenge (VPC) is speaker anonymization. The goal is to hide the source speaker's identity while preserving the linguistic information. The baseline of the VPC makes use of a voice conversion system. This paper studies the impact of the VPC speaker anonymization baseline system on the emotional information present in speech utterances. Evaluation is performed following the VPC rules regarding the attackers' knowledge about the anonymization system. Our results show that the VPC baseline system does not suppress speakers' emotions against informed attackers. When comparing anonymized speech to original speech, the emotion recognition performance is degraded by 15% relative on IEMOCAP data, similar to the degradation observed for the automatic speech recognition used to evaluate the preservation of the linguistic information.
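
A small worked example of the relative-degradation figure quoted above (the scores below are illustrative, not the paper's results):

```python
# Relative degradation of an emotion-recognition score when the same system is
# evaluated on anonymized instead of original speech.
def relative_degradation(score_original: float, score_anonymized: float) -> float:
    return 100.0 * (score_original - score_anonymized) / score_original

# e.g. an accuracy dropping from 0.60 to 0.51 is a 15% relative loss
print(relative_degradation(0.60, 0.51))  # 15.0
```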