Collaborating Authors

Wisniewski, Guillaume


A Systematic Comparison of Syntactic Representations of Dependency Parsing

arXiv.org Artificial Intelligence

We compare the performance of a transition-based parser with regard to different annotation schemes. We propose to convert some specific syntactic constructions observed in the Universal Dependencies treebanks into a so-called more standard representation and to evaluate parsing performance across all the languages of the project. We show that the "standard" constructions do not systematically lead to better parsing performance and that scores vary considerably across languages.
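To make the kind of conversion concrete, the following minimal Python sketch (an illustration under stated assumptions, not the paper's conversion script) rewrites one such construction, promoting the copula to head of its clause, a common "standard" alternative to the UD content-word-headed analysis. The `conllu` library is assumed, and the treebank file name is a placeholder.

from conllu import parse_incr

def promote_copula(sentence):
    """Make each copula the head of its clause, demoting the predicate."""
    for token in sentence:
        if token["deprel"] != "cop":
            continue
        predicate_id = token["head"]   # in UD, the predicate heads the clause
        predicate = sentence.filter(id=predicate_id)[0]
        # The copula takes over the predicate's attachment point,
        token["head"], token["deprel"] = predicate["head"], predicate["deprel"]
        # the predicate becomes a dependent of the copula,
        predicate["head"], predicate["deprel"] = token["id"], "xcomp"
        # and the predicate's remaining dependents move to the copula.
        for other in sentence:
            if other is not token and other["head"] == predicate_id:
                other["head"] = token["id"]
    return sentence

# Hypothetical treebank file; any CoNLL-U file from the UD project would do.
with open("fr_gsd-ud-train.conllu", encoding="utf-8") as treebank:
    for sent in parse_incr(treebank):
        print(promote_copula(sent).serialize())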


Establishing degrees of closeness between audio recordings along different dimensions using large-scale cross-lingual models

arXiv.org Artificial Intelligence

In the highly constrained context of low-resource language studies, we explore vector representations of speech from a pretrained model to determine their level of abstraction with regard to the audio signal. We propose a new unsupervised method using ABX tests on audio recordings with carefully curated metadata to shed light on the type of information present in the representations. ABX tests determine whether the representations computed by a multilingual speech model encode a given characteristic. Three experiments are devised: one on room acoustics, one on linguistic genre, and one on phonetic aspects. The results confirm that representations extracted from recordings that differ in linguistic or extra-linguistic characteristics differ along the same dimensions. Embedding more audio signal in one vector better discriminates extra-linguistic characteristics, whereas shorter snippets are better at distinguishing segmental information. The method is fully unsupervised, potentially opening new research avenues for comparative work on under-documented languages.
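For readers unfamiliar with ABX testing, the short Python sketch below shows the underlying logic on precomputed fixed-size vectors; the random arrays stand in for pooled speech-model representations and are assumptions of the example, not the paper's data.

import numpy as np

def cosine_distance(u, v):
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def abx_score(a_set, b_set):
    """Fraction of (A, B, X) triples where X (same category as A) is
    closer to A than to B. 0.5 = chance, 1.0 = perfectly discriminable."""
    correct, total = 0, 0
    for i, a in enumerate(a_set):
        for x in a_set[i + 1:]:          # X drawn from A's category
            for b in b_set:
                correct += cosine_distance(x, a) < cosine_distance(x, b)
                total += 1
    return correct / total

rng = np.random.default_rng(0)
cat_a = rng.normal(0.0, 1.0, size=(10, 512))   # e.g., one recording condition
cat_b = rng.normal(0.3, 1.0, size=(10, 512))   # e.g., another condition
print(f"ABX score: {abx_score(cat_a, cat_b):.2f}")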


Using Artificial French Data to Understand the Emergence of Gender Bias in Transformer Language Models

arXiv.org Artificial Intelligence

Numerous studies have demonstrated the ability of neural language models to learn various linguistic properties without direct supervision. This work takes an initial step towards exploring the less-researched question of how neural models discover linguistic properties of words, such as gender, as well as the rules governing their usage. We propose to use an artificial corpus generated by a probabilistic context-free grammar (PCFG) based on French to precisely control the gender distribution in the training data and to determine under which conditions a model correctly captures gender information or, on the contrary, appears gender-biased.
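To illustrate the approach, here is a minimal sketch of sampling an artificial gendered corpus from a PCFG with NLTK; the toy grammar, its vocabulary, and its probabilities are invented for the example and are far simpler than the grammar the abstract refers to.

import random
import nltk
from nltk.grammar import Nonterminal

toy_grammar = nltk.PCFG.fromstring("""
    S   -> NPm VPm [0.5] | NPf VPf [0.5]
    NPm -> 'le' Nm  [1.0]
    NPf -> 'la' Nf  [1.0]
    Nm  -> 'garcon' [0.6] | 'livre' [0.4]
    Nf  -> 'fille'  [0.6] | 'table' [0.4]
    VPm -> 'est' 'petit'  [1.0]
    VPf -> 'est' 'petite' [1.0]
""")

def sample(grammar, symbol=None):
    """Recursively sample a derivation, weighting rules by probability."""
    symbol = symbol or grammar.start()
    if not isinstance(symbol, Nonterminal):
        return [symbol]                                  # terminal word
    productions = grammar.productions(lhs=symbol)
    chosen = random.choices(productions,
                            weights=[p.prob() for p in productions])[0]
    return [word for sym in chosen.rhs() for word in sample(grammar, sym)]

random.seed(0)
for _ in range(3):
    print(" ".join(sample(toy_grammar)))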


ProsAudit, a prosodic benchmark for self-supervised speech models

arXiv.org Artificial Intelligence

We present ProsAudit, a benchmark in English to assess structural prosodic knowledge in self-supervised learning (SSL) speech models. It consists of two subtasks, their corresponding metrics, and an evaluation dataset. In the protosyntax task, the model must correctly identify strong versus weak prosodic boundaries. In the lexical task, the model needs to correctly distinguish between pauses inserted between words and within words. We also provide human evaluation scores on this benchmark. We evaluated a series of SSL models and found that they were all able to perform above chance on both tasks, even when evaluated on an unseen language. However, non-native models performed significantly worse than native ones on the lexical task, highlighting the importance of lexical knowledge for this task. We also found a clear effect of size, with models trained on more data performing better on the two subtasks.
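The scoring of such minimal-pair subtasks typically reduces to counting how often the model prefers the natural stimulus; the sketch below shows that logic, with `score_fn` standing in for a real SSL-model scoring function (an assumption of the example, not ProsAudit's published evaluation code).

def pair_accuracy(pairs, score_fn):
    """A model is correct on a pair when it assigns a higher
    (pseudo-)probability to the natural stimulus than to the
    manipulated one."""
    correct = sum(score_fn(natural) > score_fn(manipulated)
                  for natural, manipulated in pairs)
    return correct / len(pairs)

# Toy usage with fake stimulus identifiers and a fake scorer.
fake_pairs = [("nat_1", "man_1"), ("nat_2", "man_2")]
fake_scores = {"nat_1": -3.1, "man_1": -4.0, "nat_2": -2.5, "man_2": -2.2}
print(pair_accuracy(fake_pairs, fake_scores.get))   # 0.5, chance level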


From 'Snippet-lects' to Doculects and Dialects: Leveraging Neural Representations of Speech for Placing Audio Signals in a Language Landscape

arXiv.org Artificial Intelligence

XLSR-53, a multilingual model of speech, builds a vector representation from audio, which allows for a range of computational treatments. The experiments reported here use this neural representation to estimate the degree of closeness between audio files, ultimately aiming to extract relevant linguistic properties. We use max-pooling to aggregate the neural representations from a "snippet-lect" (the speech in a 5-second audio snippet) to a "doculect" (the speech in a given resource), then to dialects and languages. We use data from corpora of 11 dialects belonging to 5 less-studied languages. Similarity measurements between the 11 corpora bring out the greatest closeness between those that are known to be dialects of the same language. The findings suggest that (i) dialect/language can emerge among the various parameters characterizing audio files and (ii) estimates of overall phonetic/phonological closeness can be obtained for a little-resourced or fully unknown language. The findings help shed light on the type of information captured by neural representations of speech and how it can be extracted from these representations.
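A minimal sketch of the pooling step, assuming the XLSR-53 checkpoint distributed through the HuggingFace transformers library ("facebook/wav2vec2-large-xlsr-53"); the random array stands in for a real 5-second, 16 kHz audio snippet.

import numpy as np
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

checkpoint = "facebook/wav2vec2-large-xlsr-53"
extractor = Wav2Vec2FeatureExtractor.from_pretrained(checkpoint)
model = Wav2Vec2Model.from_pretrained(checkpoint)

snippet = np.random.randn(5 * 16000)                 # fake 5 s of 16 kHz audio
inputs = extractor(snippet, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    frames = model(**inputs).last_hidden_state       # shape (1, T, 1024)

snippet_vector = frames.max(dim=1).values            # one "snippet-lect" vector
# A "doculect" vector can then be obtained by pooling the snippet vectors
# of a whole resource in the same way, before comparing doculects by
# cosine similarity.
print(snippet_vector.shape)                          # torch.Size([1, 1024])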


Assessing the Capacity of Transformer to Abstract Syntactic Representations: A Contrastive Analysis Based on Long-distance Agreement

arXiv.org Artificial Intelligence

Long-distance agreement, often taken as evidence for syntactic structure, is increasingly used to assess the syntactic generalization of neural language models. Much work has shown that transformers are capable of high accuracy on varied agreement tasks, but the mechanisms by which the models accomplish this behavior are still not well understood. To better understand transformers' internal workings, this work contrasts how they handle two superficially similar but theoretically distinct agreement phenomena: subject-verb and object-past-participle agreement in French. Using probing and counterfactual analysis methods, our experiments show that (i) the agreement task suffers from several confounders that partially call into question the conclusions drawn so far and (ii) transformers handle subject-verb and object-past-participle agreement in a way that is consistent with their modeling in theoretical linguistics.
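As a pointer to how probing works in practice, the sketch below trains a linear probe on frozen hidden states; the random activations and the binary agreement label are stand-ins for representations extracted from a transformer at the positions of interest, not the paper's data.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(1000, 768))   # fake layer activations
labels = rng.integers(0, 2, size=1000)         # e.g., singular vs. plural

X_tr, X_te, y_tr, y_te = train_test_split(hidden_states, labels,
                                          random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
# Above-chance test accuracy suggests the property is linearly decodable
# from the frozen representations.
print(f"probe accuracy: {probe.score(X_te, y_te):.2f}")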


User-friendly automatic transcription of low-resource languages: Plugging ESPnet into Elpis

arXiv.org Artificial Intelligence

This paper reports on progress integrating the speech recognition toolkit ESPnet into Elpis, a web front-end originally designed to provide access to the Kaldi automatic speech recognition toolkit. The goal of this work is to make end-to-end speech recognition models available to language workers via a user-friendly graphical interface. Encouraging results are reported on (i) development of an ESPnet recipe for use in Elpis, with preliminary results on data sets previously used for training acoustic models with the Persephone toolkit along with a new data set that had not previously been used in speech recognition, and (ii) incorporating ESPnet into Elpis along with UI enhancements and a CUDA-supported Dockerfile.