Thomas, Samuel
A Non-autoregressive Model for Joint STT and TTS
Sunder, Vishal, Kingsbury, Brian, Saon, George, Thomas, Samuel, Aronowitz, Hagai, Shechtman, Slava, Fosler-Lussier, Eric, Lastras, Luis
In this paper, we take a step towards jointly modeling automatic speech recognition (STT) and speech synthesis (TTS) in a fully non-autoregressive way. We develop a novel multimodal framework capable of handling the speech and text modalities as input either individually or together. Owing to its multimodal nature, the proposed model can also be trained with unpaired speech or text data. We further propose an iterative refinement strategy in which the partial hypothesis at the output is fed back to the input of the model, iteratively improving both the STT and TTS predictions. We show that our joint model can effectively perform both STT and TTS tasks, outperforming the STT-specific baseline in all tasks and performing competitively with the TTS-specific baseline across a wide range of evaluation metrics.
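As a rough illustration of the iterative refinement loop the abstract describes, here is a minimal PyTorch sketch; JointModel, its layer sizes, and the number of refinement steps are hypothetical stand-ins, not the paper's architecture.

```python
# Minimal sketch of the iterative-refinement idea (not the authors' code).
import torch
import torch.nn as nn

class JointModel(nn.Module):
    """Toy non-autoregressive stand-in: maps (speech, text) -> (text logits, mel)."""
    def __init__(self, d=64, vocab=32, n_mels=80):
        super().__init__()
        self.speech_enc = nn.Linear(n_mels, d)
        self.text_enc = nn.Embedding(vocab, d)
        self.text_head = nn.Linear(d, vocab)   # STT head
        self.mel_head = nn.Linear(d, n_mels)   # TTS head

    def forward(self, speech, text):
        h = self.speech_enc(speech) + self.text_enc(text)  # fuse modalities
        return self.text_head(h), self.mel_head(h)

model = JointModel()
speech = torch.randn(1, 100, 80)               # input speech features
text = torch.zeros(1, 100, dtype=torch.long)   # start from a blank text hypothesis

# Feed each partial hypothesis back as input, refining both outputs.
for _ in range(3):
    logits, mel = model(speech, text)
    text = logits.argmax(dim=-1)               # refined STT hypothesis
```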
Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages
Rouditchenko, Andrew, Khurana, Sameer, Thomas, Samuel, Feris, Rogerio, Karlinsky, Leonid, Kuehne, Hilde, Harwath, David, Kingsbury, Brian, Glass, James
Recent models such as XLS-R and Whisper have made multilingual speech technologies more accessible by pre-training on audio from around 100 spoken languages each. However, there are thousands of spoken languages worldwide, and adapting to new languages is an important problem. In this work, we aim to understand which model adapts better to languages unseen during pre-training. We fine-tune both models on 13 unseen languages and 18 seen languages. Our results show that the number of hours seen per language and language family during pre-training is predictive of how the models compare, despite the significant differences in the pre-training methods.
C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval
Rouditchenko, Andrew, Chuang, Yung-Sung, Shvetsova, Nina, Thomas, Samuel, Feris, Rogerio, Kingsbury, Brian, Karlinsky, Leonid, Harwath, David, Kuehne, Hilde, Glass, James
Multilingual text-video retrieval methods have improved significantly in recent years, but the performance for other languages lags behind English. We propose a Cross-Lingual Cross-Modal Knowledge Distillation method to improve multilingual text-video retrieval. Inspired by the fact that English text-video retrieval outperforms other languages, we train a student model using input text in different languages to match the cross-modal predictions from teacher models using input text in English. We propose a cross-entropy based objective which forces the distribution over the student's text-video similarity scores to be similar to those of the teacher models. We introduce a new multilingual video dataset, Multi-YouCook2, by translating the English captions in the YouCook2 video dataset to 8 other languages. Our method improves multilingual text-video retrieval performance on Multi-YouCook2 and several other datasets such as Multi-MSRVTT and VATEX. We also conduct an analysis of the effectiveness of different multilingual text models as teachers. The code, models, and dataset are available at https://github.com/roudimit/c2kd.
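A minimal sketch of the cross-entropy distillation objective described above, assuming softmax-normalized text-video similarity scores and an illustrative temperature; this is not the released code at the repository linked above.

```python
# Hedged sketch of the distillation objective; shapes and tau are assumptions.
import torch
import torch.nn.functional as F

def c2kd_loss(student_sim, teacher_sim, tau=1.0):
    """Cross-entropy between teacher and student text-video similarity
    distributions: the student (non-English text) is pushed toward the
    teacher's (English text) distribution over the videos in the batch."""
    teacher_p = F.softmax(teacher_sim / tau, dim=-1)        # target distribution
    student_logp = F.log_softmax(student_sim / tau, dim=-1)
    return -(teacher_p * student_logp).sum(dim=-1).mean()

# Example: a batch of 4 captions scored against 4 videos.
student_sim = torch.randn(4, 4)   # student scores for, e.g., German captions
teacher_sim = torch.randn(4, 4)   # teacher scores for the English captions
loss = c2kd_loss(student_sim, teacher_sim)
```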
SimplerVoice: A Key Message & Visual Description Generator System for Illiteracy
Nguyen, Minh N. B., Thomas, Samuel, Gattiker, Anne E., Kashyap, Sujatha, Varshney, Kush R.
We introduce SimplerVoice: a key message and visual description generator system to help low-literate adults navigate the information-dense world with confidence, on their own. SimplerVoice can automatically generate sensible sentences describing an unknown object, extract the semantics of the object's usage in the form of a query string, and then represent the string as multiple types of visual guidance (pictures, pictographs, etc.). We demonstrate the SimplerVoice system in a case study of generating grocery products' manuals through a mobile application. To evaluate, we conducted a user study comparing SimplerVoice's generated descriptions with the information users interpreted from other methods: the original product package and search engines' top results. SimplerVoice achieved the highest performance score: 4.82 on a 5-point mean opinion score scale. Our results show that SimplerVoice is able to provide low-literate end-users with simple yet informative components that help them understand how to use grocery products, and that the system may provide benefits in other real-world use cases.
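As a rough illustration only, the three-stage pipeline the abstract describes could be sketched as follows; every function below is a hypothetical stub, not the SimplerVoice implementation.

```python
# Illustrative sketch of the generate -> extract -> visualize pipeline.
# All functions are hypothetical stand-ins.

def generate_description(object_name: str) -> str:
    """Stage 1: generate a sensible sentence describing the unknown object."""
    return f"{object_name} is used for cooking."

def extract_query(description: str) -> str:
    """Stage 2: distill the usage semantics into a search-query string."""
    return description.lower().rstrip(".")

def retrieve_visuals(query: str) -> list[str]:
    """Stage 3: map the query to visual guidance (pictures, pictographs)."""
    return [f"picture:{query}", f"pictograph:{query}"]

visuals = retrieve_visuals(extract_query(generate_description("Olive oil")))
```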
Invariant Representations for Noisy Speech Recognition
Serdyuk, Dmitriy, Audhkhasi, Kartik, Brakel, Philémon, Ramabhadran, Bhuvana, Thomas, Samuel, Bengio, Yoshua
Modern automatic speech recognition (ASR) systems need to be robust under acoustic variability arising from environmental, speaker, channel, and recording conditions. Ensuring such robustness is a challenge for modern neural network-based ASR systems, especially when not all types of variability are seen during training. We attempt to address this problem by encouraging the neural network acoustic model to learn invariant feature representations, drawing on recent research on image generation with Generative Adversarial Networks and on domain adaptation methods that extend adversarial gradient-based training. In particular, Ganin et al. propose adversarial training for image domain adaptation, in which an intermediate representation from the main target classification network is trained to degrade the performance of a separate domain-classifier network. Our work focuses on investigating neural architectures which produce representations invariant to noise conditions for ASR. We evaluate the proposed architecture on the Aurora-4 task, a popular benchmark for noise-robust ASR, and show that our method generalizes better than standard multi-condition training, especially when only a few noise categories are seen during training.
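The adversarial training referenced above is commonly implemented with a gradient-reversal layer; below is a minimal PyTorch sketch of that idea (an assumption about the typical implementation, not the paper's code).

```python
# Gradient-reversal layer in the style of Ganin et al. (hedged sketch).
import torch

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; multiplies gradients by -lambda on the
    backward pass, so the encoder learns to *hurt* the noise classifier."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

# Usage idea: features -> senone classifier (standard loss), and
# grad_reverse(features) -> noise-condition classifier (adversarial branch).
```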
Compiling Constraint Networks into Multivalued Decomposable Decision Graphs
Koriche, Frédéric (CRIL-CNRS and Université d'Artois) | Lagniez, Jean-Marie (CRIL-CNRS and Université d'Artois) | Marquis, Pierre (CRIL-CNRS and Université d'Artois) | Thomas, Samuel (CRIL-CNRS and Université d'Artois)
We present and evaluate a top-down algorithm, cn2mddg, for compiling finite-domain constraint networks (CNs) into the language MDDG of multivalued decomposable decision graphs. Though it includes Decision-DNNF as a proper subset, MDDG offers the same key tractable queries and transformations as Decision-DNNF, which makes it useful for many applications. The input of cn2mddg is a CN represented in the XCSP 2.1 format [Roussel and Lecoutre, 2009]; the output is a representation of the solutions of the CN in the language MDDG. MDDG is precisely the extension to non-Boolean domains of the language DDG [Fargier and Marquis, 2006], also known as Decision-DNNF [Oztok and Darwiche, 2014]: it is based on decomposable ∧-nodes and (multivalued) decision nodes. Similarly to Decision-DNNF, the MDDG language offers a number of tractable queries, including (possibly weighted) solution finding and counting, solution enumeration (solutions can be enumerated with polynomial delay), and optimization w.r.t. a linear objective function, as well as tractable transformations. Intensive experiments showed that our compiler cn2mddg succeeds in compiling CNs which are out of the reach of standard approaches based on a translation of the input network to CNF, followed by a compilation to Decision-DNNF. Furthermore, the sizes of the resulting compiled representations turn out to be much smaller (sometimes by several orders of magnitude).
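To make the tractable-counting claim concrete, here is a toy sketch of solution counting on a decision graph with decomposable ∧-nodes and multivalued decision nodes; the node encoding is an illustrative assumption, and this is not the cn2mddg compiler.

```python
# Why MDDG counting is tractable (hedged sketch): counts multiply at
# decomposable AND-nodes (disjoint variable sets) and add up over the
# per-value children of a decision node. Assumes a smooth graph where
# every solution path assigns all variables.

def count(node):
    """node: True (empty solution), ('and', children), or
    ('dec', var, {value: child})."""
    if node is True:
        return 1
    if node[0] == 'and':                    # decomposable conjunction
        result = 1
        for child in node[1]:
            result *= count(child)          # disjoint vars => product
        return result
    if node[0] == 'dec':                    # multivalued decision node
        return sum(count(child) for child in node[2].values())

# x ranges over {0,1,2} and, independently, y over {0,1}: 3 * 2 = 6 solutions.
g = ('and', [('dec', 'x', {0: True, 1: True, 2: True}),
             ('dec', 'y', {0: True, 1: True})])
assert count(g) == 6
```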
Knowledge Compilation for Model Counting: Affine Decision Trees
Koriche, Frédéric (CRIL-CNRS and Université d'Artois) | Lagniez, Jean-Marie (FMV, Johannes Kepler University) | Marquis, Pierre (CRIL-CNRS and Université d'Artois) | Thomas, Samuel (CRIL-CNRS and Université d'Artois)
Counting the models of a propositional formula is a key issue for a number of AI problems, but few propositional languages offer the possibility to count models efficiently. In order to fill the gap, we introduce the language EADT of (extended) affine decision trees. An extended affine decision tree is simply a tree with affine decision nodes and some specific decomposable conjunction or disjunction nodes. Unlike standard decision trees, the decision nodes of an EADT formula are labeled not by variables but by affine clauses. We study EADT and several subsets of it along the lines of the knowledge compilation map. We also describe a CNF-to-EADT compiler and present some experimental results. Those results show that the EADT compilation-based approach is competitive with (and in some cases outperforms) the model counter Cachet and the d-DNNF compilation-based approach to model counting.
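To make the notion of affine clauses concrete, here is a brute-force model counter over XOR constraints; it enumerates assignments rather than compiling to EADT, which is the paper's actual contribution, and the encoding is an illustrative assumption.

```python
# Affine clauses are XOR constraints: an assignment satisfies (idxs, b)
# iff the XOR of the listed variables equals the parity bit b.
from itertools import product

def count_models(n_vars, affine_clauses):
    """Count assignments over n_vars Boolean variables satisfying every
    affine clause, by exhaustive enumeration (exponential, for illustration)."""
    count = 0
    for assign in product([0, 1], repeat=n_vars):
        if all(sum(assign[i] for i in idxs) % 2 == b
               for idxs, b in affine_clauses):
            count += 1
    return count

# x0 XOR x1 = 1 and x2 = 0 over 3 variables: the models are 010 and 100.
assert count_models(3, [([0, 1], 1), ([2], 0)]) == 2
```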