Mai, Florian
Superalignment with Dynamic Human Values
Mai, Florian, Kaczér, David, Corrêa, Nicholas Kluge, Flek, Lucie
Two core challenges of alignment are 1) scalable oversight and 2) accounting for the dynamic nature of human values. While solutions like recursive reward modeling address 1), they do not simultaneously account for 2). We sketch a roadmap for a novel algorithmic framework that trains a superhuman reasoning model to decompose complex tasks into subtasks that are still amenable to human-level guidance. Our approach relies on what we call the part-to-complete generalization hypothesis, which states that the alignment of subtask solutions generalizes to the alignment of complete solutions. We advocate for the need to measure this generalization and propose ways to improve it in the future.
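To make the decompose-and-oversee loop concrete, here is a purely illustrative sketch; the function names (decompose, solve_with_oversight) and the loop structure are assumptions for illustration, not the framework proposed in the paper.

    # Illustrative sketch only: all names below are hypothetical placeholders.
    from dataclasses import dataclass

    @dataclass
    class Subtask:
        description: str
        solution: str = ""
        approved: bool = False

    def decompose(task: str) -> list[Subtask]:
        # A superhuman reasoning model would produce subtasks that are still
        # simple enough for human-level guidance; stubbed here.
        return [Subtask(f"{task} / part {i}") for i in range(3)]

    def solve_with_oversight(task: str) -> list[Subtask]:
        subtasks = decompose(task)
        for st in subtasks:
            st.solution = f"solution to: {st.description}"  # model's subtask solution
            st.approved = True                               # human-level check of the subtask
        # Part-to-complete generalization hypothesis: if every subtask solution
        # is aligned, the assembled complete solution is aligned as well.
        return subtasks

    print(solve_with_oversight("draft a policy report"))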
End-to-end Planner Training for Language Modeling
Cornille, Nathan, Mai, Florian, Sun, Jingyuan, Moens, Marie-Francine
Through end-to-end training to predict the next token, LLMs have become valuable tools for various tasks. Enhancing their core training in language modeling can improve numerous downstream applications. A successful approach to enhance language modeling uses a separate planning module to predict abstract labels of future sentences and conditions the LM on these predictions. However, this method is non-differentiable, preventing joint end-to-end tuning of the planner with the LM. We propose an effective method to improve this approach by enabling joint fine-tuning of the planner and the LM. We show that a naive way of approximating the gradient of selecting a label via the straight-through estimator is not effective. Instead, we propose to use the predicted label probabilities as mixing weights to condition the LM on a weighted average of label embeddings in a differentiable manner. This not only enables joint fine-tuning of the planner and the LM, but also allows the LM to draw on the full label distribution predicted by the planner, retaining more information. Our experimental results show consistent improvements in perplexity.
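A minimal PyTorch sketch of the differentiable conditioning described above, i.e., using the planner's label probabilities as mixing weights over label embeddings instead of a hard, non-differentiable label choice; module and tensor names are illustrative assumptions, not the paper's implementation.

    import torch
    import torch.nn as nn

    class SoftPlanConditioner(nn.Module):
        def __init__(self, num_labels: int, hidden_dim: int):
            super().__init__()
            self.label_embeddings = nn.Embedding(num_labels, hidden_dim)

        def forward(self, planner_logits: torch.Tensor) -> torch.Tensor:
            # planner_logits: (batch, num_labels), produced by the planner.
            probs = planner_logits.softmax(dim=-1)        # mixing weights
            # A weighted average over label embeddings keeps the pipeline
            # differentiable, so planner and LM can be fine-tuned jointly.
            return probs @ self.label_embeddings.weight   # (batch, hidden_dim)

    planner_logits = torch.randn(2, 16, requires_grad=True)
    cond = SoftPlanConditioner(num_labels=16, hidden_dim=768)(planner_logits)
    cond.sum().backward()                                 # gradients reach the planner
    print(planner_logits.grad.shape)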
Open-Source Conversational AI with SpeechBrain 1.0
Ravanelli, Mirco, Parcollet, Titouan, Moumen, Adel, de Langen, Sylvain, Subakan, Cem, Plantinga, Peter, Wang, Yingzhi, Mousavi, Pooneh, Della Libera, Luca, Ploujnikov, Artem, Paissan, Francesco, Borra, Davide, Zaiem, Salah, Zhao, Zeyu, Zhang, Shucong, Karakasidis, Georgios, Yeh, Sung-Lin, Champion, Pierre, Rouhe, Aku, Braun, Rudolf, Mai, Florian, Zuluaga-Gomez, Juan, Mousavi, Seyed Mahed, Nautsch, Andreas, Liu, Xuechen, Sagar, Sangeet, Duret, Jarod, Mdhaffar, Salima, Laperriere, Gaelle, Rouvier, Mickael, De Mori, Renato, Esteve, Yannick
SpeechBrain is an open-source Conversational AI toolkit based on PyTorch, focused particularly on speech processing tasks such as speech recognition, speech enhancement, speaker recognition, text-to-speech, and much more. It promotes transparency and replicability by releasing both the pre-trained models and the complete "recipes" of code and algorithms required for training them. This paper presents SpeechBrain 1.0, a significant milestone in the evolution of the toolkit, which now has over 200 recipes for speech, audio, and language processing tasks, and more than 100 models available on Hugging Face. SpeechBrain 1.0 introduces new technologies to support diverse learning modalities, Large Language Model (LLM) integration, and advanced decoding strategies, along with novel models, tasks, and modalities. It also includes a new benchmark repository, offering researchers a unified platform for evaluating models across diverse tasks.
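A minimal usage sketch for loading one of the pretrained models from Hugging Face; the module path and model identifier below are assumptions and may differ between SpeechBrain versions.

    # Loads a pretrained LibriSpeech ASR model and transcribes a local file.
    from speechbrain.inference.ASR import EncoderDecoderASR

    asr_model = EncoderDecoderASR.from_hparams(
        source="speechbrain/asr-crdnn-rnnlm-librispeech",   # pretrained recipe on Hugging Face
        savedir="pretrained_models/asr-crdnn-rnnlm-librispeech",
    )
    print(asr_model.transcribe_file("example.wav"))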
Triple-Encoders: Representations That Fire Together, Wire Together
Erker, Justus-Jonas, Mai, Florian, Reimers, Nils, Spanakis, Gerasimos, Gurevych, Iryna
Curved Contrastive Learning, a representation learning method that encodes relative distances between utterances into the embedding space via a bi-encoder, has recently shown promising results for dialog modeling at far superior efficiency. While high efficiency is achieved through independently encoding utterances, this ignores the importance of contextualization. To overcome this issue, this study introduces triple-encoders, which efficiently compute distributed utterance mixtures from these independently encoded utterances through a novel Hebbian-inspired co-occurrence learning objective in a self-organizing manner, without using any weights, i.e., merely through local interactions. Empirically, we find that triple-encoders lead to a substantial improvement over bi-encoders, and even to better zero-shot generalization than single-vector representation models without requiring re-encoding.
[Figure 1: Comparison of the triple encoder to Henderson et al. (2020) and Erker et al. (2023). As in CCL, only the latest utterance needs to be encoded and scored; at the same time, contextualization is achieved through pairwise mean-pooling with previously encoded utterances, combining the advantages of both previous works. The analysis shows that co-occurrence training pushes representations that occur (fire) together closer together.]
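A rough sketch of the pairwise mean-pooling contextualization mentioned in the figure caption: context utterances are encoded independently (and can therefore be cached), and contextualized mixtures are formed without any extra weights. The encoder, pairing, and scoring details below are simplified assumptions rather than the paper's exact objective.

    import itertools
    import torch
    import torch.nn.functional as F

    def contextualize(utterance_embs: torch.Tensor) -> torch.Tensor:
        # utterance_embs: (num_utterances, dim), each encoded independently.
        pairs = list(itertools.combinations(range(utterance_embs.size(0)), 2))
        mixtures = torch.stack(
            [(utterance_embs[i] + utterance_embs[j]) / 2 for i, j in pairs]
        )
        return F.normalize(mixtures, dim=-1)      # (num_pairs, dim)

    def score_candidate(mixtures: torch.Tensor, candidate_emb: torch.Tensor) -> torch.Tensor:
        # Average cosine similarity between the candidate and all mixtures.
        return (mixtures @ F.normalize(candidate_emb, dim=-1)).mean()

    context = torch.randn(4, 256)                 # 4 cached utterance embeddings
    candidate = torch.randn(256)
    print(score_candidate(contextualize(context), candidate))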
Learning to Plan for Language Modeling from Unlabeled Data
Cornille, Nathan, Moens, Marie-Francine, Mai, Florian
By training to predict the next token in an unlabeled corpus, large language models learn to perform many tasks without any labeled data. However, their next-token-prediction objective arguably limits their performance in scenarios that require planning, such as writing a coherent article. In this paper, we train a module for planning the future writing process via a self-supervised learning objective. By conditioning on generated latent plans, our model extends the successful language model formula to more abstract planning in an unsupervised way. Empirically, we demonstrate that our method improves language modeling performance in general, particularly with respect to the text structure. Because our framework uses a planner module that is unsupervised and external to the language model, new planner modules can be trained at large scale and easily be shared with the community.
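An illustrative sketch of how abstract writing actions could be derived from unlabeled text and then used as planner targets; the clustering choice and conditioning step are simplifying assumptions, not necessarily the paper's exact recipe.

    import numpy as np
    from sklearn.cluster import KMeans

    # 1) Derive abstract labels: cluster sentence embeddings of the corpus.
    sentence_embeddings = np.random.randn(1000, 384)   # stand-in for a real sentence encoder
    labels = KMeans(n_clusters=8).fit_predict(sentence_embeddings)

    # 2) A planner is then trained to predict the label of the *next* sentence
    #    from the current context, and 3) the LM is conditioned on the
    #    predicted label (e.g., via a learned label embedding).
    next_sentence_targets = labels[1:]
    print(next_sentence_targets[:10])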
HyperMixer: An MLP-based Low Cost Alternative to Transformers
Mai, Florian, Pannatier, Arnaud, Fehr, Fabio, Chen, Haolin, Marelli, Francois, Fleuret, Francois, Henderson, James
Transformer-based architectures are the model of choice for natural language understanding, but they come at a significant cost, as they have quadratic complexity in the input length, require a lot of training data, and can be difficult to tune. In the pursuit of lower costs, we investigate simple MLP-based architectures. We find that existing architectures such as MLPMixer, which achieves token mixing through a static MLP applied to each feature independently, are too detached from the inductive biases required for natural language understanding. In this paper, we propose a simple variant, HyperMixer, which forms the token mixing MLP dynamically using hypernetworks. Empirically, we demonstrate that our model performs better than alternative MLP-based models, and on par with Transformers. In contrast to Transformers, HyperMixer achieves these results at substantially lower costs in terms of processing time, training data, and hyperparameter tuning.
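A minimal sketch of HyperMixer-style token mixing, in which small hypernetworks generate the token-mixing MLP's weights from the tokens themselves rather than keeping them static as in MLPMixer; layer sizes and the exact hypernetwork design are simplified assumptions.

    import torch
    import torch.nn as nn

    class HyperMixerTokenMixing(nn.Module):
        def __init__(self, d_model: int, d_hidden: int):
            super().__init__()
            # Hypernetworks: map each token embedding to one row of W1 / W2.
            self.hyper_w1 = nn.Linear(d_model, d_hidden)
            self.hyper_w2 = nn.Linear(d_model, d_hidden)
            self.activation = nn.GELU()

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, num_tokens, d_model)
            w1 = self.hyper_w1(x)                  # (batch, N, d_hidden)
            w2 = self.hyper_w2(x)                  # (batch, N, d_hidden)
            # Token mixing: W2 · act(W1ᵀ · X), linear in the number of tokens.
            return w2 @ self.activation(w1.transpose(1, 2) @ x)

    x = torch.randn(2, 50, 256)
    print(HyperMixerTokenMixing(d_model=256, d_hidden=512)(x).shape)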
BQ-NCO: Bisimulation Quotienting for Efficient Neural Combinatorial Optimization
Drakulic, Darko, Michel, Sofia, Mai, Florian, Sors, Arnaud, Andreoli, Jean-Marc
Despite the success of neural-based combinatorial optimization methods for end-to-end heuristic learning, out-of-distribution generalization remains a challenge. In this paper, we present a novel formulation of Combinatorial Optimization Problems (COPs) as Markov Decision Processes (MDPs) that effectively leverages common symmetries of COPs to improve out-of-distribution robustness. Starting from a direct MDP formulation of a constructive method, we introduce a generic way to reduce the state space, based on Bisimulation Quotienting (BQ) in MDPs. Then, for COPs with a recursive nature, we specialize the bisimulation and show how the reduced state exploits the symmetries of these problems and facilitates MDP solving. Our approach is principled and we prove that an optimal policy for the proposed BQ-MDP actually solves the associated COPs. We illustrate our approach on five classical problems: the Euclidean and Asymmetric Traveling Salesman, Capacitated Vehicle Routing, Orienteering and Knapsack Problems. Furthermore, for each problem, we introduce a simple attention-based policy network for the BQ-MDPs, which we train by imitation of (near) optimal solutions of small instances from a single distribution. We obtain new state-of-the-art results for the five COPs on both synthetic and realistic benchmarks. Notably, in contrast to most existing neural approaches, our learned policies show excellent generalization performance to much larger instances than seen during training, without any additional search procedure.
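An illustrative reading of the state-reduction idea for the Euclidean TSP: the remaining decisions depend only on the current node, the origin node, and the unvisited nodes, which can be expressed relative to the current node to expose shared symmetries. The code below is a simplified sketch, not the paper's exact bisimulation quotienting.

    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class FullState:
        coords: np.ndarray        # (n, 2) node coordinates
        partial_tour: tuple       # indices visited so far, in order

    def reduce_state(s: FullState) -> np.ndarray:
        # Keep only origin, current node, and remaining nodes; expressing the
        # coordinates relative to the current node exposes the translation
        # symmetry shared by all Euclidean TSP instances.
        current = s.coords[s.partial_tour[-1]]
        origin = s.coords[s.partial_tour[0]]
        remaining = np.array(
            [c for i, c in enumerate(s.coords) if i not in s.partial_tour]
        )
        return np.vstack([origin - current, remaining - current])

    coords = np.random.rand(10, 2)
    print(reduce_state(FullState(coords, partial_tour=(0, 3, 7))).shape)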
HyperConformer: Multi-head HyperMixer for Efficient Speech Recognition
Mai, Florian, Zuluaga-Gomez, Juan, Parcollet, Titouan, Motlicek, Petr
State-of-the-art ASR systems have achieved promising results by modeling local and global interactions separately. While the former can be computed efficiently, global interactions are usually modeled via attention mechanisms, which are expensive for long input sequences. Here, we address this by extending HyperMixer, an efficient alternative to attention exhibiting linear complexity, to the Conformer architecture for speech recognition, leading to HyperConformer. In particular, multi-head HyperConformer achieves comparable or higher recognition performance while being more efficient than Conformer in terms of inference speed, memory, parameter count, and available training data. HyperConformer achieves a word error rate of 2.9% on LibriSpeech test-clean with less than 8M neural parameters and a peak memory during training of 5.7GB, hence trainable with accessible hardware. The encoder is between 38% (on mid-length speech) and 56% (on long speech) faster than an equivalent Conformer. (The HyperConformer recipe is publicly available at: https://github.com/speechbrain/speechbrain/tree/develop/recipes/LibriSpeech/ASR/transformer/)
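A sketch of a multi-head variant of the HyperMixer token mixing shown earlier, splitting the feature dimension into heads; head count and layer sizes are illustrative assumptions, not the HyperConformer recipe.

    import torch
    import torch.nn as nn

    class MultiHeadHyperMixing(nn.Module):
        def __init__(self, d_model: int, d_hidden: int, num_heads: int):
            super().__init__()
            assert d_model % num_heads == 0
            self.num_heads = num_heads
            d_head = d_model // num_heads
            self.hyper_w1 = nn.Linear(d_head, d_hidden)
            self.hyper_w2 = nn.Linear(d_head, d_hidden)
            self.act = nn.GELU()

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, num_tokens, d_model) -> (batch, heads, num_tokens, d_head)
            b, n, _ = x.shape
            xh = x.view(b, n, self.num_heads, -1).transpose(1, 2)
            w1, w2 = self.hyper_w1(xh), self.hyper_w2(xh)
            # Per-head token mixing, then merge heads back together.
            mixed = w2 @ self.act(w1.transpose(-1, -2) @ xh)
            return mixed.transpose(1, 2).reshape(b, n, -1)

    x = torch.randn(2, 100, 256)
    print(MultiHeadHyperMixing(256, 128, num_heads=4)(x).shape)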
Bag-of-Vectors Autoencoders for Unsupervised Conditional Text Generation
Mai, Florian, Henderson, James
Text autoencoders are often used for unsupervised conditional text generation by applying mappings in the latent space to change attributes to the desired values. Recently, Mai et al. (2020) proposed Emb2Emb, a method to learn these mappings in the embedding space of an autoencoder. However, their method is restricted to autoencoders with a single-vector embedding, which limits how much information can be retained. We address this issue by extending their method to Bag-of-Vectors Autoencoders (BoV-AEs), which encode the text into a variable-size bag of vectors that grows with the size of the text, as in attention-based models. This allows much longer texts to be encoded and reconstructed than with standard autoencoders. Analogous to conventional autoencoders, we propose regularization techniques that facilitate learning meaningful operations in the latent space. Finally, we adapt the training scheme so that it learns to map an input bag to an output bag, which requires a novel loss function and neural architecture. Our experimental evaluations on unsupervised sentiment transfer and sentence summarization show that our method performs substantially better than a standard autoencoder.
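A hedged sketch of mapping a variable-size bag of vectors to an output bag with a shared per-vector offset network, so the output bag grows with the input text; the paper's actual architecture and loss (including the manifold regularization) are richer than this simplification.

    import torch
    import torch.nn as nn

    class BagMapping(nn.Module):
        def __init__(self, dim: int):
            super().__init__()
            self.offset = nn.Sequential(
                nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim)
            )

        def forward(self, bag: torch.Tensor) -> torch.Tensor:
            # bag: (bag_size, dim); the same offset network is applied to every
            # vector, so the output bag size follows the input bag size.
            return bag + self.offset(bag)

    input_bag = torch.randn(37, 512)     # a bag encoding a longer text
    print(BagMapping(512)(input_bag).shape)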
Plug and Play Autoencoders for Conditional Text Generation
Mai, Florian, Pappas, Nikolaos, Montero, Ivan, Smith, Noah A., Henderson, James
Text autoencoders are commonly used for conditional generation tasks such as style transfer. We propose methods which are plug and play, where any pretrained autoencoder can be used, and only require learning a mapping within the autoencoder's embedding space, training embedding-to-embedding (Emb2Emb). This reduces the need for labeled training data for the task and makes the training procedure more efficient. Crucial to the success of this method is a loss term for keeping the mapped embedding on the manifold of the autoencoder and a mapping which is trained to navigate the manifold by learning offset vectors. Evaluations on style transfer tasks both with and without sequence-to-sequence supervision show that our method performs better than or comparable to strong baselines while being up to four times faster.
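A conceptual sketch of the embedding-to-embedding idea: a frozen autoencoder supplies the embedding space, a small mapping learns offset vectors, and an extra penalty encourages the mapped embedding to stay on the autoencoder's manifold. The penalty used below is a stand-in assumption, not the paper's actual manifold term.

    import torch
    import torch.nn as nn

    class OffsetMapping(nn.Module):
        def __init__(self, dim: int):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim)
            )

        def forward(self, z: torch.Tensor) -> torch.Tensor:
            return z + self.net(z)   # navigate the manifold via learned offsets

    dim = 512
    mapping = OffsetMapping(dim)
    z = torch.randn(8, dim)                       # embeddings from a frozen, pretrained encoder
    z_mapped = mapping(z)
    task_loss = z_mapped.pow(2).mean()            # placeholder for the task-specific loss
    manifold_penalty = (z_mapped - z).norm(dim=-1).mean()   # placeholder manifold term
    loss = task_loss + 0.1 * manifold_penalty
    loss.backward()                               # only the mapping receives gradients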