Malherbe, Emmanuel
EuroBERT: Scaling Multilingual Encoders for European Languages
Boizard, Nicolas, Gisserot-Boukhlef, Hippolyte, Alves, Duarte M., Martins, André, Hammal, Ayoub, Corro, Caio, Hudelot, Céline, Malherbe, Emmanuel, Malaboeuf, Etienne, Jourdan, Fanny, Hautreux, Gabriel, Alves, João, El-Haddad, Kevin, Faysse, Manuel, Peyrard, Maxime, Guerreiro, Nuno M., Fernandes, Patrick, Rei, Ricardo, Colombo, Pierre
Many important tasks in Natural Language Processing (NLP), including information retrieval, classification, or regression, are built upon general-purpose vector representations. These representations are traditionally obtained from bidirectional encoder models, which aggregate information from the left and right contexts of each token (Devlin et al., 2019; Conneau et al., 2020; He et al., 2023). In contrast, recent advances in generative modeling have shifted the research community's attention towards unidirectional architectures (Bai et al., 2023; Llama Team, 2024; OLMo et al., 2025). Notably, these efforts have identified several key performance drivers that span architectural advances, data improvements, and increased scale. Yet, despite no apparent barrier to transferring these insights to bidirectional architectures, little effort has been devoted towards this objective, forcing practitioners to depend on outdated models. In this paper, we introduce a refreshed recipe for training general-purpose multilingual encoders, resulting in the EuroBERT family. Drawing inspiration from recent progress in decoder models, our models feature an updated architecture (§2.1), and are trained on a 5T-token multilingual dataset, covering widely spoken European and global languages,
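As an illustrative aside (not taken from the paper): the encoder/decoder distinction in the abstract comes down to the self-attention mask. A minimal sketch, assuming standard self-attention, contrasting a bidirectional (BERT-style) mask with a causal (GPT-style) one:

```python
import numpy as np

def attention_mask(seq_len: int, causal: bool) -> np.ndarray:
    """Boolean mask where True marks positions a token may attend to."""
    if causal:
        # Decoder-style: each token sees itself and the tokens to its left only.
        return np.tril(np.ones((seq_len, seq_len), dtype=bool))
    # Encoder-style: every token sees the full sequence (left and right context).
    return np.ones((seq_len, seq_len), dtype=bool)

print(attention_mask(4, causal=False).astype(int))  # bidirectional: all ones
print(attention_mask(4, causal=True).astype(int))   # unidirectional: lower-triangular
```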
Is Preference Alignment Always the Best Option to Enhance LLM-Based Translation? An Empirical Analysis
Gisserot-Boukhlef, Hippolyte, Rei, Ricardo, Malherbe, Emmanuel, Hudelot, Céline, Colombo, Pierre, Guerreiro, Nuno M.
Neural metrics for machine translation (MT) evaluation have become increasingly prominent due to their superior correlation with human judgments compared to traditional lexical metrics. Researchers have therefore utilized neural metrics through quality-informed decoding strategies, achieving better results than likelihood-based methods. With the rise of Large Language Models (LLMs), preference-based alignment techniques have gained attention for their potential to enhance translation quality by optimizing model weights directly on preferences induced by quality estimators. This study focuses on Contrastive Preference Optimization (CPO) and conducts extensive experiments to evaluate the impact of preference-based alignment on translation quality. Our findings indicate that while CPO consistently outperforms Supervised Fine-Tuning (SFT) on high-quality data with regard to the alignment metric, it may lead to instability across downstream evaluation metrics, particularly between neural and lexical ones. Additionally, we demonstrate that relying solely on the base model for generating candidate translations achieves performance comparable to using multiple external systems, while ensuring better consistency across downstream metrics.
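For context, a minimal sketch of the CPO objective the study builds on (Xu et al., 2024): it drops DPO's reference model and adds a negative log-likelihood term on the preferred translation. The log-probabilities and the β value below are illustrative assumptions, not the paper's settings:

```python
import math

def cpo_loss(logp_chosen: float, logp_rejected: float, beta: float = 0.1) -> float:
    """Sketch of the CPO objective on a single preference pair.

    logp_chosen / logp_rejected: the policy's sequence log-probabilities of the
    preferred and dispreferred translations (preferences typically induced by a
    neural quality estimator). CPO removes DPO's reference model and adds a
    behaviour-cloning term on the preferred output.
    """
    margin = beta * (logp_chosen - logp_rejected)
    preference = -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)
    nll = -logp_chosen  # negative log-likelihood of the preferred translation
    return preference + nll

# Illustrative values only: the preferred translation is more likely under the policy.
print(cpo_loss(logp_chosen=-12.3, logp_rejected=-15.8))
```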
Towards Trustworthy Reranking: A Simple yet Effective Abstention Mechanism
Gisserot-Boukhlef, Hippolyte, Faysse, Manuel, Malherbe, Emmanuel, Hudelot, Céline, Colombo, Pierre
Neural Information Retrieval (NIR) has significantly improved upon heuristic-based IR systems. Yet failures remain frequent, as the models used are often unable to retrieve documents relevant to the user's query. We address this challenge by proposing a lightweight abstention mechanism tailored for real-world constraints, with particular emphasis placed on the reranking phase. We introduce a protocol for evaluating abstention strategies in a black-box scenario, demonstrating their efficacy, and propose a simple yet effective data-driven mechanism. We provide open-source code for experiment replication and abstention implementation, fostering wider adoption and application in diverse contexts.
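The paper's exact mechanism is not reproduced here; as one hedged illustration of a simple, data-driven abstention strategy of the kind evaluated, the sketch below abstains when the reranker's top score falls below a threshold calibrated on held-out queries. Function names and the target-precision criterion are assumptions for illustration:

```python
import numpy as np

def calibrate_threshold(calib_scores, calib_correct, target_precision=0.9):
    """Pick the smallest top-1 score threshold whose retained queries reach
    the target precision on a held-out calibration set.

    calib_scores: top-1 reranker score for each calibration query.
    calib_correct: whether the top-ranked document was relevant (bool).
    """
    scores = np.asarray(calib_scores, dtype=float)
    correct = np.asarray(calib_correct, dtype=bool)
    for tau in np.sort(scores):
        kept = scores >= tau
        if kept.any() and correct[kept].mean() >= target_precision:
            return float(tau)
    return float(scores.max())  # otherwise abstain on (almost) everything

def should_abstain(top_score, tau):
    """Abstain (return no answer) when the reranker's best score is below tau."""
    return top_score < tau

tau = calibrate_threshold([0.2, 0.8, 0.9, 0.4, 0.7], [False, True, True, False, True])
print(tau, should_abstain(0.3, tau))
```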
Theoretical and experimental study of SMOTE: limitations and comparisons of rebalancing strategies
Sakho, Abdoulaye, Scornet, Erwan, Malherbe, Emmanuel
Imbalanced data sets are a common problem encountered in practice in many applications (He and Garcia, 2009), such as fraud detection (Hassan and Abraham, 2016), medical diagnosis (Khalilia et al., 2011) and churn detection (Nguyen and Duong, 2021). In the presence of imbalanced data sets, most machine learning algorithms tend to predict the majority class, leading to biased predictions. Several strategies have been developed to handle this issue, as explained by Krawczyk (2016) and Ramyachitra and Manikandan (2014). All of these strategies fall into two categories: model-level approaches and data-level approaches. Model-level approaches deal with this problem by acting directly on the machine learning algorithms.
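For background on the data-level side: SMOTE, the method studied in the paper, oversamples the minority class by linear interpolation between a minority point and one of its k nearest minority neighbours, x_new = x_i + u (x_nn - x_i) with u uniform in [0, 1]. A minimal sketch of that mechanism (not the paper's implementation; the helper below is illustrative):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_sample(X_minority, n_new, k=5, rng=None):
    """Generate synthetic minority samples by SMOTE-style linear interpolation."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X_minority, dtype=float)
    # k nearest minority neighbours of each minority point (first column is the point itself).
    nn = NearestNeighbors(n_neighbors=min(k, len(X) - 1) + 1).fit(X)
    _, idx = nn.kneighbors(X)
    new_points = []
    for _ in range(n_new):
        i = rng.integers(len(X))          # pick a minority point at random
        j = rng.choice(idx[i, 1:])        # pick one of its minority neighbours
        u = rng.uniform()                 # interpolation coefficient in [0, 1]
        new_points.append(X[i] + u * (X[j] - X[i]))
    return np.vstack(new_points)

X_min = np.random.default_rng(0).normal(size=(20, 2))
print(smote_sample(X_min, n_new=5, rng=0).shape)  # (5, 2)
```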