Collaborating Authors

 Boizard, Nicolas


EuroBERT: Scaling Multilingual Encoders for European Languages

arXiv.org Artificial Intelligence

Many important tasks in Natural Language Processing (NLP), including information retrieval, classification, and regression, are built upon general-purpose vector representations. These representations are traditionally obtained from bidirectional encoder models, which aggregate information from the left and right contexts of each token (Devlin et al., 2019; Conneau et al., 2020; He et al., 2023). In contrast, recent advances in generative modeling have shifted the research community's attention towards unidirectional architectures (Bai et al., 2023; Llama Team, 2024; OLMo et al., 2025). Notably, these efforts have identified several key performance drivers spanning architectural advances, data improvements, and increased scale. Yet, despite no apparent barrier to transferring these insights to bidirectional architectures, little effort has been devoted to this objective, forcing practitioners to depend on outdated models. In this paper, we introduce a refreshed recipe for training general-purpose multilingual encoders, resulting in the EuroBERT family. Drawing inspiration from recent progress in decoder models, our models feature an updated architecture (Section 2.1) and are trained on a 5T-token multilingual dataset covering widely spoken European and global languages [...]
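The distinction the abstract draws between bidirectional encoders and unidirectional decoders reduces to the attention mask: an encoder lets every token attend to both its left and right context, while a decoder restricts attention to the left. A minimal NumPy sketch of this difference (toy shapes and identity projections for brevity; not EuroBERT's implementation):

```python
# Sketch: bidirectional (encoder) vs. causal (decoder) self-attention.
# Shapes and values are toy assumptions for illustration only.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, causal=False):
    """Single-head self-attention over a (seq_len, dim) array."""
    q, k, v = x, x, x                       # identity projections for brevity
    scores = q @ k.T / np.sqrt(x.shape[-1])
    if causal:
        # Decoder: mask out the right context so each token sees only the left.
        mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    # Encoder (causal=False): full left + right context for every token.
    return softmax(scores) @ v

x = np.random.randn(5, 8)                   # 5 toy tokens, 8-dim embeddings
encoder_out = self_attention(x, causal=False)
decoder_out = self_attention(x, causal=True)
```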


Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs

arXiv.org Artificial Intelligence

Deploying large language models (LLMs) of several billion parameters can be impractical in most industrial use cases due to constraints such as cost, latency limitations, and hardware accessibility. Knowledge distillation (KD) offers a solution by compressing knowledge from resource-intensive large models into smaller ones. Various strategies exist, some relying on the text generated by the teacher model and optionally utilizing its logits to enhance learning. However, these logit-based methods often require the teacher and student models to share the same tokenizer, limiting their applicability across different LLM families. In this paper, we introduce the Universal Logit Distillation (ULD) loss, grounded in optimal transport, to address this limitation. Our experimental results demonstrate the effectiveness of the ULD loss in enabling distillation across models with different architectures and tokenizers, paving the way to a more widespread use of distillation techniques.
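As one way to make the abstract's idea concrete, the sketch below implements a cross-tokenizer distillation loss in the spirit of ULD: sorting the teacher and student probability vectors yields the closed-form one-dimensional optimal transport (Wasserstein-1) distance between them, which never requires the two vocabularies to be aligned. The zero-padding and batch-mean reduction here are assumptions for illustration, not the paper's exact formulation:

```python
# Sketch of an optimal-transport-style logit distillation loss across
# different tokenizers. Padding and reduction choices are assumptions.
import torch
import torch.nn.functional as F

def uld_style_loss(student_logits, teacher_logits):
    """student_logits: (batch, vocab_s); teacher_logits: (batch, vocab_t).
    Vocabularies may differ in size and token order."""
    p_s = F.softmax(student_logits, dim=-1)
    p_t = F.softmax(teacher_logits, dim=-1)
    # Pad the smaller vocabulary with zero-probability entries (assumption).
    size = max(p_s.shape[-1], p_t.shape[-1])
    p_s = F.pad(p_s, (0, size - p_s.shape[-1]))
    p_t = F.pad(p_t, (0, size - p_t.shape[-1]))
    # Sorting makes the comparison independent of token identity, so the
    # teacher and student tokenizers never need a shared index.
    p_s = p_s.sort(dim=-1, descending=True).values
    p_t = p_t.sort(dim=-1, descending=True).values
    return (p_s - p_t).abs().sum(dim=-1).mean()

# Toy usage: a 32k-vocab student distilled from a 50k-vocab teacher.
loss = uld_style_loss(torch.randn(4, 32000), torch.randn(4, 50257))
```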


CroissantLLM: A Truly Bilingual French-English Language Model

arXiv.org Artificial Intelligence

We introduce CroissantLLM, a 1.3B language model pretrained on a set of 3T English and French tokens, to bring to the research and industrial community a high-performance, fully open-sourced bilingual model that runs swiftly on consumer-grade local hardware. To that end, we pioneer the approach of training an intrinsically bilingual model with a 1:1 English-to-French pretraining data ratio, a custom tokenizer, and bilingual finetuning datasets. We release the training dataset, notably containing a French split with manually curated, high-quality, and varied data sources. To assess performance outside of English, we craft a novel benchmark, FrenchBench, consisting of an array of classification and generation tasks covering various orthogonal aspects of model performance in the French language. Additionally, rooted in transparency and to foster further Large Language Model research, we release codebases and dozens of checkpoints across various model sizes, training data distributions, and training steps, as well as fine-tuned Chat models and strong translation models. We evaluate our model through the FMTI framework, and validate 81% of the transparency criteria, far beyond the scores of even most open initiatives. This work enriches the NLP landscape, breaking away from previous English-centric work in order to strengthen our understanding of multilinguality in language models.
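As a hypothetical illustration of the 1:1 pretraining ratio (not the CroissantLLM data pipeline), drawing each document from either corpus with equal probability approximates the paper's token-level balance when documents are of comparable length:

```python
# Sketch: sampling pretraining documents at a 1:1 English-to-French ratio.
# Corpus names and the document-level approximation are assumptions.
import random

def mix_bilingual(english_docs, french_docs, n_samples, seed=0):
    """Yield documents so that, in expectation, half come from each language."""
    rng = random.Random(seed)
    for _ in range(n_samples):
        source = english_docs if rng.random() < 0.5 else french_docs
        yield rng.choice(source)

batch = list(mix_bilingual(["en doc"] * 10, ["doc fr"] * 10, n_samples=8))
```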


Deep learning-based stereo camera multi-video synchronization

arXiv.org Artificial Intelligence

Currently, the most accurate stereo systems are based on specific hardware solutions such as stereo cameras. The synchronization of video streams within these cameras is therefore achieved using electronic systems that increase the cost and the weight, limit the flexibility of the systems, and require more space. Replacing these electronic solutions with robust software would make these systems more flexible, allowing them to be used with [...] two sequences of twenty frames captured at the same frame rate by two different cameras, with a distance between their optical centres limited to a few centimeters and oriented towards the same direction, a first module computes one by one the correspondence scores between each frame of the two sequences; then, when all the correspondence scores are computed, a second module takes as input all these scores to estimate the average delay between the sequences.
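The reconstructed abstract describes a two-module pipeline: one module scores every frame pair across the two sequences, and a second maps the full score matrix to a delay estimate. A runnable sketch under stated assumptions, with cosine similarity standing in for the learned correspondence module and a soft-argmax over the matrix diagonals standing in for the learned delay estimator:

```python
# Sketch of the two-module synchronization pipeline. The real modules are
# presumably learned networks; simple stand-ins keep this runnable.
import numpy as np

def correspondence_scores(seq_a, seq_b):
    """Module 1: cosine similarity between every pair of frame embeddings.
    seq_a, seq_b: (n_frames, feat_dim) arrays (a CNN would produce these)."""
    a = seq_a / np.linalg.norm(seq_a, axis=1, keepdims=True)
    b = seq_b / np.linalg.norm(seq_b, axis=1, keepdims=True)
    return a @ b.T                                  # (n, n) score matrix

def estimate_delay(scores):
    """Module 2: soft-argmax over diagonals; diagonal offset d collects the
    scores of frame pairs (i, i+d), so the best offset is the delay."""
    n = scores.shape[0]
    offsets = np.arange(-(n - 1), n)
    diag_means = np.array([scores.diagonal(d).mean() for d in offsets])
    weights = np.exp(diag_means) / np.exp(diag_means).sum()
    return float((weights * offsets).sum())         # average (sub-frame) delay

# Toy usage: the second sequence lags the first by roughly 3 frames.
frames_a = np.random.randn(20, 64)
frames_b = np.roll(frames_a, 3, axis=0) + 0.05 * np.random.randn(20, 64)
delay = estimate_delay(correspondence_scores(frames_a, frames_b))
```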