AITopics | Cerisara, Christophe

Cerisara, Christophe

The Lucie-7B LLM and the Lucie Training Dataset: Open resources for multilingual language generation

Gouvert, Olivier, Hunter, Julie, Louradour, Jérôme, Cerisara, Christophe, Dufraisse, Evan, Sy, Yaya, Rivière, Laura, Lorré, Jean-Pierre, community, OpenLLM-France

arXiv.org Artificial Intelligence, Mar-15-2025

We present both the Lucie Training Dataset and the Lucie-7B foundation model. The Lucie Training Dataset is a multilingual collection of textual corpora centered around French and designed to offset anglo-centric biases found in many datasets for large language model pretraining. Its French data is pulled not only from traditional web sources, but also from French cultural heritage documents, filling an important gap in modern datasets. Beyond French, which makes up the largest share of the data, we added documents to support several other European languages, including English, Spanish, German, and Italian. Apart from its value as a resource for French language and culture, an important feature of this dataset is that it prioritizes data rights by minimizing copyrighted material. In addition, building on the philosophy of past open projects, it is redistributed in the form used for training and its processing is described on Hugging Face and GitHub. The Lucie-7B foundation model is trained on equal amounts of data in French and English -- roughly 33% each -- in an effort to better represent cultural aspects of French-speaking communities. We also describe two instruction fine-tuned models, Lucie-7B-Instruct-v1.1 and Lucie-7B-Instruct-human-data, which we release as demonstrations of Lucie-7B in use. These models achieve promising results compared to state-of-the-art models, demonstrating that an open approach prioritizing data rights can still deliver strong performance. We see these models as an initial step toward developing more performant, aligned models in the near future. Model weights for Lucie-7B and the Lucie instruct models, along with intermediate checkpoints for the former, are published on Hugging Face, while model training and data preparation code is available on GitHub. This makes Lucie-7B one of the first OSI compliant language models according to the new OSI definition.

large language model, machine learning, natural language, (20 more...)

2503.12294

Country:

Europe > France (1.00)
Asia (0.67)
North America > United States > Hawaii (0.14)
Europe > United Kingdom > Scotland (0.14)

Genre: Research Report > Promising Solution (0.34)

Industry:

Education (1.00)
Law > Intellectual Property & Technology Law (0.93)
Energy (0.67)
Government > Regional Government > Europe Government > France Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Lillama: Large Language Models Compression via Low-Rank Feature Distillation

Sy, Yaya, Cerisara, Christophe, Illina, Irina

arXiv.org Artificial Intelligence, Dec-28-2024

Current LLM structured pruning methods typically involve two steps: (1) compression with calibration data and (2) costly continued pretraining on billions of tokens to recover lost performance. This second step is necessary as the first significantly impacts model accuracy. Prior research suggests pretrained Transformer weights aren't inherently low-rank, unlike their activations, which may explain this drop. Based on this observation, we propose Lillama, a compression method that locally distills activations with low-rank weights. Using SVD for initialization and a joint loss combining teacher and student activations, we accelerate convergence and reduce memory use with local gradient updates. Lillama compresses Mixtral-8x7B within minutes on a single A100 GPU, removing 10 billion parameters while retaining over 95% of its original performance. Phi-2 3B can be compressed by 40% with just 13 million calibration tokens, resulting in a small model that competes with recent models of similar size. The method generalizes well to non-transformer architectures, compressing Mamba-3B by 20% while maintaining 99% performance.
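The SVD initialization and local activation matching mentioned in this abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `svd_low_rank_init`, the layer shape, the rank, and the random "calibration" inputs are all assumptions chosen for illustration, and the mean-squared error stands in for whatever joint loss the paper actually uses.

```python
import numpy as np

def svd_low_rank_init(weight, rank):
    """Factor weight (d_out x d_in) into a (d_out x rank) and b (rank x d_in)."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    a = u[:, :rank] * s[:rank]   # scale the kept left singular vectors
    b = vt[:rank, :]             # kept right singular vectors
    return a, b

rng = np.random.default_rng(0)
weight = rng.standard_normal((64, 48))    # hypothetical layer weight
inputs = rng.standard_normal((256, 48))   # hypothetical calibration inputs

a, b = svd_low_rank_init(weight, rank=16)

# Teacher activations use the full weight; student activations use the factors.
teacher_out = inputs @ weight.T
student_out = inputs @ b.T @ a.T

# A local distillation step would minimize a loss like this one,
# updating only a and b for this layer.
local_loss = float(np.mean((teacher_out - student_out) ** 2))

params_before = weight.size
params_after = a.size + b.size
```

With these assumed shapes the factorized layer stores `rank * (d_out + d_in)` values instead of `d_out * d_in`, which is where the parameter reduction comes from.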

large language model, machine learning, natural language, (19 more...)

2412.16719

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.48)

Industry: Education (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Improving Quotation Attribution with Fictional Character Embeddings

Michel, Gaspard, Epure, Elena V., Hennequin, Romain, Cerisara, Christophe

arXiv.org Artificial Intelligence, Jun-17-2024

Humans naturally attribute utterances of direct speech to their speaker in literary works. When attributing quotes, we process contextual information but also access mental representations of characters that we build and revise throughout the narrative. Recent methods to automatically attribute such utterances have explored simulating human logic with deterministic rules or learning new implicit rules with neural networks when processing contextual information. However, these systems inherently lack character representations, which often leads to errors on more challenging examples of attribution: anaphoric and implicit quotes. In this work, we propose to augment a popular quotation attribution system, BookNLP, with character embeddings that encode global information of characters. To build these embeddings, we create DramaCV, a corpus of English drama plays from the 15th to 20th century focused on Character Verification (CV), a task similar to Authorship Verification (AV), that aims at analyzing fictional characters. We train a model similar to the recently proposed AV model, Universal Authorship Representation (UAR), on this dataset, showing that it outperforms concurrent character embedding methods on the CV task and generalizes better to literary novels. Then, through an extensive evaluation on 22 novels, we show that combining BookNLP's contextual information with our proposed global character embeddings improves the identification of speakers for anaphoric and implicit quotes, reaching state-of-the-art performance. Code and data will be made publicly available.

attribution, machine learning, simulation of human behavior, (19 more...)

2406.11368

Country:

Europe (1.00)
Asia (0.93)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Cognitive Science > Simulation of Human Behavior (0.54)

A Realistic Evaluation of LLMs for Quotation Attribution in Literary Texts: A Case Study of LLaMa3

Michel, Gaspard, Epure, Elena V., Hennequin, Romain, Cerisara, Christophe

arXiv.org Artificial Intelligence, Jun-17-2024

The zero-shot and few-shot performance of Large Language Models (LLMs) is subject to memorization and data contamination, complicating the assessment of their validity. In literary tasks, the performance of LLMs is often correlated with the degree of book memorization. In this work, we carry out a realistic evaluation of LLMs for quotation attribution in novels, taking the instruction fine-tuned version of Llama3 as an example. We design a task-specific memorization measure and use it to show that Llama3's ability to perform quotation attribution is positively correlated with the novel's degree of memorization. However, Llama3 still performs impressively well on books it has neither memorized nor seen. Data and code will be made publicly available.

large language model, machine learning, natural language, (18 more...)

2406.1138

Country:

Europe > France (0.46)
Asia > Middle East > UAE (0.14)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Distinguishing Fictional Voices: a Study of Authorship Verification Models for Quotation Attribution

Michel, Gaspard, Epure, Elena V., Hennequin, Romain, Cerisara, Christophe

arXiv.org Artificial Intelligence, Jan-30-2024

Recent approaches to automatically detect the speaker of an utterance of direct speech often disregard general information about characters in favor of local information found in the context, such as surrounding mentions of entities. In this work, we explore stylistic representations of characters built by encoding their quotes with off-the-shelf pretrained Authorship Verification models in a large corpus of English novels (the Project Dialogism Novel Corpus). Results suggest that the combination of stylistic and topical information captured in some of these models accurately distinguish characters …

Figure 1: Example of quotation attribution on an excerpt of Pride and Prejudice by Jane Austen (1813). Underlined text are identified mentions, and arrows link quotes to their relevant entity mention (solid arrows are explicit references and dashed arrows are anaphoric references).

artificial intelligence, machine learning, natural language, (19 more...)

2401.16968

Country:

Europe > France (0.46)
North America > Canada (0.28)
North America > United States > Minnesota (0.14)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)

Multi-lingual Dialogue Act Recognition with Deep Learning Methods

Martínek, Jiří, Král, Pavel, Lenc, Ladislav, Cerisara, Christophe

arXiv.org Artificial Intelligence, Apr-11-2019

This paper deals with multi-lingual dialogue act (DA) recognition. The proposed approaches are based on deep neural networks and use word2vec embeddings for word representation. Two multi-lingual models are proposed for this task. The first approach uses one general model trained on the embeddings from all available languages. The second method trains the model on a single pivot language and a linear transformation method is used to project other languages onto the pivot language. The popular convolutional neural network and LSTM architectures with different set-ups are used as classifiers. To the best of our knowledge this is the first attempt at multi-lingual DA recognition using neural networks. The multi-lingual models are validated experimentally on two languages from the Verbmobil corpus.
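The second approach, projecting other languages onto a pivot language with a linear transformation, can be sketched as follows. The abstract does not say how the transformation is estimated, so this example assumes an ordinary least-squares fit over paired word vectors; the function name `fit_linear_projection`, the dimensions, and the synthetic vectors are hypothetical and only serve to show the mechanics.

```python
import numpy as np

def fit_linear_projection(source_vecs, pivot_vecs):
    """Least-squares matrix P such that source_vecs @ P approximates pivot_vecs."""
    projection, *_ = np.linalg.lstsq(source_vecs, pivot_vecs, rcond=None)
    return projection

rng = np.random.default_rng(0)
dim = 8

# Synthetic stand-ins for embeddings of translation pairs (assumption):
# the pivot vectors are an exact linear image of the source vectors.
true_map = rng.standard_normal((dim, dim))
source_vecs = rng.standard_normal((50, dim))
pivot_vecs = source_vecs @ true_map

projection = fit_linear_projection(source_vecs, pivot_vecs)

# Any new source-language word vector can now be mapped into the pivot space
# and passed to a classifier trained only on the pivot language.
new_source_word = rng.standard_normal(dim)
projected = new_source_word @ projection
```

Real embedding spaces are only approximately related by a linear map, so in practice the fit would leave a residual rather than recovering the mapping exactly as it does on this synthetic data.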

deep learning, neural network, recognition, (21 more...)

1904.05606

Country:

Europe (0.94)
North America > United States > New Mexico (0.14)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Do Convolutional Networks Need to Be Deep for Text Classification?

Le, Hoa T. (Laboratory LORIA) | Cerisara, Christophe (Laboratory LORIA) | Denis, Alexandre (SESAMm)

AAAI Conferences, Apr-6-2018

In this work, we study the importance of depth in convolutional models for text classification, with either character or word inputs. We show on 5 standard text classification and sentiment analysis tasks that deep models indeed perform better than shallow networks when the text input is represented as a sequence of characters. However, a simple shallow-and-wide network outperforms deep models such as DenseNet with word inputs. Our shallow word model further establishes new state-of-the-art performance on two datasets: Yelp Binary (95.9%) and Yelp Full (64.9%).

convolutional network, text classification

Workshops at the Thirty-Second AAAI Conference on Artificial Intelligence

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.80)
