
Collaborating Authors

 Chirkova, Nadezhda


Provence: efficient and robust context pruning for retrieval-augmented generation

arXiv.org Artificial Intelligence

Retrieval-augmented generation improves various aspects of large language model (LLM) generation, but suffers from computational overhead caused by long contexts as well as the propagation of irrelevant retrieved information into generated responses. Context pruning deals with both aspects, by removing irrelevant parts of retrieved contexts before LLM generation. Existing context pruning approaches are, however, limited, and do not provide a universal model that would be both efficient and robust in a wide range of scenarios, e.g., when contexts contain a variable amount of relevant information or vary in length, or when evaluated on various domains. In this work, we close this gap and introduce Provence (Pruning and Reranking Of retrieVEd relevaNt ContExts), an efficient and robust context pruner for Question Answering, which dynamically detects the needed amount of pruning for a given context and can be used out-of-the-box for various domains. The three key ingredients of Provence are formulating the context pruning task as sequence labeling, unifying context pruning capabilities with context reranking, and training on diverse data. Our experimental results show that Provence enables context pruning with negligible to no drop in performance, in various domains and settings, at almost no cost in a standard RAG pipeline. We also conduct a deeper analysis alongside various ablations to provide insights into training context pruners for future work.
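A minimal sketch of the sequence-labeling view of context pruning described above, assuming a token-classification model with a binary keep/drop label per context token; the checkpoint name, threshold, and BERT-style token_type_ids are illustrative assumptions, not Provence's released interface.

```python
# Hedged sketch: context pruning as binary sequence labeling (keep/drop per token).
# "my-pruner-checkpoint" is a placeholder; a BERT-style tokenizer with token_type_ids is assumed.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("my-pruner-checkpoint")
model = AutoModelForTokenClassification.from_pretrained("my-pruner-checkpoint", num_labels=2)

def prune_context(question: str, context: str, threshold: float = 0.5) -> str:
    # Encode the (question, context) pair; the model labels each context token as keep (1) or drop (0).
    enc = tokenizer(question, context, return_tensors="pt",
                    truncation=True, return_offsets_mapping=True)
    offsets = enc.pop("offset_mapping")[0]
    with torch.no_grad():
        logits = model(**enc).logits[0]                  # (seq_len, 2)
    keep_prob = logits.softmax(-1)[:, 1]                 # probability of the "keep" label
    ctx_mask = enc["token_type_ids"][0] == 1             # tokens of the second segment (the context)
    spans = [tuple(o.tolist()) for o, m, p in zip(offsets, ctx_mask, keep_prob)
             if m and p > threshold]
    # Keep only the character spans of context tokens that clear the threshold.
    return " ".join(context[s:e] for s, e in spans if e > s)
```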


Investigating the potential of Sparse Mixtures-of-Experts for multi-domain neural machine translation

arXiv.org Artificial Intelligence

We focus on multi-domain Neural Machine Translation, with the goal of developing efficient models which can handle data from various domains seen during training and are robust to domains unseen during training. We hypothesize that Sparse Mixture-of-Experts (SMoE) models are a good fit for this task, as they enable efficient model scaling, which helps to accommodate a variety of multi-domain data, and allow flexible sharing of parameters between domains, potentially enabling knowledge transfer between similar domains and limiting negative transfer. We conduct a series of experiments aimed at validating the utility of SMoE for the multi-domain scenario, and find that a straightforward width scaling of the Transformer is a simpler and surprisingly more efficient approach in practice, reaching the same performance level as SMoE. We also search for a better recipe for the robustness of multi-domain systems, highlighting the importance of mixing in a generic domain, i.e. Paracrawl, and introducing a simple technique, domain randomization.
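For illustration, a minimal sketch of a top-1-routed Sparse MoE feed-forward block (Switch-style routing), the kind of layer compared here against simply widening the dense Transformer FFN; the dimensions and expert count are illustrative, not the paper's configuration.

```python
# Minimal sketch of a Sparse MoE feed-forward layer with top-1 routing.
import torch
import torch.nn as nn

class SparseMoEFFN(nn.Module):
    def __init__(self, d_model: int = 512, d_ff: int = 2048, n_experts: int = 4):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)      # routing logits per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, d_model)
        flat = x.reshape(-1, x.size(-1))                  # route each token independently
        gates = self.router(flat).softmax(-1)             # (tokens, n_experts)
        top_gate, top_idx = gates.max(-1)                 # top-1 expert per token
        out = torch.zeros_like(flat)
        for i, expert in enumerate(self.experts):
            mask = top_idx == i
            if mask.any():                                # only the selected tokens visit expert i
                out[mask] = top_gate[mask].unsqueeze(-1) * expert(flat[mask])
        return out.reshape_as(x)

# The width-scaling baseline discussed above simply increases d_ff of a single dense FFN
# instead of adding experts.
```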


Retrieval-augmented generation in multilingual settings

arXiv.org Artificial Intelligence

Retrieval-augmented generation (RAG) has recently emerged as a promising solution for incorporating up-to-date or domain-specific knowledge into large language models (LLMs) and improving LLM factuality, but is predominantly studied in English-only settings. In this work, we consider RAG in the multilingual setting (mRAG), i.e. with user queries and the datastore in 13 languages, and investigate which components, and with which adjustments, are needed to build a well-performing mRAG pipeline that can be used as a strong baseline in future works. Our findings highlight that despite the availability of high-quality off-the-shelf multilingual retrievers and generators, task-specific prompt engineering is needed to enable generation in user languages. Moreover, current evaluation metrics need adjustments for the multilingual setting, to account for variations in the spelling of named entities. The main limitations to be addressed in future works include frequent code-switching in non-Latin-alphabet languages, occasional fluency errors, wrong reading of the provided documents, and irrelevant retrieval. We release the code for the resulting mRAG baseline pipeline at https://github.com/naver/bergen.
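A small sketch of the kind of task-specific prompt the abstract refers to, where the generator is explicitly asked to answer in the user's language; the template is illustrative, not the exact one used in BERGEN.

```python
# Illustrative mRAG prompt that pins the response language to the user's language.
def build_mrag_prompt(question: str, documents: list[str], user_language: str) -> str:
    context = "\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(documents))
    return (
        "Answer the question using the documents below.\n"
        f"Respond in {user_language}, even if the documents are in other languages.\n\n"
        f"Documents:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

print(build_mrag_prompt("Когда был основан Новосибирск?",
                        ["Novosibirsk was founded in 1893 ..."],
                        "Russian"))
```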


Zero-shot cross-lingual transfer in instruction tuning of large language models

arXiv.org Artificial Intelligence

Instruction tuning (IT) is widely used to teach pretrained large language models (LLMs) to follow arbitrary instructions, but is under-studied in multilingual settings. In this work, we conduct a systematic study of zero-shot cross-lingual transfer in IT, when an LLM is instruction-tuned on English-only data and then tested on user prompts in other languages. We advocate for the importance of evaluating various aspects of model responses in multilingual instruction following and investigate the influence of different model configuration choices. We find that cross-lingual transfer does happen successfully in IT even if all stages of model training are English-centric, but only if multilinguality is taken into account in hyperparameter tuning and with large enough IT data. English-trained LLMs are capable of generating correct-language, comprehensive, and helpful responses in other languages, but suffer from low factuality and may occasionally have fluency errors.


Key ingredients for effective zero-shot cross-lingual knowledge transfer in generative tasks

arXiv.org Artificial Intelligence

Zero-shot cross-lingual knowledge transfer enables a multilingual pretrained language model, finetuned on a task in one language, to make predictions for this task in other languages. While broadly studied for natural language understanding tasks, this setting is understudied for generation. Previous works notice a frequent problem of generation in the wrong language and propose approaches to address it, usually using mT5 as a backbone model. In this work we compare various approaches proposed in the literature in unified settings, also including alternative backbone models, namely mBART and NLLB-200. We first underline the importance of tuning the learning rate used for finetuning, which helps to substantially alleviate the problem of generation in the wrong language. Then, we show that with careful learning rate tuning, simple full finetuning of the model acts as a very strong baseline and alternative approaches bring only marginal improvements. Finally, we find that mBART performs similarly to mT5 of the same size, and NLLB-200 can be competitive in some cases. Our final zero-shot models reach the performance of the approach based on data translation, which is usually considered an upper baseline for zero-shot cross-lingual transfer in generation.
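A hedged sketch of the learning-rate sweep this abstract highlights: full finetuning of a multilingual seq2seq backbone on source-language data at several learning rates, selecting the rate by zero-shot validation in the target languages. `train_en` and `dev_target_lang` are hypothetical preprocessed datasets, and the grid, model, and hyperparameters are illustrative rather than the paper's setup.

```python
# Hedged sketch: sweep the finetuning learning rate and pick the best by zero-shot validation.
from transformers import AutoModelForSeq2SeqLM, Seq2SeqTrainer, Seq2SeqTrainingArguments

results = {}
for lr in (1e-5, 3e-5, 1e-4, 3e-4, 1e-3):
    model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-base")
    args = Seq2SeqTrainingArguments(
        output_dir=f"ckpt_lr{lr}",
        learning_rate=lr,
        num_train_epochs=3,
        per_device_train_batch_size=8,
        predict_with_generate=True,
    )
    trainer = Seq2SeqTrainer(model=model, args=args,
                             train_dataset=train_en,          # hypothetical English training set
                             eval_dataset=dev_target_lang)    # hypothetical target-language dev set
    trainer.train()
    results[lr] = trainer.evaluate()

best_lr = min(results, key=lambda lr: results[lr]["eval_loss"])
```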


Empirical study of pretrained multilingual language models for zero-shot cross-lingual generation

arXiv.org Artificial Intelligence

Zero-shot cross-lingual generation assumes finetuning a multilingual pretrained language model (mPLM) on a generation task in one language and then using it to make predictions for this task in other languages. Previous works notice a frequent problem of generation in the wrong language and propose approaches to address it, usually using mT5 as a backbone model. In this work, we test alternative mPLMs, such as mBART and NLLB-200, and compare various approaches proposed in the literature in a unified setting. We first underline the importance of tuning the learning rate used for finetuning, which helps to substantially alleviate the problem of generation in the wrong language. Then, we show that with careful learning rate tuning, simple full finetuning of the model acts as a very strong baseline; other competitive approaches include parameter-efficient tuning with adapters and training on several source languages. Finally, we find that mBART performs similarly to mT5 of the same size, and NLLB-200 can be competitive in some cases.
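For the adapter baseline mentioned above, a minimal sketch of a bottleneck adapter module: a small down-/up-projection with a residual connection inserted after a frozen Transformer sublayer, with only the adapter trained. The bottleneck size is illustrative, not the paper's configuration.

```python
# Minimal sketch of a Houlsby-style bottleneck adapter for parameter-efficient tuning.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, d_model: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)
        nn.init.zeros_(self.up.weight)          # start as an identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the frozen model's output; only down/up are updated.
        return hidden + self.up(torch.relu(self.down(hidden)))
```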


CodeBPE: Investigating Subtokenization Options for Large Language Model Pretraining on Source Code

arXiv.org Artificial Intelligence

Recent works have widely adopted large language model pretraining for source code, suggested source-code-specific pretraining objectives, and investigated the applicability of various Transformer-based language model architectures for source code. This work investigates another important aspect of such models, namely the effect of different subtokenization options, and aims at identifying the most effective and length-efficient subtokenizations, taking into account code specifics. We propose a subtokenization that reduces average length by 17% without a downstream performance drop, and show that a carefully chosen subtokenization may improve quality by 0.5-2%, possibly with some length increase. Inspired by the success of large language model (LM) pretraining in natural language processing (NLP), BERT-like models have been widely adopted for source code processing (Feng et al., 2020; Kanade et al., 2020), as code has a similar discrete sequential structure to natural text. Being trained on huge source code corpora in a self-supervised manner, large LMs often substantially outperform domain-specific models developed purposely for applied tasks, especially in tasks with limited parallel / labelled data (Ahmad et al., 2021a). These tasks include fixing code bugs, generating text from code and vice versa, or translating code between programming languages. Recent works have advanced large LM pretraining on source code in several directions; in particular, a range of code-specific self-supervised pretraining tasks were proposed to enrich the classic masked language modeling (MLM) objective, e.g. GraphCodeBERT (Guo et al., 2021) predicts data flow connections during pretraining (one variable is computed from another variable), and CodeT5 (Wang et al., 2021b) and DOBF (Roziere et al., 2021) use a variable naming objective. This work is devoted to investigating one more important component, subtokenization, which usually does not receive much attention when pretraining large LMs on source code. Though this process is often referred to as tokenization, we call it subtokenization to underline its smaller granularity.
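A hedged sketch of the kind of experiment behind the length-efficiency numbers above: train a byte-level BPE subtokenizer on a code corpus and measure the average number of subtokens per snippet. The toy corpus and vocabulary size are placeholders, not the paper's recipe.

```python
# Train a byte-level BPE subtokenizer on code and measure average subtoken sequence length.
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

code_corpus = [
    "def add(a, b):\n    return a + b",
    "for i in range(10):\n    print(i)",
]

tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel()
trainer = trainers.BpeTrainer(vocab_size=1000, special_tokens=["[UNK]"])
tokenizer.train_from_iterator(code_corpus, trainer)

lengths = [len(tokenizer.encode(snippet).tokens) for snippet in code_corpus]
print("average subtokens per snippet:", sum(lengths) / len(lengths))
```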


Should you marginalize over possible tokenizations?

arXiv.org Artificial Intelligence

Autoregressive language models (LMs) map token sequences to probabilities. The usual practice for computing the probability of any character string (e.g. English sentences) is to first transform it into a sequence of tokens that is scored by the model. However, there are exponentially many token sequences that represent any given string. To truly compute the probability of a string one should marginalize over all tokenizations, which is typically intractable. Here, we analyze whether the practice of ignoring the marginalization is justified. To this end, we devise an importance-sampling-based algorithm that allows us to compute estimates of the marginal probabilities and compare them to the default procedure in a range of state-of-the-art models and datasets. Our results show that the gap in log-likelihood is no larger than 0.5% in most cases, but that it becomes more pronounced for data with long complex words.
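In equations, the quantity discussed above can be written as follows; the notation (detok, the proposal q) is our own shorthand for illustration and is not taken verbatim from the paper.

```latex
% Marginal probability of a character string s under a token-level autoregressive LM:
% the sum runs over all token sequences t that detokenize to s.
\[
  p(s) \;=\; \sum_{t \,:\, \mathrm{detok}(t) = s} p(t),
  \qquad
  p(t) \;=\; \prod_{i=1}^{|t|} p\bigl(t_i \mid t_{<i}\bigr).
\]
% The default practice scores only the canonical tokenization \hat{t}(s),
% i.e. uses p(\hat{t}(s)) \le p(s) as a surrogate. An importance-sampling estimate
% with a proposal q(t \mid s) over valid tokenizations of s:
\[
  \hat{p}(s) \;=\; \frac{1}{N} \sum_{n=1}^{N} \frac{p\bigl(t^{(n)}\bigr)}{q\bigl(t^{(n)} \mid s\bigr)},
  \qquad t^{(n)} \sim q(\cdot \mid s).
\]
```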


Parameter-Efficient Finetuning of Transformers for Source Code

arXiv.org Artificial Intelligence

Pretrained Transformers achieve state-of-the-art performance in various code-processing tasks but may be too large to be deployed. As software development tools often incorporate modules for various purposes, which may potentially use a single instance of the pretrained model, it appears relevant to utilize parameter-efficient fine-tuning for the pretrained models of code. In this work, we test two widely used approaches, adapters and LoRA, which were initially tested on NLP tasks, on four code-processing tasks. We find that though the efficient fine-tuning approaches may achieve comparable or higher performance than the standard, full, fine-tuning in code understanding tasks, they underperform full fine-tuning in code-generative tasks. These results underline the importance of testing efficient fine-tuning approaches in domains other than NLP and motivate future research in efficient fine-tuning for source code.
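A minimal sketch of LoRA applied to a single frozen linear layer, the second parameter-efficient method tested above: the pretrained weight stays fixed and a low-rank update is learned on top. Rank and scaling are illustrative, not the paper's settings.

```python
# Minimal sketch of a LoRA-augmented linear layer.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)            # freeze the pretrained weight
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: starts as the base layer
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen base projection plus the trainable low-rank update B @ A.
        return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)
```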


On the Memorization Properties of Contrastive Learning

arXiv.org Machine Learning

However, data labeling is often time-consuming and costly, as it involves human expertise. Thus, it is common in computer vision to pretrain DNNs on some large labeled dataset, e.g. ImageNet (Russakovsky et al., 2015), and then to fine-tune the model on a specific downstream task. The self-supervised learning paradigm provides a human-labeling-free alternative to supervised pretraining: recently developed contrastive self-supervised methods show results comparable to ImageNet pretraining. Memorization studies of DNNs motivate improvements to DNN training approaches. A pioneering work of Zhang et al. (2017) showed that the capacity of modern DNNs is sufficient to perfectly fit even randomly labeled data. According to classic learning theory, such a huge capacity should lead to catastrophic overfitting; however, recent works (Nakkiran et al., 2020) show that in practice, increasing DNN capacity further improves generalization.
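For reference, a minimal sketch of the NT-Xent (InfoNCE) objective behind contrastive self-supervised methods such as SimCLR, in which two augmented views of each image must identify each other among all other embeddings in the batch; the temperature and batch size are illustrative and not tied to the paper's exact setup.

```python
# Minimal sketch of the NT-Xent contrastive loss over a batch of paired embeddings.
import torch
import torch.nn.functional as F

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    z = F.normalize(torch.cat([z1, z2]), dim=1)        # (2N, d) unit-norm embeddings
    sim = z @ z.T / temperature                        # pairwise cosine similarities
    sim.fill_diagonal_(float("-inf"))                  # a view is never its own positive
    n = z1.size(0)
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])  # index of each view's partner
    return F.cross_entropy(sim, targets)

loss = nt_xent(torch.randn(8, 128), torch.randn(8, 128))
```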