AITopics | Štefánik, Michal

Collaborating Authors

Štefánik, Michal

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Negation: A Pink Elephant in the Large Language Models' Room?

Vrabcová, Tereza, Kadlčík, Marek, Sojka, Petr, Štefánik, Michal, Spiegel, Michal

arXiv.org Artificial IntelligenceMar-28-2025

Negations are key to determining sentence meaning, making them essential for logical reasoning. Despite their importance, negations pose a substantial challenge for large language models (LLMs) and remain underexplored. We construct two multilingual natural language inference (NLI) datasets with \textit{paired} examples differing in negation. We investigate how model size and language impact its ability to handle negation correctly by evaluating popular LLMs. Contrary to previous work, we show that increasing the model size consistently improves the models' ability to handle negations. Furthermore, we find that both the models' reasoning accuracy and robustness to negation are language-dependent and that the length and explicitness of the premise have a greater impact on robustness than language. Our datasets can facilitate further research and improvements of language model reasoning in multilingual settings.

large language model, natural language, negation, (14 more...)

arXiv.org Artificial Intelligence

2503.22395

Country:

Asia (0.93)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Attend or Perish: Benchmarking Attention in Algorithmic Reasoning

Spiegel, Michal, Štefánik, Michal, Kadlčík, Marek, Kuchař, Josef

arXiv.org Artificial IntelligenceFeb-28-2025

Can transformers learn to perform algorithmic tasks reliably across previously unseen input/output domains? While pre-trained language models show solid accuracy on benchmarks incorporating algorithmic reasoning, assessing the reliability of these results necessitates an ability to cleanse models' functional capabilities from memorization. In this paper, we propose an algorithmic benchmark comprising six tasks of infinite input domains where we can also disentangle and trace the correct, robust algorithm necessary for the task. This allows us to assess (i) models' ability to extrapolate to unseen types of inputs, including new lengths, value ranges or input domains, but also (ii) to assess the robustness of the functional mechanism in recent models through the lens of their attention maps. We make the implementation of all our tasks and interoperability methods publicly available at https://github.com/michalspiegel/AttentionSpan .

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2503.01909

Country: Europe > Middle East > Malta (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Self-training Language Models for Arithmetic Reasoning

Kadlčík, Marek, Štefánik, Michal

arXiv.org Artificial IntelligenceJul-11-2024

Language models achieve impressive results in tasks involving complex multistep reasoning, but scaling these capabilities further traditionally requires expensive collection of more annotated data. In this work, we explore the potential of improving the capabilities of language models without new data, merely using automated feedback to the validity of their predictions in arithmetic reasoning (self-training). We find that models can substantially improve in both single-round (offline) and online self-training. In the offline setting, supervised methods are able to deliver gains comparable to preference optimization, but in online self-training, preference optimization shows to largely outperform supervised training thanks to superior stability and robustness on unseen types of problems.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2407.084

Country: North America > United States > California (0.14)

Genre: Research Report (0.82)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)

Add feedback

Calc-X and Calcformers: Empowering Arithmetical Chain-of-Thought through Interaction with Symbolic Systems

Kadlčík, Marek, Štefánik, Michal, Sotolář, Ondřej, Martinek, Vlastimil

arXiv.org Artificial IntelligenceOct-23-2023

Despite outstanding performance in many tasks, language models are notoriously inclined to make factual errors in tasks requiring arithmetic computation. We address this deficiency by creating Calc-X, a collection of datasets that demonstrates the appropriate use of a calculator in reasoning chains. Calc-X is suitable for teaching language models to offload computations to a symbolic system. We survey and unify several existing chain-of-thought datasets into a proposed format, resulting in a standard collection of over 300,000 samples requiring arithmetic reasoning. Finally, we use the new Calc-X collection to train open-source calculator-using models we call Calcformers and show that these models approximately double the accuracy of generating correct results compared to vanilla language model baselines. We make all Calc-X datasets, source code and Calcformers models publicly available.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2305.15017

Country: North America > United States > California (0.14)

Genre: Research Report (0.50)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.93)

Add feedback

Can In-context Learners Learn a Reasoning Concept from Demonstrations?

Štefánik, Michal, Kadlčík, Marek

arXiv.org Artificial IntelligenceJul-19-2023

Language models exhibit an emergent ability to learn a new task from a small number of input-output demonstrations. However, recent work shows that in-context learners largely rely on their pre-trained knowledge, such as the sentiment of the labels, instead of learning new associations from the input. We argue that the commonly-used few-shot evaluation using a random selection of in-context demonstrations can not disentangle models' reliance on such biases, as most of the randomly-selected demonstrations do not present relations informative for prediction beyond exposing the task's input-output distribution. Therefore, to evaluate models' in-context learning ability independent of models' memory, we introduce a Concept-sharing few-shot learning method choosing the demonstrations that share an underlying concept with the predicted sample. We extract a set of such concepts from available human explanations and measure how much models can benefit from presenting these concepts in few-shot demonstrations. We find that most of the recent in-context learners can not consistently benefit from the demonstrated concepts, irrespective of the model size. However, we note that T0 models are more sensitive to exhibited concepts, benefiting from concept-sharing demonstrations in 7 out of 8 evaluation scenarios.

demonstration, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2212.01692

Country:

Europe > Belgium (0.14)
Europe > Spain (0.14)
Europe > France (0.14)
(2 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Add feedback

People and Places of Historical Europe: Bootstrapping Annotation Pipeline and a New Corpus of Named Entities in Late Medieval Texts

Novotný, Vít, Luger, Kristýna, Štefánik, Michal, Vrabcová, Tereza, Horák, Aleš

arXiv.org Artificial IntelligenceJun-6-2023

Although pre-trained named entity recognition (NER) models are highly accurate on modern corpora, they underperform on historical texts due to differences in language OCR errors. In this work, we develop a new NER corpus of 3.6M sentences from late medieval charters written mainly in Czech, Latin, and German. We show that we can start with a list of known historical figures and locations and an unannotated corpus of historical texts, and use information retrieval techniques to automatically bootstrap a NER-annotated corpus. Using our corpus, we train a NER model that achieves entity-level Precision of 72.81-93.98% with 58.14-81.77% Recall on a manually-annotated test dataset. Furthermore, we show that using a weighted loss function helps to combat class imbalance in token classification tasks. To make it easy for others to reproduce and build upon our work, we publicly release our corpus, models, and experimental code.

information retrieval, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2305.16718

Country:

Europe (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Soft Alignment Objectives for Robust Adaptation of Language Generation

Štefánik, Michal, Kadlčík, Marek, Sojka, Petr

arXiv.org Artificial IntelligenceMay-26-2023

Domain adaptation allows generative language models to address specific flaws caused by the domain shift of their application. However, the traditional adaptation by further training on in-domain data rapidly weakens the model's ability to generalize to other domains, making the open-ended deployments of the adapted models prone to errors. This work introduces novel training objectives built upon a semantic similarity of the predicted tokens to the reference. Our results show that (1) avoiding the common assumption of a single correct prediction by constructing the training target from tokens' semantic similarity can mitigate catastrophic forgetting during domain adaptation, while (2) preserving the quality of the adaptation, (3) with negligible additions to compute costs. In the broader context, the objectives grounded in a continuous token similarity pioneer the exploration of the middle ground between the efficient but na\"{\i}ve exact-match token-level objectives and expressive but computationally- and resource-intensive sequential objectives.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2211.1655

Country:

Europe (1.00)
Asia > Middle East (0.93)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.64)

Add feedback

Concept-aware Training Improves In-context Learning Ability of Language Models

Štefánik, Michal, Kadlčík, Marek

arXiv.org Artificial IntelligenceMay-23-2023

Many recent language models (LMs) of Transformers family exhibit so-called in-context learning (ICL) ability, manifested in the LMs' ability to modulate their function by a task described in a natural language input. Previous work curating these models assumes that ICL emerges from vast over-parametrization or the scale of multi-task training. However, a complementary branch of recent theoretical work attributes ICL emergence to specific properties of training data and creates functional in-context learners in small-scale, synthetic settings. Inspired by recent findings on data properties driving the emergence of ICL, we propose a method to create LMs able to better utilize the in-context information, by constructing training scenarios where it is beneficial for the LM to capture the analogical reasoning concepts. We measure that data sampling of Concept-aware Training (CoAT) consistently improves models' reasoning ability. As a result, the in-context learners trained with CoAT on only two datasets of a single (QA) task perform comparably to larger models trained on 1600+ tasks.

demonstration, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2305.13775

Country:

North America > United States (0.46)
Asia > Middle East > UAE (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Think Twice: Measuring the Efficiency of Eliminating Prediction Shortcuts of Question Answering Models

Mikula, Lukáš, Štefánik, Michal, Petrovič, Marek, Sojka, Petr

arXiv.org Artificial IntelligenceMay-11-2023

While the Large Language Models (LLMs) dominate a majority of language understanding tasks, previous work shows that some of these results are supported by modelling spurious correlations of training datasets. Authors commonly assess model robustness by evaluating their models on out-of-distribution (OOD) datasets of the same task, but these datasets might share the bias of the training dataset. We propose a simple method for measuring a scale of models' reliance on any identified spurious feature and assess the robustness towards a large set of known and newly found prediction biases for various pre-trained models and debiasing methods in Question Answering (QA). We find that the reported OOD gains of debiasing methods can not be explained by mitigated reliance on biased features, suggesting that biases are shared among QA datasets. We further evidence this by measuring that performance of OOD models depends on bias features comparably to the ID model, motivating future work to refine the reports of LLMs' robustness to a level of known spurious features.

large language model, machine learning, question answering, (21 more...)

arXiv.org Artificial Intelligence

2305.06841

Country:

Europe (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.71)

Add feedback

Resources and Few-shot Learners for In-context Learning in Slavic Languages

Štefánik, Michal, Kadlčík, Marek, Gramacki, Piotr, Sojka, Petr

arXiv.org Artificial IntelligenceApr-4-2023

Despite the rapid recent progress in creating accurate and compact in-context learners, most recent work focuses on in-context learning (ICL) for tasks in English. However, the ability to interact with users of languages outside English presents a great potential for broadening the applicability of language technologies to non-English speakers. In this work, we collect the infrastructure necessary for training and evaluation of ICL in a selection of Slavic languages: Czech, Polish, and Russian. We link a diverse set of datasets and cast these into a unified instructional format through a set of transformations and newly-crafted templates written purely in target languages. Using the newly-curated dataset, we evaluate a set of the most recent in-context learners and compare their results to the supervised baselines. Finally, we train, evaluate and publish a set of in-context learning models that we train on the collected resources and compare their performance to previous work. We find that ICL models tuned in English are also able to learn some tasks from non-English contexts, but multilingual instruction fine-tuning consistently improves the ICL ability. We also find that the massive multitask training can be outperformed by single-task training in the target language, uncovering the potential for specializing in-context learners to the language(s) of their application.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2304.01922

Country:

Asia > Middle East (0.46)
North America > United States (0.28)
Europe > Czechia (0.28)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)

Genre: Research Report (0.64)

Industry: Education (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.93)

Add feedback