Fajcik, Martin
BenCzechMark: A Czech-centric Multitask and Multimetric Benchmark for Large Language Models with Duel Scoring Mechanism
Fajcik, Martin, Docekal, Martin, Dolezal, Jan, Ondrej, Karel, Beneš, Karel, Kapsa, Jan, Smrz, Pavel, Polok, Alexander, Hradis, Michal, Neverilova, Zuzana, Horak, Ales, Sabol, Radoslav, Stefanik, Michal, Jirkovsky, Adam, Adamczyk, David, Hyner, Petr, Hula, Jan, Kydlicek, Hynek
We present BenCzechMark (BCM), the first comprehensive Czech language benchmark designed for large language models, offering diverse tasks, multiple task formats, and multiple evaluation metrics. Its scoring system is grounded in statistical significance theory and uses aggregation across tasks inspired by social preference theory. Our benchmark encompasses 50 challenging tasks, with corresponding test datasets, primarily in native Czech, 11 of which are newly collected. These tasks span 8 categories and cover diverse domains, including historical Czech news, essays from pupils or language learners, and spoken word. Furthermore, we collect and clean the BUT-Large Czech Collection, the largest publicly available clean Czech language corpus, and use it for (i) contamination analysis and (ii) continuous pretraining of the first Czech-centric 7B language model with Czech-specific tokenization. We use our model as a baseline for comparison with publicly available multilingual models. Lastly, we release and maintain a leaderboard with 44 existing model submissions, where new model submissions can be made at https://huggingface.co/spaces/CZLC/BenCzechMark.
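The abstract only hints at how the duel scoring mechanism works; the following Python sketch illustrates the general kind of scheme it describes (pairwise model duels decided per task by a statistical significance test on per-example scores, then aggregated across tasks). The function name, the choice of the Wilcoxon signed-rank test, and the win-rate aggregation are illustrative assumptions, not the paper's exact procedure.

```python
# Hypothetical duel-style scoring sketch; the actual BCM test and aggregation may differ.
from itertools import combinations
from scipy.stats import wilcoxon

def duel_score(per_example_scores, alpha=0.05):
    """per_example_scores: {model: {task: [score for each test example]}}.
    For every task, each pair of models 'duels' via a paired significance test;
    a model wins a duel if it is significantly better on that task."""
    models = list(per_example_scores)
    tasks = list(next(iter(per_example_scores.values())))
    wins = {m: 0 for m in models}
    for task in tasks:
        for a, b in combinations(models, 2):
            sa = per_example_scores[a][task]
            sb = per_example_scores[b][task]
            _, p_value = wilcoxon(sa, sb)   # paired test over the shared test examples
            if p_value < alpha:             # significant difference -> one model wins the duel
                winner = a if sum(sa) > sum(sb) else b
                wins[winner] += 1
    duels_per_model = (len(models) - 1) * len(tasks)
    return {m: wins[m] / duels_per_model for m in models}  # fraction of duels won
```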
A Comparative Study of Text Retrieval Models on DaReCzech
Stetina, Jakub, Fajcik, Martin, Stefanik, Michal, Hradis, Michal
This article presents a comprehensive evaluation of 7 off-the-shelf document retrieval models, SPLADE, PLAID, PLAID-X, SimCSE, Contriever, OpenAI ADA, and Gemma2, chosen to determine their performance on the Czech retrieval dataset DaReCzech. The primary objective of our experiments is to estimate the quality of modern retrieval approaches in the Czech language. First, our analyses cover retrieval quality, speed, and memory footprint. Second, we analyze whether it is better to apply the models directly to Czech text, or to machine-translate the text into English and then retrieve in English. Our experiments identify the most effective option for Czech information retrieval. The findings reveal notable performance differences among the models, with Gemma2 achieving the highest precision and recall and Contriever performing poorly. Overall, the SPLADE and PLAID models offer a balance of efficiency and performance.
OARelatedWork: A Large-Scale Dataset of Related Work Sections with Full-texts from Open Access Sources
Docekal, Martin, Fajcik, Martin, Smrz, Pavel
This paper introduces OARelatedWork, the first large-scale multi-document summarization dataset for related work generation containing whole related work sections and full-texts of cited papers. The dataset includes 94,450 papers and 5,824,689 unique referenced papers. It was designed to shift the field toward generating entire related work sections from all available content, instead of generating parts of related work sections from abstracts only, which is the current mainstream in this field for abstractive approaches. We show that the estimated upper bound for extractive summarization increases by 217% in ROUGE-2 score when using full content instead of abstracts. Furthermore, we show the benefits of full content data on naive, oracle, traditional, and transformer-based baselines. Long outputs, such as related work sections, pose challenges for automatic evaluation metrics like BERTScore due to their limited input length. We tackle this issue by proposing and evaluating a meta-metric based on BERTScore. Despite operating on smaller blocks, this meta-metric correlates with human judgment comparably to the original BERTScore.
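The abstract does not spell out how the BERTScore-based meta-metric splits and aggregates long texts; the sketch below shows one plausible block-wise variant, assuming the widely used `bert-score` package. The word-count blocking and the max-then-average aggregation are illustrative assumptions, not the paper's exact definition.

```python
# Illustrative block-wise BERTScore; the paper's blocking and aggregation may differ.
from bert_score import score

def blockwise_bertscore(candidate: str, reference: str,
                        block_size: int = 128, lang: str = "en") -> float:
    def blocks(text):
        words = text.split()
        return [" ".join(words[i:i + block_size])
                for i in range(0, len(words), block_size)] or [""]
    cand_blocks, ref_blocks = blocks(candidate), blocks(reference)
    best_f1 = []
    for cb in cand_blocks:
        # Score one candidate block against every reference block and keep the best match.
        _, _, f1 = score([cb] * len(ref_blocks), ref_blocks, lang=lang, verbose=False)
        best_f1.append(float(f1.max()))
    # Average the best-match F1 over candidate blocks.
    return sum(best_f1) / len(best_f1)
```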
Claim-Dissector: An Interpretable Fact-Checking System with Joint Re-ranking and Veracity Prediction
Fajcik, Martin, Motlicek, Petr, Smrz, Pavel
We present Claim-Dissector: a novel latent variable model for fact-checking and analysis which, given a claim and a set of retrieved evidences, jointly learns to identify (i) the evidences relevant to the given claim and (ii) the veracity of the claim. We propose to disentangle the per-evidence relevance probability and its contribution to the final veracity probability in an interpretable way: the final veracity probability is proportional to a linear ensemble of per-evidence relevance probabilities. In this way, the individual contributions of evidences towards the final predicted probability can be identified. Within the per-evidence relevance probability, our model can further distinguish whether each relevant evidence supports (S) or refutes (R) the claim. This makes it possible to quantify how much the S/R probability contributes to the final verdict, or to detect disagreeing evidence. Despite its interpretable nature, our system achieves results on the FEVER dataset competitive with state-of-the-art two-stage pipelines, while using significantly fewer parameters. It also sets a new state-of-the-art on the FAVIQ and RealFC datasets. Furthermore, our analysis shows that our model can learn fine-grained relevance cues while using only coarse-grained supervision, and we demonstrate this in two ways: (i) we show that our model can achieve competitive sentence recall while using only paragraph-level relevance supervision; (ii) traversing towards the finest granularity of relevance, we show that our model is capable of identifying relevance at the token level. To do this, we present a new benchmark, TLR-FEVER, focusing on token-level interpretability: humans annotate the tokens in relevant evidences they considered essential when making their judgment. Then we measure how similar these annotations are to the tokens our model focuses on.
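As a hypothetical illustration of the linear-ensemble idea stated above (not the paper's exact parameterization, which also has to handle verdicts such as not-enough-info), the following PyTorch sketch shows how a verdict probability proportional to per-evidence supporting/refuting relevance probabilities exposes each evidence's contribution. The tensor names and the sigmoid parameterization are assumptions.

```python
# Illustrative aggregation only; names and the sigmoid parameterization are assumptions.
import torch

def aggregate_verdict(support_logits: torch.Tensor, refute_logits: torch.Tensor):
    """support_logits, refute_logits: [num_evidences] per-evidence relevance logits."""
    p_s = torch.sigmoid(support_logits)  # probability each evidence is supporting-relevant
    p_r = torch.sigmoid(refute_logits)   # probability each evidence is refuting-relevant
    # Linear ensemble: the verdict is proportional to summed per-evidence relevance,
    # so each evidence's contribution to the final probability can be read off directly.
    support_mass, refute_mass = p_s.sum(), p_r.sum()
    p_supported = support_mass / (support_mass + refute_mass + 1e-8)
    contributions = torch.stack([p_s, p_r], dim=-1)  # per-evidence (S, R) contributions
    return p_supported, contributions
```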
Pruning the Index Contents for Memory Efficient Open-Domain QA
Fajcik, Martin, Docekal, Martin, Ondrej, Karel, Smrz, Pavel
This work presents a novel pipeline that demonstrates what is achievable with a combined effort of state-of-the-art approaches, surpassing 50% exact match on the NaturalQuestions and EfficientQA datasets. Specifically, it proposes the novel R2-D2 (Rank twice, reaD twice) pipeline composed of a retriever, a reranker, an extractive reader, a generative reader, and a simple way to combine them. Furthermore, previous work often comes with a massive index of external documents that scales to tens of GiB. This work presents a simple approach for pruning the contents of such a massive index so that the open-domain QA system, together with the index, OS, and library components, fits into a 6 GiB Docker image, while retaining only 8% of the original index contents and losing only 3% in EM accuracy.
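The abstract does not detail the pruning criterion; the short sketch below illustrates the general idea under the assumption of a hypothetical per-passage usefulness score (e.g., from a trained classifier), keeping only the top fraction of the index.

```python
# Hypothetical pruning sketch; the paper's actual passage scoring may differ.
def prune_index(passages, usefulness_scores, keep_fraction=0.08):
    """Keep only the top `keep_fraction` of passages by predicted usefulness."""
    ranked = sorted(zip(passages, usefulness_scores), key=lambda x: x[1], reverse=True)
    keep_n = max(1, int(len(ranked) * keep_fraction))
    return [passage for passage, _ in ranked[:keep_n]]
```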
NeurIPS 2020 EfficientQA Competition: Systems, Analyses and Lessons Learned
Min, Sewon, Boyd-Graber, Jordan, Alberti, Chris, Chen, Danqi, Choi, Eunsol, Collins, Michael, Guu, Kelvin, Hajishirzi, Hannaneh, Lee, Kenton, Palomaki, Jennimaria, Raffel, Colin, Roberts, Adam, Kwiatkowski, Tom, Lewis, Patrick, Wu, Yuxiang, Küttler, Heinrich, Liu, Linqing, Minervini, Pasquale, Stenetorp, Pontus, Riedel, Sebastian, Yang, Sohee, Seo, Minjoon, Izacard, Gautier, Petroni, Fabio, Hosseini, Lucas, De Cao, Nicola, Grave, Edouard, Yamada, Ikuya, Shimaoka, Sonse, Suzuki, Masatoshi, Miyawaki, Shumpei, Sato, Shun, Takahashi, Ryo, Suzuki, Jun, Fajcik, Martin, Docekal, Martin, Ondrej, Karel, Smrz, Pavel, Cheng, Hao, Shen, Yelong, Liu, Xiaodong, He, Pengcheng, Chen, Weizhu, Gao, Jianfeng, Oguz, Barlas, Chen, Xilun, Karpukhin, Vladimir, Peshterliev, Stan, Okhonko, Dmytro, Schlichtkrull, Michael, Gupta, Sonal, Mehdad, Yashar, Yih, Wen-tau
We review the EfficientQA competition from NeurIPS 2020. The competition focused on open-domain question answering (QA), where systems take natural language questions as input and return natural language answers. The aim of the competition was to build systems that can predict correct answers while also satisfying strict on-disk memory budgets. These memory budgets were designed to encourage contestants to explore the trade-off between storing large, redundant retrieval corpora and the parameters of large learned models. In this report, we describe the motivation and organization of the competition, review the best submissions, and analyze system predictions to inform a discussion of evaluation for open-domain QA.
Rethinking the objectives of extractive question answering
Fajcik, Martin, Jon, Josef, Kesiraju, Santosh, Smrz, Pavel
This paper describes two generally applicable approaches towards significantly improving the performance of state-of-the-art extractive question answering (EQA) systems. Firstly, contrary to common belief, it demonstrates that using the objective with the independence assumption for the span probability, $P(a_s,a_e) = P(a_s)P(a_e)$, of a span starting at position $a_s$ and ending at position $a_e$, may have adverse effects. Therefore, we propose a new compound objective that models the joint probability $P(a_s,a_e)$ directly, while still keeping the objective with the independence assumption as an auxiliary objective. Our second approach shows the beneficial effect of the distantly semi-supervised shared-normalization objective known from Clark and Gardner (2017). We show that normalizing over a set of documents similar to the golden passage, and marginalizing over all ground-truth answer string positions, leads to improved results from smaller statistical models. Our results are supported by experiments with three QA models (BiDAF, BERT, ALBERT) over six datasets. The proposed approaches do not use any additional data. Our code, analysis, pretrained models, and individual results will be available online.
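To illustrate the contrast between the independence-assumption objective and a compound objective modelling $P(a_s,a_e)$ jointly, here is a minimal PyTorch sketch. The tensor names and the way span scores are obtained (e.g., from a bilinear interaction of start and end token representations) are assumptions; the paper's exact compound formulation may differ.

```python
# Minimal sketch contrasting P(a_s)P(a_e) with a joint span objective; names are hypothetical.
import torch
import torch.nn.functional as F

def independent_log_prob(start_logits, end_logits, a_s, a_e):
    # start_logits, end_logits: [batch, seq_len]; a_s, a_e: [batch] gold span indices
    log_p_s = F.log_softmax(start_logits, dim=-1)
    log_p_e = F.log_softmax(end_logits, dim=-1)
    return (log_p_s.gather(1, a_s.unsqueeze(1)).squeeze(1)
            + log_p_e.gather(1, a_e.unsqueeze(1)).squeeze(1))

def joint_log_prob(span_scores, a_s, a_e):
    # span_scores: [batch, seq_len, seq_len], score[i, j] for span (start=i, end=j)
    batch, seq_len, _ = span_scores.shape
    log_p = F.log_softmax(span_scores.view(batch, -1), dim=-1)  # normalize over all (i, j) pairs
    gold = a_s * seq_len + a_e                                  # flattened gold span index
    return log_p.gather(1, gold.unsqueeze(1)).squeeze(1)

# A compound loss could combine the joint term with the independent term as an auxiliary:
# loss = -(joint_log_prob(span_scores, a_s, a_e)
#          + independent_log_prob(start_logits, end_logits, a_s, a_e)).mean()
```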
BUT-FIT at SemEval-2020 Task 5: Automatic detection of counterfactual statements with deep pre-trained language representation models
Fajcik, Martin, Jon, Josef, Docekal, Martin, Smrz, Pavel
This paper describes BUT-FIT's submission at SemEval-2020 Task 5: Modelling Causal Reasoning in Language: Detecting Counterfactuals. The challenge focused on detecting whether a given statement contains a counterfactual (Subtask 1) and extracting both the antecedent and consequent parts of the counterfactual from the text (Subtask 2). We experimented with various state-of-the-art language representation models (LRMs). We found the RoBERTa LRM to perform best in both subtasks. We achieved first place in both exact match and F1 for Subtask 2 and ranked second in Subtask 1.
BUT-FIT at SemEval-2019 Task 7: Determining the Rumour Stance with Pre-Trained Deep Bidirectional Transformers
Fajcik, Martin, Burget, Lukáš, Smrz, Pavel
This paper describes our system submitted to SemEval 2019 Task 7: RumourEval 2019: Determining Rumour Veracity and Support for Rumours, Subtask A (Gorrell et al., 2019). The challenge focused on classifying whether posts from Twitter and Reddit support, deny, query, or comment on a hidden rumour, the truthfulness of which is the topic of an underlying discussion thread. We formulate the problem as stance classification, determining the rumour stance of a post with respect to the previous thread post and the source thread post. The recent BERT architecture was employed to build an end-to-end system which reached an F1 score of 61.67% on the provided test data. It finished in 2nd place in the competition, without any hand-crafted features, only 0.2% behind the winner.