AITopics | working note

Collaborating Authors

working note

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

XplaiNLP at CheckThat! 2025: Multilingual Subjectivity Detection with Finetuned Transformers and Prompt-Based Inference with Large Language Models

Sahitaj, Ariana, Li, Jiaao, Neves, Pia Wenzel, Splitt, Fedor, Sahitaj, Premtim, Jakob, Charlott, Solopova, Veronika, Schmitt, Vera

arXiv.org Artificial IntelligenceSep-16-2025

This notebook reports the XplaiNLP submission to the CheckThat! 2025 shared task on multilingual subjectivity detection. We evaluate two approaches: (1) supervised fine-tuning of transformer encoders, EuroBERT, XLM-RoBERTa, and German-BERT, on monolingual and machine-translated training data; and (2) zero-shot prompting using two LLMs: o3-mini for Annotation (rule-based labelling) and gpt-4.1-mini for DoubleDown (contrastive rewriting) and Perspective (comparative reasoning). The Annotation Approach achieves 1st place in the Italian monolingual subtask with an F_1 score of 0.8104, outperforming the baseline of 0.6941. In the Romanian zero-shot setting, the fine-tuned XLM-RoBERTa model obtains an F_1 score of 0.7917, ranking 3rd and exceeding the baseline of 0.6461. The same model also performs reliably in the multilingual task and improves over the baseline in Greek. For German, a German-BERT model fine-tuned on translated training data from typologically related languages yields competitive performance over the baseline. In contrast, performance in the Ukrainian and Polish zero-shot settings falls slightly below the respective baselines, reflecting the challenge of generalization in low-resource cross-lingual scenarios.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2509.1213

Country: Europe > Germany (0.14)

Genre: Research Report > New Finding (0.68)

Industry: Media (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

Add feedback

Overview of BioASQ 2025: The Thirteenth BioASQ Challenge on Large-Scale Biomedical Semantic Indexing and Question Answering

Nentidis, Anastasios, Katsimpras, Georgios, Krithara, Anastasia, Krallinger, Martin, Rodríguez-Ortega, Miguel, Rodriguez-López, Eduard, Loukachevitch, Natalia, Sakhovskiy, Andrey, Tutubalina, Elena, Dimitriadis, Dimitris, Tsoumakas, Grigorios, Giannakoulas, George, Bekiaridou, Alexandra, Samaras, Athanasios, Di Nunzio, Giorgio Maria, Ferro, Nicola, Marchesin, Stefano, Martinelli, Marco, Silvello, Gianmaria, Paliouras, Georgios

arXiv.org Artificial IntelligenceAug-29-2025

This is an overview of the thirteenth edition of the BioASQ challenge in the context of the Conference and Labs of the Evaluation Forum (CLEF) 2025. BioASQ is a series of international challenges promoting advances in large-scale biomedical semantic indexing and question answering. This year, BioASQ consisted of new editions of the two established tasks, b and Synergy, and four new tasks: a) Task MultiClinSum on multilingual clinical summarization. b) Task BioNNE-L on nested named entity linking in Russian and English. c) Task ELCardioCC on clinical coding in cardiology. d) Task GutBrainIE on gut-brain interplay information extraction. In this edition of BioASQ, 83 competing teams participated with more than 1000 distinct submissions in total for the six different shared tasks of the challenge. Similar to previous editions, several participating systems achieved competitive performance, indicating the continuous advancement of the state-of-the-art in the field.

large language model, machine learning, question answering, (22 more...)

arXiv.org Artificial Intelligence

2508.20554

Country:

Europe (1.00)
North America > United States > Minnesota (0.28)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.93)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.96)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Team "better_call_claude": Style Change Detection using a Sequential Sentence Pair Classifier

Schmidt, Gleb, Römisch, Johannes, Halchynska, Mariia, Gorovaia, Svetlana, Yamshchikov, Ivan P.

arXiv.org Artificial IntelligenceAug-4-2025

Style change detection - identifying the points in a document where writing style shifts - remains one of the most important and challenging problems in computational authorship analysis. At PAN 2025, the shared task challenges participants to detect style switches at the most fine-grained level: individual sentences. The task spans three datasets, each designed with controlled and increasing thematic variety within documents. We propose to address this problem by modeling the content of each problem instance - that is, a series of sentences - as a whole, using a Sequential Sentence Pair Classifier (SSPC). The architecture leverages a pre-trained language model (PLM) to obtain representations of individual sentences, which are then fed into a bidirectional LSTM (BiLSTM) to contextualize them within the document. The BiLSTM-produced vectors of adjacent sentences are concatenated and passed to a multi-layer perceptron for prediction per adjacency. Building on the work of previous PAN participants classical text segmentation, the approach is relatively conservative and lightweight. Nevertheless, it proves effective in leveraging contextual information and addressing what is arguably the most challenging aspect of this year's shared task: the notorious problem of "stylistically shallow", short sentences that are prevalent in the proposed benchmark data. Evaluated on the official PAN-2025 test datasets, the model achieves strong macro-F1 scores of 0.923, 0.828, and 0.724 on the EASY, MEDIUM, and HARD data, respectively, outperforming not only the official random baselines but also a much more challenging one: claude-3.7-sonnet's zero-shot performance.

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2508.00675

Country:

Europe (1.00)
North America > United States (0.28)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management

Gasco, Luis, Fabregat, Hermenegildo, García-Sardiña, Laura, Estrella, Paula, Deniz, Daniel, Rodrigo, Alvaro, Zbib, Rabih

arXiv.org Artificial IntelligenceJul-18-2025

Advances in natural language processing and large language models are driving a major transformation in Human Capital Management, with a growing interest in building smart systems based on language technologies for talent acquisition, upskilling strategies, and workforce planning. However, the adoption and progress of these technologies critically depend on the development of reliable and fair models, properly evaluated on public data and open benchmarks, which have so far been unavailable in this domain. To address this gap, we present TalentCLEF 2025, the first evaluation campaign focused on skill and job title intelligence. The lab consists of two tasks: Task A - Multilingual Job Title Matching, covering English, Spanish, German, and Chinese; and Task B - Job Title-Based Skill Prediction, in English. Both corpora were built from real job applications, carefully anonymized, and manually annotated to reflect the complexity and diversity of real-world labor market data, including linguistic variability and gender-marked expressions. The evaluations included monolingual and cross-lingual scenarios and covered the evaluation of gender bias. TalentCLEF attracted 76 registered teams with more than 280 submissions. Most systems relied on information retrieval techniques built with multilingual encoder-based models fine-tuned with contrastive learning, and several of them incorporated large language models for data augmentation or re-ranking. The results show that the training strategies have a larger effect than the size of the model alone. TalentCLEF provides the first public benchmark in this field and encourages the development of robust, fair, and transferable language technologies for the labor market.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2507.13275

Country:

Europe (1.00)
North America > United States (0.46)

Genre: Research Report > New Finding (0.66)

Industry:

Banking & Finance > Economy (0.68)
Banking & Finance > Trading (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

DS@GT at CheckThat! 2025: Detecting Subjectivity via Transfer-Learning and Corrective Data Augmentation

Heil, Maximilian, Bang, Dionne

arXiv.org Artificial IntelligenceJul-9-2025

This paper presents our submission to Task 1, Subjectivity Detection, of the CheckThat! Lab at CLEF 2025. We investigate the effectiveness of transfer-learning and stylistic data augmentation to improve classification of subjective and objective sentences in English news text. Our approach contrasts fine-tuning of pre-trained encoders and transfer-learning of fine-tuned transformer on related tasks. We also introduce a controlled augmentation pipeline using GPT-4o to generate paraphrases in predefined subjectivity styles. To ensure label and style consistency, we employ the same model to correct and refine the generated samples. Results show that transfer-learning of specified encoders outperforms fine-tuning general-purpose ones, and that carefully curated augmentation significantly enhances model robustness, especially in detecting subjective content. Our official submission placed us $16^{th}$ of 24 participants. Overall, our findings underscore the value of combining encoder specialization with label-consistent augmentation for improved subjectivity detection. Our code is available at https://github.com/dsgt-arc/checkthat-2025-subject.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2507.06189

Country:

Europe (0.94)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Media (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)

Add feedback

Temporal fine-tuning for early risk detection

Thompson, Horacio, Villatoro-Tello, Esaú, Montes-y-Gómez, Manuel, Errecalde, Marcelo

arXiv.org Artificial IntelligenceMay-19-2025

Early Risk Detection (ERD) on the Web aims to identify promptly users facing social and health issues. Users are analyzed post-by-post, and it is necessary to guarantee correct and quick answers, which is particularly challenging in critical scenarios. ERD involves optimizing classification precision and minimizing detection delay. Standard classification metrics may not suffice, resorting to specific metrics such as ERDE(theta) that explicitly consider precision and delay. The current research focuses on applying a multi-objective approach, prioritizing classification performance and establishing a separate criterion for decision time. In this work, we propose a completely different strategy, temporal fine-tuning, which allows tuning transformer-based models by explicitly incorporating time within the learning process. Our method allows us to analyze complete user post histories, tune models considering different contexts, and evaluate training performance using temporal metrics. We evaluated our proposal in the depression and eating disorders tasks for the Spanish language, achieving competitive results compared to the best models of MentalRiskES 2023. We found that temporal fine-tuning optimized decisions considering context and time progress. In this way, by properly taking advantage of the power of transformers, it is possible to address ERD by combining precision and speed as a single objective.

artificial intelligence, deep learning, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2505.1128

Country:

Europe (0.68)
South America > Argentina (0.28)
North America > Mexico (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

Mapping the Media Landscape: Predicting Factual Reporting and Political Bias Through Web Interactions

Sánchez-Cortés, Dairazalia, Burdisso, Sergio, Villatoro-Tello, Esaú, Motlicek, Petr

arXiv.org Artificial IntelligenceOct-23-2024

Bias assessment of news sources is paramount for professionals, organizations, and researchers who rely on truthful evidence for information gathering and reporting. While certain bias indicators are discernible from content analysis, descriptors like political bias and fake news pose greater challenges. In this paper, we propose an extension to a recently presented news media reliability estimation method that focuses on modeling outlets and their longitudinal web interactions. Concretely, we assess the classification performance of four reinforcement learning strategies on a large news media hyperlink graph. Our experiments, targeting two challenging bias descriptors, factual reporting and political bias, showed a significant performance improvement at the source media level. Additionally, we validate our methods on the CLEF 2023 CheckThat! Lab challenge, outperforming the reported results in both, F1-score and the official MAE metric. Furthermore, we contribute by releasing the largest annotated dataset of news source media, categorized with factual reporting and political bias labels. Our findings suggest that profiling news media sources based on their hyperlink interactions over time is feasible, offering a bird's-eye view of evolving media landscapes.

information retrieval, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/978-3-031-71736-9_7

2410.17655

Country:

Europe > Switzerland (0.04)
Europe > Czechia > South Moravian Region > Brno (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Media > News (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.37)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.34)

Add feedback

A Time-Aware Approach to Early Detection of Anorexia: UNSL at eRisk 2024

Thompson, Horacio, Errecalde, Marcelo

arXiv.org Artificial IntelligenceOct-23-2024

The eRisk laboratory aims to address issues related to early risk detection on the Web. In this year's edition, three tasks were proposed, where Task 2 was about early detection of signs of anorexia. Early risk detection is a problem where precision and speed are two crucial objectives. Our research group solved Task 2 by defining a CPI+DMC approach, addressing both objectives independently, and a time-aware approach, where precision and speed are considered a combined single-objective. We implemented the last approach by explicitly integrating time during the learning process, considering the ERDE{\theta} metric as the training objective. It also allowed us to incorporate temporal metrics to validate and select the optimal models. We achieved outstanding results for the ERDE50 metric and ranking-based metrics, demonstrating consistency in solving ERD problems.

artificial intelligence, machine learning, unsl, (17 more...)

arXiv.org Artificial Intelligence

2410.17963

Country:

Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.05)
South America > Argentina > Cuyo > San Luis Province > San Luis (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

HYBRINFOX at CheckThat! 2024 -- Task 2: Enriching BERT Models with the Expert System VAGO for Subjectivity Detection

Casanova, Morgane, Chanson, Julien, Icard, Benjamin, Faye, Géraud, Gadek, Guillaume, Gravier, Guillaume, Égré, Paul

arXiv.org Artificial IntelligenceJul-4-2024

This paper presents the HYBRINFOX method used to solve Task 2 of Subjectivity detection of the CLEF 2024 CheckThat! competition. The specificity of the method is to use a hybrid system, combining a RoBERTa model, fine-tuned for subjectivity detection, a frozen sentence-BERT (sBERT) model to capture semantics, and several scores calculated by the English version of the expert system VAGO, developed independently of this task to measure vagueness and subjectivity in texts based on the lexicon. In English, the HYBRINFOX method ranked 1st with a macro F1 score of 0.7442 on the evaluation data. For the other languages, the method used a translation step into English, producing more mixed results (ranking 1st in Multilingual and 2nd in Italian over the baseline, but under the baseline in Bulgarian, German, and Arabic). We explain the principles of our hybrid approach, and outline ways in which the method could be improved for other languages besides English.

checkthat, subjectivity, vagueness, (16 more...)

arXiv.org Artificial Intelligence

2407.0377

Country:

Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
Asia (0.04)
Europe > Switzerland (0.04)

Genre: Research Report (0.64)

Industry: Media (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.62)

Add feedback

Zero-shot Generative Large Language Models for Systematic Review Screening Automation

Wang, Shuai, Scells, Harrisen, Zhuang, Shengyao, Potthast, Martin, Koopman, Bevan, Zuccon, Guido

arXiv.org Artificial IntelligenceJan-31-2024

Systematic reviews are crucial for evidence-based medicine as they comprehensively analyse published research findings on specific questions. Conducting such reviews is often resource- and time-intensive, especially in the screening phase, where abstracts of publications are assessed for inclusion in a review. This study investigates the effectiveness of using zero-shot large language models~(LLMs) for automatic screening. We evaluate the effectiveness of eight different LLMs and investigate a calibration technique that uses a predefined recall threshold to determine whether a publication should be included in a systematic review. Our comprehensive evaluation using five standard test collections shows that instruction fine-tuning plays an important role in screening, that calibration renders LLMs practical for achieving a targeted recall, and that combining both with an ensemble of zero-shot models saves significant screening time compared to state-of-the-art approaches.

cal, screening, systematic review, (13 more...)

arXiv.org Artificial Intelligence

2401.0632

Country:

Europe > Germany > Saxony > Leipzig (0.04)
Oceania > Australia > Queensland (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)

Add feedback