AITopics | bleurt

Collaborating Authors

bleurt

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Scaling Sign Language Translation

Neural Information Processing SystemsFeb-18-2026, 05:22:25 GMT

Sign language translation (SL T) addresses the problem of translating information from a sign language in video to a spoken language in text. Existing studies, while showing progress, are often limited to narrow domains and/or few sign languages and struggle with open-domain tasks. In this paper, we push forward the frontier of SL T by scaling pretraining data, model size, and number of translation directions. We perform large-scale SL T pretraining on different data including 1) noisy multilingual Y ouTube SL T data, 2) parallel text corpora, and 3) SL T data augmented by translating video captions to other languages with off-the-shelf machine translation models. We unify different pretraining tasks with task-specific prompts under the encoder-decoder architecture, and initialize the SL T model with pretrained (m/By)T5 models across model sizes. SL T pretraining results on How2Sign and FLEURS-ASL#0 (ASL to 42 spoken languages) demonstrate the significance of data/model scaling and cross-lingual cross-modal transfer, as well as the feasibility of zero-shot SL T. We finetune the pretrained SL T models on 5 downstream open-domain SL T benchmarks covering 5 sign languages. Experiments show substantial quality improvements over the vanilla baselines, surpassing the previous state-of-the-art (SOT A) by wide margins.

large language model, machine learning, translation, (20 more...)

Neural Information Processing Systems

Country:

Asia > Singapore (0.04)
Europe > Switzerland (0.04)
Europe > Italy > Tuscany > Florence (0.04)
(19 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Education > Curriculum > Subject-Specific Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

A Critical Study of Automatic Evaluation in Sign Language Translation

Yazdani, Shakib, Hamidullah, Yasser, España-Bonet, Cristina, Avramidis, Eleftherios, van Genabith, Josef

arXiv.org Artificial IntelligenceNov-17-2025

Automatic evaluation metrics are crucial for advancing sign language translation (SLT). Current SLT evaluation metrics, such as BLEU and ROUGE, are only text-based, and it remains unclear to what extent text-based metrics can reliably capture the quality of SLT outputs. To address this gap, we investigate the limitations of text-based SLT evaluation metrics by analyzing six metrics, including BLEU, chrF, and ROUGE, as well as BLEURT on the one hand, and large language model (LLM)-based evaluators such as G-Eval and GEMBA zero-shot direct assessment on the other hand. Specifically, we assess the consistency and robustness of these metrics under three controlled conditions: paraphrasing, hallucinations in model outputs, and variations in sentence length. Our analysis highlights the limitations of lexical overlap metrics and demonstrates that while LLM-based evaluators better capture semantic equivalence often missed by conventional metrics, they can also exhibit bias toward LLM-paraphrased translations. Moreover, although all metrics are able to detect hallucinations, BLEU tends to be overly sensitive, whereas BLEURT and LLM-based evaluators are comparatively lenient toward subtle cases. This motivates the need for multimodal evaluation frameworks that extend beyond text-based metrics to enable a more holistic assessment of SLT outputs.

large language model, machine learning, translation, (16 more...)

arXiv.org Artificial Intelligence

2510.25434

Country:

North America > United States (0.94)
Asia (0.68)
Europe > Spain > Catalonia (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Education > Curriculum > Subject-Specific Education (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

ced76a666704e381c3039871ffe558ee-Paper-Conference.pdf

Neural Information Processing SystemsOct-10-2025, 17:01:24 GMT

bleurt, slt, translation, (14 more...)

Neural Information Processing Systems

Country:

Asia > Singapore (0.04)
Europe > Switzerland (0.04)
Europe > Italy > Tuscany > Florence (0.04)
(19 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Education (0.72)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Vision (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

BioPars: A Pretrained Biomedical Large Language Model for Persian Biomedical Text Mining

Merzah, Baqer M., Taami, Tania, Asoudeh, Salman, Mirzaee, Saeed, pour, Amir reza Hossein, Bengari, Amir Ali

arXiv.org Artificial IntelligenceJul-3-2025

Large Language Models (LLMs) have recently gained attention in the life sciences due to their capacity to model, extract, and apply complex biological information. Beyond their classical use as chatbots, these systems are increasingly used for complex analysis and problem-solving in specialized fields, including bioinformatics. First, we introduce BIOPARS-BENCH, a dataset from over 10,000 scientific articles, textbooks, and medical websites. BioParsQA was also introduced to evaluate the proposed model, which consists of 5,231 Persian medical questions and answers. This study then introduces BioPars, a simple but accurate measure designed to assess LLMs for three main abilities: acquiring subject-specific knowledge, interpreting and synthesizing such knowledge, and demonstrating proper evidence. Comparing ChatGPT, Llama, and Galactica, our study highlights their ability to remember and retrieve learned knowledge but also reveals shortcomings in addressing higher-level, real-world questions and fine-grained inferences. These findings indicate the need for further fine-tuning to address the capabilities of LLM in bioinformatics tasks. To our knowledge, BioPars is the first application of LLM in Persian medical QA, especially for generating long answers. Evaluation of four selected medical QA datasets shows that BioPars has achieved remarkable results compared to comparative approaches. The model on BioParsQA achieved a ROUGE-L score of 29.99, which is an improvement over GPT-4 1.0. The model achieved a BERTScore of 90.87 with the MMR method. The MoverScore and BLEURT values were also higher in this model than the other three models. In addition, the reported scores for the model are MoverScore=60.43 and BLEURT=50.78. BioPars is an ongoing project and all resources related to its development will be made available via the following GitHub repository: https://github.com/amirap80/BioPars.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2506.21567

Country: Asia > Middle East > Iran (0.28)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Diagnostic Medicine (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.68)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Outside Knowledge Conversational Video (OKCV) Dataset -- Dialoguing over Videos

Reichman, Benjamin, Patsch, Constantin, Truxal, Jack, Jain, Atishay, Heck, Larry

arXiv.org Artificial IntelligenceJun-12-2025

In outside knowledge visual question answering (OK-VQA), the model must identify relevant visual information within an image and incorporate external knowledge to accurately respond to a question. Extending this task to a visually grounded dialogue setting based on videos, a conversational model must both recognize pertinent visual details over time and answer questions where the required information is not necessarily present in the visual information. Moreover, the context of the overall conversation must be considered for the subsequent dialogue. To explore this task, we introduce a dataset comprised of $2,017$ videos with $5,986$ human-annotated dialogues consisting of $40,954$ interleaved dialogue turns. While the dialogue context is visually grounded in specific video segments, the questions further require external knowledge that is not visually present. Thus, the model not only has to identify relevant video parts but also leverage external knowledge to converse within the dialogue. We further provide several baselines evaluated on our dataset and show future challenges associated with this task. The dataset is made publicly available here: https://github.com/c-patsch/OKCV.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2506.09953

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Vision (0.94)
(3 more...)

Add feedback

Optimizing Estonian TV Subtitles with Semi-supervised Learning and LLMs

Fedorchenko, Artem, Alumäe, Tanel

arXiv.org Artificial IntelligenceJan-9-2025

For instance, Both iterative pseudo-labeling and LLM-based recent studies (Mykhalevych and Preply, 2024; post-editing have been an active area of research Kim et al., 2023) have revealed that 50% of Americans in the context of verbatim automatic speech and 85% of the Netflix users overall frequently recognition (ASR). Pseudo-labeling based semisupervised watch TV and streaming video content learning in ASR has been studied since with subtitles. Studies show that subtitles can enhance at least (Zavaliagkos et al., 1998) and has been understanding and memory retention. A lot later investigated in several works, e.g. by Veselỳ of viewers choose to enjoy their content quietly et al. (2013); Xu et al. (2020).

arxiv preprint arxiv, dataset, subtitle, (12 more...)

arXiv.org Artificial Intelligence

2501.05234

Country:

Europe > Estonia > Harju County > Tallinn (0.05)
Europe > Faroe Islands > Streymoy > Tórshavn (0.04)

Genre: Research Report > New Finding (0.94)

Industry:

Leisure & Entertainment (1.00)
Media > Television (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

How good is my story? Towards quantitative metrics for evaluating LLM-generated XAI narratives

Ichmoukhamedov, Timour, Hinns, James, Martens, David

arXiv.org Artificial IntelligenceDec-13-2024

A rapidly developing application of LLMs in XAI is to convert quantitative explanations such as SHAP into user-friendly narratives to explain the decisions made by smaller prediction models. Evaluating the narratives without relying on human preference studies or surveys is becoming increasingly important in this field. In this work we propose a framework and explore several automated metrics to evaluate LLM-generated narratives for explanations of tabular classification tasks. We apply our approach to compare several state-of-the-art LLMs across different datasets and prompt types. As a demonstration of their utility, these metrics allow us to identify new challenges related to LLM hallucinations for XAI narratives.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2412.1022

Country:

Asia > Singapore (0.04)
Asia > Indonesia > Bali (0.04)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
(4 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

MetaMetrics: Calibrating Metrics For Generation Tasks Using Human Preferences

Winata, Genta Indra, Anugraha, David, Susanto, Lucky, Kuwanto, Garry, Wijaya, Derry Tanti

arXiv.org Artificial IntelligenceNov-28-2024

Understanding the quality of a performance evaluation metric is crucial for ensuring that model outputs align with human preferences. However, it remains unclear how well each metric captures the diverse aspects of these preferences, as metrics often excel in one particular area but not across all dimensions. To address this, it is essential to systematically calibrate metrics to specific aspects of human preference, catering to the unique characteristics of each aspect. We introduce MetaMetrics, a calibrated meta-metric designed to evaluate generation tasks across different modalities in a supervised manner. MetaMetrics optimizes the combination of existing metrics to enhance their alignment with human preferences. Our metric demonstrates flexibility and effectiveness in both language and vision downstream tasks, showing significant benefits across various multilingual and multi-domain scenarios. MetaMetrics aligns closely with human preferences and is highly extendable and easily integrable into any application. This makes MetaMetrics a powerful tool for improving the evaluation of generation tasks, ensuring that metrics are more representative of human judgment across diverse contexts.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2410.02381

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
Asia > Indonesia > Bali (0.04)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.72)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
(4 more...)

Add feedback

Scaling Sign Language Translation

Zhang, Biao, Tanzer, Garrett, Firat, Orhan

arXiv.org Artificial IntelligenceJul-16-2024

Sign language translation (SLT) addresses the problem of translating information from a sign language in video to a spoken language in text. Existing studies, while showing progress, are often limited to narrow domains and/or few sign languages and struggle with open-domain tasks. In this paper, we push forward the frontier of SLT by scaling pretraining data, model size, and number of translation directions. We perform large-scale SLT pretraining on different data including 1) noisy multilingual YouTube SLT data, 2) parallel text corpora, and 3) SLT data augmented by translating video captions to other languages with off-the-shelf machine translation models. We unify different pretraining tasks with task-specific prompts under the encoder-decoder architecture, and initialize the SLT model with pretrained (m/By)T5 models across model sizes. SLT pretraining results on How2Sign and FLEURS-ASL#0 (ASL to 42 spoken languages) demonstrate the significance of data/model scaling and cross-lingual cross-modal transfer, as well as the feasibility of zero-shot SLT. We finetune the pretrained SLT models on 5 downstream open-domain SLT benchmarks covering 5 sign languages. Experiments show substantial quality improvements over the vanilla baselines, surpassing the previous state-of-the-art (SOTA) by wide margins.

bleurt, slt, translation, (12 more...)

arXiv.org Artificial Intelligence

2407.11855

Country:

Asia > Singapore (0.04)
Europe > Switzerland (0.04)
Europe > Italy > Tuscany > Florence (0.04)
(19 more...)

Genre: Research Report > New Finding (0.93)

Industry: Education > Curriculum > Subject-Specific Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Don't Throw Away Data: Better Sequence Knowledge Distillation

Wang, Jun, Briakou, Eleftheria, Dadkhahi, Hamid, Agarwal, Rishabh, Cherry, Colin, Cohn, Trevor

arXiv.org Artificial IntelligenceJul-15-2024

A critical component in knowledge distillation is the means of coupling the teacher and student. The predominant sequence knowledge distillation method involves supervised learning of the student against teacher-decoded outputs, and is exemplified by the current state of the art, which incorporates minimum Bayes risk (MBR) decoding. In this paper we seek to integrate MBR more tightly in distillation training, specifically by using several high scoring MBR translations, rather than a single selected sequence, thus capturing a rich diversity of teacher outputs. Our experiments on English to German and English to Japanese translation show consistent improvements over strong baseline methods for both tasks and with varying model sizes. Additionally, we conduct a detailed analysis focusing on data efficiency and capacity curse aspects to elucidate MBR-n and explore its further potential.

computational linguistic, student model, translation, (14 more...)

arXiv.org Artificial Intelligence

2407.10456

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.05)
North America > Canada > Ontario > Toronto (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(9 more...)

Genre: Research Report (1.00)

Industry: Education (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback