AITopics | infographicsvqa

Collaborating Authors

infographicsvqa

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

The Confidence Paradox: Can LLM Know When It's Wrong

Tripathi, Sahil, Nafis, Md Tabrez, Hussain, Imran, Gao, Jiechao

arXiv.org Artificial IntelligenceOct-29-2025

Document Visual Question Answering (DocVQA) models often produce overconfident or ethically misaligned responses, especially under uncertainty. Existing models like LayoutLMv3, UDOP, and DONUT focus on accuracy but lack ethical calibration. We propose HonestVQA, a model-agnostic, self-supervised framework that aligns model confidence with correctness using weighted loss and contrastive learning. We introduce two new metrics Honesty Score (H-Score) and Ethical Confidence Index (ECI)-to evaluate ethical alignment. HonestVQA improves accuracy and F1 by up to 4.3% across SpDocVQA, InfographicsVQA, and SROIE datasets, while reducing overconfidence. It also generalizes well across domains, achieving 78.9% accuracy and 76.1% F1-score.

large language model, machine learning, question answering, (18 more...)

arXiv.org Artificial Intelligence

2506.23464

Country: Asia > India > NCT (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.51)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.35)

Add feedback

Notes on Applicability of GPT-4 to Document Understanding

Borchmann, Łukasz

arXiv.org Artificial IntelligenceMay-28-2024

We perform a missing, reproducible evaluation of all publicly available GPT-4 family models concerning the Document Understanding field, where it is frequently required to comprehend text spacial arrangement and visual clues in addition to textual semantics. Benchmark results indicate that though it is hard to achieve satisfactory results with text-only models, GPT-4 Vision Turbo performs well when one provides both text recognized by an external OCR engine and document images on the input. Evaluation is followed by analyses that suggest possible contamination of textual GPT-4 models and indicate the significant performance drop for lengthy documents.

dataset, full sentence, infographicsvqa, (16 more...)

arXiv.org Artificial Intelligence

2405.18433

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback