AITopics | vqa

Collaborating Authors

vqa

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Hierarchical Question-Image Co-Attention for Visual Question Answering

Jiasen Lu, Jianwei Yang, Dhruv Batra, Devi Parikh

Neural Information Processing SystemsMay-1-2026, 06:36:04 GMT

Neural Information Processing Systems http://nips.cc/

machine learning, mechanism, natural language, (19 more...)

Neural Information Processing Systems

Country: North America > United States (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

Add feedback

A Hitchhiker's Guide to Fine-Grained Face Forgery Detection Using Common Sense Reasoning

Neural Information Processing SystemsMar-17-2026, 20:30:07 GMT

Explainability in artificial intelligence is crucial for restoring trust, particularly in areas like face forgery detection, where viewers often struggle to distinguish between real and fabricated content. Vision and Large Language Models (VLLM) bridge computer vision and natural language, offering numerous applications driven by strong common-sense reasoning. Despite their success in various tasks, the potential of vision and language remains underexplored in face forgery detection, where they hold promise for enhancing explainability by leveraging the intrinsic reasoning capabilities of language to analyse fine-grained manipulation areas. For that reason, few works have recently started to frame the problem of deepfake detection as a Visual Question Answering (VQA) task, nevertheless omitting the realistic and informative open-ended multi-label setting. With the rapid advances in the field of VLLM, an exponential rise of investigations in that direction is expected. As such, there is a need for a clear experimental methodology that converts face forgery detection to a Visual Question Answering (VQA) task to systematically and fairly evaluate different VLLM architectures. Previous evaluation studies in deepfake detection have mostly focused on the simpler binary task, overlooking evaluation protocols for multi-label fine-grained detection and text-generative models. We propose a multi-staged approach that diverges from the traditional binary evaluation protocol and conducts a comprehensive evaluation study to compare the capabilities of several VLLMs in this context. In the first stage, we assess the models' performance on the binary task and their sensitivity to given instructions using several prompts.

artificial intelligence, detection, natural language, (13 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.62)

Add feedback

d0822540916cd716add52e1846a6e18d-Paper-Conference.pdf

Neural Information Processing SystemsFeb-18-2026, 06:01:26 GMT

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

Asia > Singapore (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Add feedback

b619cd6dcc986856b8a8da2b08d89396-Paper-Conference.pdf

Neural Information Processing SystemsFeb-17-2026, 16:07:23 GMT

large language model, lever-lm, machine learning, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Washington > King County > Seattle (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
(6 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)

Add feedback

b14cf0a01f7a8b9cd3e365e40f910272-Paper-Conference.pdf

Neural Information Processing SystemsFeb-16-2026, 14:40:24 GMT

large language model, machine learning, question answering, (20 more...)

Neural Information Processing Systems

Country:

North America > United States > California > San Diego County > San Diego (0.04)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.70)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.49)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.31)

Add feedback

9fe77ac7060e716f2d42631d156825c0-AuthorFeedback.pdf

Neural Information Processing SystemsFeb-13-2026, 07:00:52 GMT

artificial intelligence, machine learning, textual concept, (10 more...)

Neural Information Processing Systems

Genre: Research Report > Promising Solution (0.41)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.79)

Add feedback

7611a3cb5d733e628081431445cb01fd-Paper-Conference.pdf

Neural Information Processing SystemsFeb-9-2026, 21:30:12 GMT

Thisresultleadstoageneral standpoint that deep quantum circuits would not be feasible for practical tasks.

artificial intelligence, gradient, machine learning, (19 more...)

Neural Information Processing Systems

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
Asia > Taiwan > Taiwan Province > Taipei (0.04)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

REVIVE: RegionalVisualRepresentationMattersin Knowledge-BasedVisualQuestionAnswering

Neural Information Processing SystemsFeb-8-2026, 15:27:04 GMT

This paper revisits visual representation in knowledge-based visual question answering(VQA)anddemonstrates thatusingregionalinformation inabetterway can significantly improve the performance. While visual representation is extensively studied in traditional VQA, it is under-explored in knowledge-based VQA even though these two tasks share the common spirit, i.e., rely on visual inputtoanswerthequestion.

knowledge management, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.96)
Information Technology > Artificial Intelligence > Natural Language (0.94)
Information Technology > Knowledge Management > Knowledge Engineering (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.71)

Add feedback

Supervising the Transfer of Reasoning Patterns in VQA

Neural Information Processing SystemsDec-24-2025, 13:25:46 GMT

Methods for Visual Question Anwering (VQA) are notorious for leveraging dataset biases rather than performing reasoning, hindering generalization. It has been recently shown that better reasoning patterns emerge in attention layers of a state-of-the-art VQA model when they are trained on perfect (oracle) visual inputs. This provides evidence that deep neural networks can learn to reason when training conditions are favorable enough. However, transferring this learned knowledge to deployable models is a challenge, as much of it is lost during the transfer.We propose a method for knowledge transfer based on a regularization term in our loss function, supervising the sequence of required reasoning operations.We provide a theoretical analysis based on PAC-learning, showing that such program prediction can lead to decreased sample complexity under mild hypotheses. We also demonstrate the effectiveness of this approach experimentally on the GQA dataset and show its complementarity to BERT-like self-supervised pre-training.

name change, reasoning pattern, supervising, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.61)

Add feedback

ThaiOCRBench: A Task-Diverse Benchmark for Vision-Language Understanding in Thai

Nonesung, Surapon, Jaknamon, Teetouch, Chaiophat, Sirinya, Nitarach, Natapong, Wittayasakpan, Chanakan, Sirichotedumrong, Warit, Na-Thalang, Adisai, Pipatanakul, Kunat

arXiv.org Artificial IntelligenceDec-5-2025

We present ThaiOCRBench, the first comprehensive benchmark for evaluating vision-language models (VLMs) on Thai text-rich visual understanding tasks. Despite recent progress in multimodal modeling, existing benchmarks predominantly focus on high-resource languages, leaving Thai underrepresented, especially in tasks requiring document structure understanding. ThaiOCRBench addresses this gap by offering a diverse, human-annotated dataset comprising 2,808 samples across 13 task categories. We evaluate a wide range of state-of-the-art VLMs in a zero-shot setting, spanning both proprietary and open-source systems. Results show a significant performance gap, with proprietary models (e.g., Gemini 2.5 Pro) outperforming open-source counterparts. Notably, fine-grained text recognition and handwritten content extraction exhibit the steepest performance drops among open-source models. Through detailed error analysis, we identify key challenges such as language bias, structural mismatch, and hallucinated content. ThaiOCRBench provides a standardized framework for assessing VLMs in low-resource, script-complex settings, and provides actionable insights for improving Thai-language document understanding.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2511.04479

Country: North America > United States (0.93)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

Add feedback