AITopics | vqa system

vqa system

Self-Critical Reasoning for Robust Visual Question Answering

Jialin Wu, Raymond Mooney

Neural Information Processing Systems | Feb-11-2026, 20:58:48 GMT

Neural Information Processing Systems http://nips.cc/

explanation, sensitivity, vqa system, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Texas > Travis County > Austin (0.04)
North America > Canada (0.04)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Vision (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Self-Critical Reasoning for Robust Visual Question Answering

Jialin Wu, Raymond Mooney

Neural Information Processing Systems | Oct-9-2025, 13:39:41 GMT

Neural Information Processing Systems http://nips.cc/

explanation, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country: North America > United States > Texas (0.14)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Vision (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

VQA-Levels: A Hierarchical Approach for Classifying Questions in VQA

Madaka, Madhuri Latha, Bhagvati, Chakravarthy

arXiv.org Artificial Intelligence | Feb-5-2025

Designing datasets for Visual Question Answering (VQA) is a difficult and complex task that requires NLP for parsing and computer vision for analysing the relevant aspects of the image for answering the question asked. Several benchmark datasets have been developed by researchers but there are many issues with using them for methodical performance tests. This paper proposes a new benchmark dataset (a pilot version called VQA-Levels is ready now) for testing VQA systems systematically and assisting researchers in advancing the field. The questions are classified into seven levels ranging from direct answers based on low-level image features (without needing even a classifier) to those requiring high-level abstraction of the entire image content. The questions in the dataset exhibit one or many of ten properties. Each is categorised into a specific level from 1 to 7. Levels 1-3 are based directly on the visual content while the remaining levels require extra knowledge about the objects in the image. Each question generally has a unique one- or two-word answer. The questions are 'natural' in the sense that a human is likely to ask such a question when seeing the images. An example question at Level 1 is, "What is the shape of the red colored region in the image?" while at Level 7, it is, "Why is the man cutting the paper?". Initial testing of the proposed dataset on some of the existing VQA systems reveals that their success is high on Level 1 (low level features) and Level 2 (object classification) questions, least on Level 3 (scene text) followed by Level 6 (extrapolation) and Level 7 (whole scene analysis) questions. The work in this paper will go a long way to systematically analyze VQA systems.

machine learning, natural language, question answering, (17 more...)

arXiv.org Artificial Intelligence

2502.02951

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Asia > Middle East > UAE > Dubai Emirate > Dubai (0.04)
Asia > India > Telangana > Hyderabad (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Sports (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.36)

Visual Question Answering in Ophthalmology: A Progressive and Practical Perspective

Chen, Xiaolan, Chen, Ruoyu, Xu, Pusheng, Zhang, Weiyi, Shang, Xianwen, He, Mingguang, Shi, Danli

arXiv.org Artificial Intelligence | Oct-21-2024

Accurate diagnosis of ophthalmic diseases relies heavily on the interpretation of multimodal ophthalmic images, a process often time-consuming and expertise-dependent. Visual Question Answering (VQA) presents a potential interdisciplinary solution by merging computer vision and natural language processing to comprehend and respond to queries about medical images. This review article explores the recent advancements and future prospects of VQA in ophthalmology from both theoretical and practical perspectives, aiming to provide eye care professionals with a deeper understanding and tools for leveraging the underlying models. Additionally, we discuss the promising trend of large language models (LLM) in enhancing various components of the VQA framework to adapt to multimodal ophthalmic tasks. Despite the promising outlook, ophthalmic VQA still faces several challenges, including the scarcity of annotated multimodal image datasets, the necessity of comprehensive and unified evaluation methods, and the obstacles to achieving effective real-world applications. This article highlights these challenges and clarifies future directions for advancing ophthalmic VQA with LLMs. The development of LLM-based ophthalmic VQA systems calls for collaborative efforts between medical professionals and AI experts to overcome existing obstacles and advance the diagnosis and care of eye diseases.

Keywords: Ophthalmic Visual Question Answering, Large Language Models, Multimodal Image Interpretation, Report Generation, Generative Artificial Intelligence

Introduction: Accurate diagnosis of ophthalmic diseases often relies on the comprehensive analysis of multimodal ophthalmic images, including color fundus photographs (CFP), optical coherence tomography (OCT), fundus fluorescein angiography (FFA), scanning laser ophthalmoscopy (SLO), anterior segment photographs and corneal topography, etc.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2410.16662

Country:

Asia > China > Hong Kong > Kowloon (0.05)
Europe > United Kingdom > Scotland (0.04)
Europe > United Kingdom > England (0.04)
Europe > Switzerland > Basel-City > Basel (0.04)

Genre: Overview (1.00)

Industry: Health & Medicine > Therapeutic Area > Ophthalmology/Optometry (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.48)

Extracting Training Data from Document-Based VQA Models

Pinto, Francesco, Rauschmayr, Nathalie, Tramèr, Florian, Torr, Philip, Tombari, Federico

arXiv.org Artificial Intelligence | Jul-11-2024

Vision-Language Models (VLMs) have made remarkable progress in document-based Visual Question Answering (i.e., responding to queries about the contents of an input document provided as an image). In this work, we show these models can memorize responses for training samples and regurgitate them even when the relevant visual information has been removed. This includes Personal Identifiable Information (PII) repeated once in the training set, indicating these models could divulge memorised sensitive information and therefore pose a privacy risk. We quantitatively measure the extractability of information in controlled experiments and differentiate between cases where it arises from generalization capabilities or from memorization. We further investigate the factors that influence memorization across multiple state-of-the-art models and propose an effective heuristic countermeasure that empirically prevents the extractability of PII.
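
The probing idea in this abstract (ask the question again after the relevant visual evidence has been removed, and see whether the trained answer still comes back) can be illustrated with a minimal sketch. This is not the authors' protocol: `extraction_rate`, `mask`, `toy_model` and the sample data are all hypothetical stand-ins, and the sketch makes no attempt to separate memorization from generalization as the paper does.

```python
def extraction_rate(model, samples, mask):
    """Fraction of samples whose answer is reproduced after masking the image."""
    hits = 0
    for image, question, answer in samples:
        prediction = model(mask(image), question)
        # Case-insensitive exact match; a real study would likely be more lenient.
        if prediction.strip().lower() == answer.strip().lower():
            hits += 1
    return hits / len(samples) if samples else 0.0


def mask(image):
    # Hypothetical masking step: blank out every pixel of a 2-D image.
    return [[0 for _ in row] for row in image]


# Toy "model" that ignores the image and answers from a memorized table.
memorized = {"What is the invoice number?": "INV-001"}


def toy_model(image, question):
    return memorized.get(question, "unknown")


samples = [
    ([[1, 2], [3, 4]], "What is the invoice number?", "INV-001"),
    ([[5, 6], [7, 8]], "What is the total?", "42"),
]
rate = extraction_rate(toy_model, samples, mask)
print(rate)
```

In this toy setup one of the two answers survives masking, so half of the samples count as extractable.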

extracting training data, information, memorization, (15 more...)

arXiv.org Artificial Intelligence

2407.08707

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > Austria > Vienna (0.14)
North America > United States > District of Columbia > Washington (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Visually Grounded VQA by Lattice-based Retrieval

Reich, Daniel, Putze, Felix, Schultz, Tanja

arXiv.org Artificial Intelligence | Nov-15-2022

Visual Grounding (VG) in Visual Question Answering (VQA) systems describes how well a system manages to tie a question and its answer to relevant image regions. Systems with strong VG are considered intuitively interpretable and suggest an improved scene understanding. While VQA accuracy performances have seen impressive gains over the past few years, explicit improvements to VG performance and evaluation thereof have often taken a back seat on the road to overall accuracy improvements. A cause of this originates in the predominant choice of learning paradigm for VQA systems, which consists of training a discriminative classifier over a predetermined set of answer options. In this work, we break with the dominant VQA modeling paradigm of classification and investigate VQA from the standpoint of an information retrieval task. As such, the developed system directly ties VG into its core search procedure. Our system operates over a weighted, directed, acyclic graph, a.k.a. "lattice", which is derived from the scene graph of a given image in conjunction with region-referring expressions extracted from the question. We give a detailed analysis of our approach and discuss its distinctive properties and limitations. Our approach achieves the strongest VG performance among examined systems and exhibits exceptional generalization capabilities in a number of scenarios.
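
To make the "weighted, directed, acyclic graph" idea concrete, here is a minimal sketch of finding the highest-scoring path through such a graph. It is a generic textbook procedure (topological ordering followed by edge relaxation), not the paper's retrieval system; the function name `best_path`, the node labels and the edge weights are invented for illustration.

```python
from collections import defaultdict, deque


def best_path(edges, source, target):
    """Return (score, path) for the highest-scoring path in a weighted DAG, or None."""
    graph = defaultdict(list)
    indegree = defaultdict(int)
    nodes = {source, target}
    for u, v, w in edges:
        graph[u].append((v, w))
        indegree[v] += 1
        nodes.update((u, v))

    # Kahn's algorithm: repeatedly emit nodes with no remaining incoming edges.
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _ in graph[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")

    # Relax edges in topological order, remembering the best predecessor.
    best = {source: 0.0}
    parent = {}
    for u in order:
        if u not in best:
            continue  # not reachable from the source
        for v, w in graph[u]:
            if v not in best or best[u] + w > best[v]:
                best[v] = best[u] + w
                parent[v] = u

    if target not in best:
        return None
    path = [target]
    while path[-1] != source:
        path.append(parent[path[-1]])
    return best[target], path[::-1]


# Hypothetical lattice: nodes stand for regions or expressions, weights for match scores.
edges = [
    ("start", "man", 0.75),
    ("start", "paper", 0.25),
    ("man", "cutting", 0.5),
    ("paper", "cutting", 0.5),
    ("cutting", "end", 1.0),
]
score, path = best_path(edges, "start", "end")
print(score, path)
```

Because the graph is acyclic, a single pass in topological order is enough; no priority queue or repeated relaxation is needed.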

category, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2211.08086

Country:

Europe > Germany > Bremen > Bremen (0.28)
North America > United States > Virginia (0.04)
North America > United States > Indiana > Marion County > Lawrence (0.04)
(3 more...)

Genre: Research Report (0.63)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

What's Different between Visual Question Answering for Machine "Understanding" Versus for Accessibility?

Cao, Yang Trista, Seelman, Kyle, Lee, Kyungjun, Daumé, Hal III

arXiv.org Artificial Intelligence | Oct-26-2022

In visual question answering (VQA), a machine must answer a question given an associated image. Recently, accessibility researchers have explored whether VQA can be deployed in a real-world setting where users with visual impairments learn about their environment by capturing their visual surroundings and asking questions. However, most of the existing benchmarking datasets for VQA focus on machine "understanding" and it remains unclear how progress on those datasets corresponds to improvements in this real-world use case. We aim to answer this question by evaluating discrepancies between machine "understanding" datasets (VQA-v2) and accessibility datasets (VizWiz) by evaluating a variety of VQA models. Based on our findings, we discuss opportunities and challenges in VQA for accessibility and suggest directions for future work.

machine learning, natural language, question answering, (20 more...)

arXiv.org Artificial Intelligence

2210.14966

Country:

North America > United States > Maryland (0.05)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > New York > New York County > New York City (0.04)
(3 more...)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.73)

Medical Visual Question Answering: A Survey

Lin, Zhihong, Zhang, Donghao, Tao, Qingyi, Shi, Danli, Haffari, Gholamreza, Wu, Qi, He, Mingguang, Ge, Zongyuan

arXiv.org Artificial Intelligence | Nov-19-2021

Medical Visual Question Answering (VQA) is a combination of medical artificial intelligence and popular VQA challenges. Given a medical image and a clinically relevant question in natural language, the medical VQA system is expected to predict a plausible and convincing answer. Although the general-domain VQA has been extensively studied, the medical VQA still needs specific investigation and exploration due to its task features. In the first part of this survey, we cover and discuss the publicly available medical VQA datasets up to date about the data source, data quantity, and task feature. In the second part, we review the approaches used in medical VQA tasks. In the last part, we analyze some medical-specific challenges for the field and discuss future research directions.

dataset, medical vqa, working note, (15 more...)

arXiv.org Artificial Intelligence

2111.10056

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
South America > Chile (0.04)
Oceania > Australia > South Australia > Adelaide (0.04)
(13 more...)

Genre: Overview (1.00)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Therapeutic Area (0.93)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Introduction to Visual Question Answering: Datasets, Approaches and Evaluation - Tryolabs Blog

@machinelearnbot | Mar-14-2018, 21:15:20 GMT

Historically, building a system that can answer natural language questions about any image has been considered a very ambitious goal. So, how many players are in the image? Well, we can count them and see that there are eleven players, since we are smart enough not to count the referee, right? Although as humans we can normally perform this task without major inconveniences, the development of a system with these capabilities has always seemed closer to science fiction than to the current possibilities of Artificial Intelligence (AI). However, with the advent of Deep Learning (DL), we have witnessed enormous research progress in Visual Question Answering (VQA), in such a way that systems capable of answering these questions are emerging with promising results. In this article I will briefly go through some of the current datasets, approaches and evaluation metrics in VQA, and on how this challenging task can be applied to real life use cases.

machine learning, natural language, question answering, (18 more...)

@machinelearnbot

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Visual Question Answering: Datasets, Algorithms, and Future Challenges

Kafle, Kushal, Kanan, Christopher

arXiv.org Artificial Intelligence | Jun-14-2017

Visual Question Answering (VQA) is a recent problem in computer vision and natural language processing that has garnered a large amount of interest from the deep learning, computer vision, and natural language processing communities. In VQA, an algorithm needs to answer text-based questions about images. Since the release of the first VQA dataset in 2014, additional datasets have been released and many algorithms have been proposed. In this review, we critically examine the current state of VQA in terms of problem formulation, existing datasets, evaluation metrics, and algorithms. In particular, we discuss the limitations of current datasets with regard to their ability to properly train and assess VQA algorithms. We then exhaustively review existing algorithms for VQA. Finally, we discuss possible future directions for VQA and image understanding research.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.cviu.2017.06.005

1610.01465

Country: North America > United States (0.67)

Genre: Research Report (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
