AITopics | question answering

Collaborating Authors

question answering

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Multi-modal Situated Reasoning in 3D Scenes

Neural Information Processing SystemsJun-2-2025, 15:52:55 GMT

Situation awareness is essential for understanding and reasoning about 3D scenes in embodied AI agents. However, existing datasets and benchmarks for situated understanding are limited in data modality, diversity, scale, and task scope. To address these limitations, we propose Multi-modal Situated Question Answering (MSQA), a large-scale multi-modal situated reasoning dataset, scalably collected leveraging 3D scene graphs and vision-language models (VLMs) across a diverse range of real-world 3D scenes. MSQA includes 251K situated question-answering pairs across 9 distinct question categories, covering complex scenarios within 3D scenes. We introduce a novel interleaved multi-modal input setting in our benchmark to provide text, image, and point cloud for situation and question description, resolving ambiguity in previous single-modality convention (e.g., text). Additionally, we devise the Multi-modal Situated Next-step Navigation (MSNN) benchmark to evaluate models' situated reasoning for navigation. Comprehensive evaluations on MSQA and MSNN highlight the limitations of existing vision-language models and underscore the importance of handling multi-modal interleaved inputs and situation modeling. Experiments on data scaling and cross-domain transfer further demonstrate the efficacy of leveraging MSQA as a pre-training dataset for developing more powerful situated reasoning models.

large language model, machine learning, question answering, (18 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.45)

Industry: Government > Military (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.95)
(3 more...)

Add feedback

Appendix A Evaluated Models B Involved Datasets C Construction Process of QA Pairs

Neural Information Processing SystemsJun-2-2025, 09:25:42 GMT

WARNING: The Appendix contains model outputs that may be considered offensive.

machine learning, natural language, question answering, (16 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.51)

Add feedback

Supplementary Materials for MEQA: A Benchmark for Multi-hop Event-centric Question Answering with Explanations

Neural Information Processing SystemsJun-2-2025, 04:57:03 GMT

We utilize an open and widely used data format, i.e., JSON format, for the MEQA dataset. A sample within the dataset, accompanied by the data format explanation, is shown in Listing 1. " context ": " Roadside IED kills Russian major general [...] ", # The context of the question " question ": " Who died before AI - monitor reported it online?", " What event contains Al - Monitor is the communicator? " What event is after #1 has a victim? " Who died in the #2? major general, local commander, lieutenant general " The dataset and source code for the MEQA dataset have been released to GitHub: https:// github.com/du-nlp-lab/MEQA.

artificial intelligence, natural language, question answering, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.42)

Add feedback

MEQA: A Benchmark for Multi-hop Event-centric Question Answering with Explanations

Neural Information Processing SystemsJun-2-2025, 04:57:00 GMT

Existing benchmarks for multi-hop question answering (QA) primarily evaluate models based on their ability to reason about entities and the relationships between them. However, there's a lack of insight into how these models perform in terms of both events and entities.

explanation, large language model, question answering, (19 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
Asia > Middle East (0.93)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Government (1.00)
Law (0.93)
Leisure & Entertainment > Sports > Basketball (0.67)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.69)

Add feedback

SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers Rama Chellappa 2 Google Research 2

Neural Information Processing SystemsJun-2-2025, 02:03:14 GMT

Seeking answers to questions within long scientific research articles is a crucial area of study that aids readers in quickly addressing their inquiries. However, existing question-answering (QA) datasets based on scientific papers are limited in scale and focus solely on textual content. We introduce SPIQA (Scientific Paper Image Question Answering), the first large-scale QA dataset specifically designed to interpret complex figures and tables within the context of scientific research articles across various domains of computer science. Leveraging the breadth of expertise and ability of multimodal large language models (MLLMs) to understand figures, we employ automatic and manual curation to create the dataset. We craft an information-seeking task on interleaved images and text that involves multiple images covering plots, charts, tables, schematic diagrams, and result visualizations. SPIQA comprises 270K questions divided into training, validation, and three different evaluation splits. Through extensive experiments with 12 prominent foundational models, we evaluate the ability of current multimodal systems to comprehend the nuanced aspects of research articles. Additionally, we propose a Chain-of-Thought (CoT) evaluation strategy with in-context retrieval that allows fine-grained, step-by-step assessment and improves model performance. We further explore the upper bounds of performance enhancement with additional textual information, highlighting its promising potential for future research and the dataset's impact on revolutionizing how we interact with scientific literature.

large language model, machine learning, question answering, (23 more...)

Neural Information Processing Systems

Country: Asia (0.14)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Learning to Visual Question Answering, Asking and Assessment

Neural Information Processing SystemsJun-1-2025, 22:41:37 GMT

Question answering, asking, and assessment are three innate human traits crucial for understanding the world and acquiring knowledge. By enhancing these capabilities, humans can more effectively utilize data, leading to better comprehension and learning outcomes. Current Multimodal Large Language Models (MLLMs) primarily focus on question answering, often neglecting the full potential of questioning and assessment skills.

large language model, machine learning, question answering, (21 more...)

Neural Information Processing Systems

Country: Asia > Singapore (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI Pengcheng Chen 1,2 Jin Ye1,3 Guoan Wang 1,4 Yanjun Li1,4

Neural Information Processing SystemsMay-31-2025, 22:38:08 GMT

Large Vision-Language Models (LVLMs) are capable of handling diverse data types such as imaging, text, and physiological signals, and can be applied in various fields. In the medical field, LVLMs have a high potential to offer substantial assistance for diagnosis and treatment. Before that, it is crucial to develop benchmarks to evaluate LVLMs' effectiveness in various medical applications. Current benchmarks are often built upon specific academic literature, mainly focusing on a single domain, and lacking varying perceptual granularities.

large language model, machine learning, question answering, (20 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
Asia (1.00)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report > New Finding (0.45)

Industry:

Health & Medicine > Therapeutic Area > Ophthalmology/Optometry (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
(9 more...)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(4 more...)

Add feedback

CausalChaos for Comprehensive Causal Action Question Answering Over Longer Causal Chains Grounded in Dynamic Visual Scenes

Neural Information Processing SystemsMay-31-2025, 21:22:08 GMT

Causal video question answering (QA) has garnered increasing interest, yet existing datasets often lack depth in causal reasoning. To address this gap, we capitalize on the unique properties of cartoons and construct CausalChaos!, a novel, challenging causal Why-QA dataset built upon the iconic "Tom and Jerry" cartoon series. Cartoons use the principles of animation that allow animators to create expressive, unambiguous causal relationships between events to form a coherent storyline. Utilizing these properties, along with thought-provoking questions and multilevel answers (answer and detailed causal explanation), our questions involve causal chains that interconnect multiple dynamic interactions between characters and visual scenes. These factors demand models to solve more challenging, yet well-defined causal relationships. We also introduce hard incorrect answer mining, including a causally confusing version that is even more challenging. While models perform well, there is much room for improvement, especially, on open-ended answers. We identify more advanced/explicit causal relationship modeling & joint modeling of vision and language as the immediate areas for future efforts to focus upon. Along with the other complementary datasets, our new challenging dataset will pave the way for these developments in the field.

large language model, machine learning, question answering, (21 more...)

Neural Information Processing Systems

Country:

Asia (0.14)
Europe > Netherlands (0.14)

Genre: Research Report > Experimental Study (0.93)

Industry: Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(4 more...)

Add feedback

RUBi: Reducing Unimodal Biases for Visual Question Answering

Remi Cadene, Corentin Dancette, Hedi Ben younes, Matthieu Cord, Devi Parikh

Neural Information Processing SystemsMay-31-2025, 17:29:25 GMT

Visual Question Answering (VQA) is the task of answering questions about an image. Some VQA models often exploit unimodal biases to provide the correct answer without using the image information. As a result, they suffer from a huge drop in performance when evaluated on data outside their training set distribution. This critical issue makes them unsuitable for real-world settings. We propose RUBi, a new learning strategy to reduce biases in any VQA model. It reduces the importance of the most biased examples, i.e. examples that can be correctly classified without looking at the image.

machine learning, natural language, question answering, (20 more...)

Neural Information Processing Systems

Country: North America > Canada (0.14)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

A Foundation Model for Zero-shot Logical Query Reasoning

Neural Information Processing SystemsMay-29-2025, 18:19:25 GMT

Complex logical query answering (CLQA) in knowledge graphs (KGs) goes beyond simple KG completion and aims at answering compositional queries comprised of multiple projections and logical operations. Existing CLQA methods that learn parameters bound to certain entity or relation vocabularies can only be applied to the graph they are trained on which requires substantial training time before being deployed on a new graph.

large language model, machine learning, question answering, (19 more...)

Neural Information Processing Systems

Country: North America > Canada > Quebec (0.14)

Genre: Research Report > Experimental Study (1.00)

Technology: