AITopics | complex question

Collaborating Authors

complex question

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

KnowCoder-A1: Incentivizing Agentic Reasoning Capability with Outcome Supervision for KBQA

Chen, Zhuo, Wang, Fei, Li, Zixuan, Zhang, Zhao, Ding, Weiwei, Yang, Chuanguang, Xu, Yongjun, Jin, Xiaolong, Guo, Jiafeng

arXiv.org Artificial IntelligenceNov-19-2025

Knowledge Base Question Answering (KBQA) aims to answer natural-language questions over a structured Knowledge Base (KB). Recent work improves KBQA by adopting an agentic reasoning paradigm, in which Large Language Models (LLMs) iteratively decompose a question, generate its corresponding logical queries, and interact with the KB to derive the answer. However, these methods typically fine-tune LLMs on reasoning trajectories synthesized via process supervision, which offers weak incentives for exploration and thus fails to strengthen the agentic reasoning ability. In this paper, we propose KnowCoder-A1, an LLM that can autonomously perform agentic reasoning on KBs to obtain answers. To incentivize autonomous exploration, KnowCoder-A1 trains the LLM under outcome-only supervision via a multi-stage curriculum reinforcement learning with an easy-to-hard curriculum. To establish foundational agentic capabilities, KnowCoder-A1 first fine-tunes the LLM on a small set of high-quality trajectories obtained through outcome-based rejection sampling. Then, to alleviate the reward sparsity inherent in outcome-only supervision, it applies multi-stage curriculum RL with reward schedules that progress from easy to hard. Trained with outcome-only supervision, KnowCoder-A1 exhibits powerful reasoning behaviors and consistently outperforms prior approaches across three mainstream datasets. Notably, on the zero-shot subset of GrailQA, KnowCoder-A1 achieves up to an 11.1% relative improvement while using only one-twelfth of the training data, demonstrating strong agentic reasoning capabilities.

large language model, machine learning, trajectory, (19 more...)

arXiv.org Artificial Intelligence

2510.25101

Genre: Research Report > New Finding (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Composition-Grounded Instruction Synthesis for Visual Reasoning

Gu, Xinyi, Mao, Jiayuan, Hong, Zhang-Wei, Yu, Zhuoran, Li, Pengyuan, Joshi, Dhiraj, Feris, Rogerio, He, Zexue

arXiv.org Artificial IntelligenceOct-20-2025

Pretrained multi-modal large language models (MLLMs) demonstrate strong performance on diverse multimodal tasks, but remain limited in reasoning capabilities for domains where annotations are difficult to collect. In this work, we focus on artificial image domains such as charts, rendered documents, and webpages, which are abundant in practice yet lack large-scale human annotated reasoning datasets. We introduce COGS (COmposition-Grounded instruction Synthesis), a data-efficient framework for equipping MLLMs with advanced reasoning abilities from a small set of seed questions. The key idea is to decompose each seed question into primitive perception and reasoning factors, which can then be systematically recomposed with new images to generate large collections of synthetic question-answer pairs. Each generated question is paired with subquestions and intermediate answers, enabling reinforcement learning with factor-level process rewards. Experiments on chart reasoning show that COGS substantially improves performance on unseen questions, with the largest gains on reasoning-heavy and compositional questions. Moreover, training with a factor-level mixture of different seed data yields better transfer across multiple datasets, suggesting that COGS induces generalizable capabilities rather than dataset-specific overfitting. We further demonstrate that the framework extends beyond charts to other domains such as webpages. Pretrained multi-modal large language models (MLLMs) have achieved impressive performance across a wide range of multimodal tasks (Liu et al., 2023c; Bai et al., 2025; Wang et al., 2025a; Agrawal et al., 2024; OpenAI et al., 2024; Comanici et al., 2025; Anthropic, 2024), yet advanced reasoning capabilities remain underdeveloped, especially in domains where user reasoning-intensive query-answer data is difficult to collect. In this work, we consider reasoning capability over artificial image domains, including charts, tables, information graphs, rendered documents, webpages, etc. While such images are abundant on the web, datasets containing reasoning questions over them are scarce.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2510.1504

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

NuTrea: Neural Tree Search for Context-guided Multi-hop KGQA

Neural Information Processing SystemsOct-8-2025, 21:40:09 GMT

However, these messages are past-oriented, and they do not consider the full KG context.

machine learning, natural language, node, (16 more...)

Neural Information Processing Systems

Country: North America > United States > Wisconsin > Dane County > Madison (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Avoiding Knowledge Edit Skipping in Multi-hop Question Answering with Guided Decomposition

Liu, Yi, Zhu, Xiangrong, Liu, Xiangyu, Wei, Wei, Hu, Wei

arXiv.org Artificial IntelligenceSep-10-2025

In a rapidly evolving world where information updates swiftly, knowledge in large language models (LLMs) becomes outdated quickly. Retraining LLMs is not a cost-effective option, making knowledge editing (KE) without modifying parameters particularly necessary. We find that although existing retrieval-augmented generation (RAG)-based KE methods excel at editing simple knowledge, they struggle with KE in multi-hop question answering due to the issue of "edit skipping", which refers to skipping the relevant edited fact in inference. In addition to the diversity of natural language expressions of knowledge, edit skipping also arises from the mismatch between the granularity of LLMs in problem-solving and the facts in the edited memory. To address this issue, we propose a novel Iterative Retrieval-Augmented Knowledge Editing method with guided decomposition (IRAKE) through the guidance from single edited facts and entire edited cases. Experimental results demonstrate that IRAKE mitigates the failure of editing caused by edit skipping and outperforms state-of-the-art methods for KE in multi-hop question answering.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2509.07555

Country:

North America > United States (0.68)
Asia (0.46)

Genre: Research Report > New Finding (1.00)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Add feedback

Research on Multi-hop Inference Optimization of LLM Based on MQUAKE Framework

Liang, Zucheng, Wei, Wenxin, Zhang, Kaijie, Chen, Hongyi

arXiv.org Artificial IntelligenceSep-8-2025

Accurately answering complex questions has consistently been a significant challenge for Large Language Models (LLMs). To address this, this paper proposes a multi-hop question decomposition method for complex questions, building upon research within the MQUAKE framework. Utilizing the LLAMA3 model, we systematically investigate the impact of multi-hop question decomposition within knowledge graphs on model comprehension and reasoning accuracy, both before and after model training. In our experiments, we systematically partitioned and converted the MQUAKE-T dataset into two distinct formats: a single-hop dataset designed for directly answering complex questions, and a multi-hop dataset constructed using the multi-hop question decomposition method. We then fine-tuned the LLAMA3 model on these datasets and conducted inference tests. Our results demonstrate that, without fine-tuning the LLM, the prediction performance based on the multi-hop question decomposition method significantly outperforms the method of directly answering complex questions. After fine-tuning using the LoRA (Low-Rank Adaptation) method, the performance of both approaches improved compared to the untrained baseline. Crucially, the method utilizing multi-hop decomposition consistently maintained its superiority. These findings validate the effectiveness of the multi-hop decomposition method both before and after training, demonstrating its capability to effectively enhance the LLM's ability to answer complex questions.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2509.0477

Country: North America > United States > California > Los Angeles County > Los Angeles (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.57)

Add feedback

MoNaCo: More Natural and Complex Questions for Reasoning Across Dozens of Documents

Wolfson, Tomer, Trivedi, Harsh, Geva, Mor, Goldberg, Yoav, Roth, Dan, Khot, Tushar, Sabharwal, Ashish, Tsarfaty, Reut

arXiv.org Artificial IntelligenceSep-4-2025

Automated agents, powered by Large language models (LLMs), are emerging as the go-to tool for querying information. However, evaluation benchmarks for LLM agents rarely feature natural questions that are both information-seeking and genuinely time-consuming for humans. To address this gap we introduce MoNaCo, a benchmark of 1,315 natural and time-consuming questions that require dozens, and at times hundreds, of intermediate steps to solve -- far more than any existing QA benchmark. To build MoNaCo, we developed a decomposed annotation pipeline to elicit and manually answer real-world time-consuming questions at scale. Frontier LLMs evaluated on MoNaCo achieve at most 61.2% F1, hampered by low recall and hallucinations. Our results underscore the limitations of LLM-powered agents in handling the complexity and sheer breadth of real-world information-seeking tasks -- with MoNaCo providing an effective resource for tracking such progress. The MoNaCo benchmark, codebase, prompts and models predictions are all publicly available at: https://tomerwolgithub.github.io/monaco

benchmark, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2508.11133

Country:

Europe > Monaco (1.00)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Government > Regional Government (1.00)
Media (0.93)
Leisure & Entertainment (0.92)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

M$^3$-Med: A Benchmark for Multi-lingual, Multi-modal, and Multi-hop Reasoning in Medical Instructional Video Understanding

Liu, Shenxi, Li, Kan, Zhao, Mingyang, Tian, Yuhang, Li, Bin, Zhou, Shoujun, Li, Hongliang, Yang, Fuxia

arXiv.org Artificial IntelligenceJul-8-2025

With the rapid progress of artificial intelligence (AI) in multi-modal understanding, there is increasing potential for video comprehension technologies to support professional domains such as medical education. However, existing benchmarks suffer from two primary limitations: (1) Linguistic Singularity: they are largely confined to English, neglecting the need for multilingual resources; and (2) Shallow Reasoning: their questions are often designed for surface-level information retrieval, failing to properly assess deep multi-modal integration. To address these limitations, we present M3-Med, the first benchmark for Multi-lingual, Multi-modal, and Multi-hop reasoning in Medical instructional video understanding. M3-Med consists of medical questions paired with corresponding video segments, annotated by a team of medical experts. A key innovation of M3-Med is its multi-hop reasoning task, which requires a model to first locate a key entity in the text, then find corresponding visual evidence in the video, and finally synthesize information across both modalities to derive the answer. This design moves beyond simple text matching and poses a substantial challenge to a model's deep cross-modal understanding capabilities. We define two tasks: Temporal Answer Grounding in Single Video (TAGSV) and Temporal Answer Grounding in Video Corpus (TAGVC). We evaluated several state-of-the-art models and Large Language Models (LLMs) on M3-Med. The results reveal a significant performance gap between all models and human experts, especially on the complex multi-hop questions where model performance drops sharply. M3-Med effectively highlights the current limitations of AI models in deep cross-modal reasoning within specialized domains and provides a new direction for future research.

artificial intelligence, large language model, natural language, (16 more...)

arXiv.org Artificial Intelligence

2507.04289

Country:

Asia > China > Guangdong Province > Shenzhen (0.04)
Europe > Switzerland (0.04)
Asia > China > Hong Kong (0.04)
(9 more...)

Genre:

Research Report (1.00)
Instructional Material > Course Syllabus & Notes (0.57)

Industry:

Education > Educational Technology (0.57)
Health & Medicine > Therapeutic Area (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Resource-Friendly Dynamic Enhancement Chain for Multi-Hop Question Answering

Ji, Binquan, Luo, Haibo, Lu, Yifei, Hei, Lei, Wang, Jiaqi, Liao, Tingjing, Wang, Lingyu, Wang, Shichao, Ren, Feiliang

arXiv.org Artificial IntelligenceJun-24-2025

Knowledge-intensive multi-hop question answering (QA) tasks, which require integrating evidence from multiple sources to address complex queries, often necessitate multiple rounds of retrieval and iterative generation by large language models (LLMs). However, incorporating many documents and extended contexts poses challenges -such as hallucinations and semantic drift-for lightweight LLMs with fewer parameters. This work proposes a novel framework called DEC (Dynamic Enhancement Chain). DEC first decomposes complex questions into logically coherent subquestions to form a hallucination-free reasoning chain. It then iteratively refines these subquestions through context-aware rewriting to generate effective query formulations. For retrieval, we introduce a lightweight discriminative keyword extraction module that leverages extracted keywords to achieve targeted, precise document recall with relatively low computational overhead. Extensive experiments on three multi-hop QA datasets demonstrate that DEC performs on par with or surpasses state-of-the-art benchmarks while significantly reducing token consumption. Notably, our approach attains state-of-the-art results on models with 8B parameters, showcasing its effectiveness in various scenarios, particularly in resource-constrained environments.

large language model, machine learning, question answering, (20 more...)

arXiv.org Artificial Intelligence

2506.17692

Country:

Asia (0.93)
Europe (0.93)
North America > United States (0.68)

Genre: Research Report > New Finding (0.46)

Industry:

Leisure & Entertainment (0.68)
Media > Film (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Add feedback

COMPKE: Complex Question Answering under Knowledge Editing

Cheng, Keyuan, Kan, Zijian, He, Zhixian, Zhang, Zhuoran, Ali, Muhammad Asif, Xu, Ke, Hu, Lijie, Wang, Di

arXiv.org Artificial IntelligenceJun-4-2025

Knowledge Editing, which efficiently modifies the knowledge in large language models, has gathered great attention. Current benchmarks primarily use multi-hop question answering to assess and analyze newly injected or updated knowledge. However, we argue that these benchmarks fail to effectively evaluate how well the updated models apply this knowledge in real-life scenarios, particularly when questions require complex reasoning, involving one-to-many relationships or multi-step logical intersections. To fill in this gap, we introduce a new benchmark, COMPKE: Complex Question Answering under Knowledge Editing, which includes 11,924 complex questions that reflect real-life situations. We conduct an extensive evaluation of four knowledge editing methods on COMPKE, revealing that their effectiveness varies notably across different models. For instance, MeLLo attains an accuracy of 39.47 on GPT-4O-MINI, but this drops sharply to 3.83 on QWEN2.5-3B. We further investigate the underlying causes of these disparities from both methodological and model-specific perspectives. The datasets are available at https://github.com/kzjkzj666/CompKE.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2506.00829

Country: North America > United States > California (0.17)

Genre: Research Report > New Finding (0.67)

Industry:

Leisure & Entertainment (1.00)
Media > Film (0.94)
Health & Medicine (0.68)
Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Add feedback

Filters

Collaborating Authors

complex question

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

707a2d58641b2192203b4bf4c532cfe1-Paper-Conference.pdf

KnowCoder-A1: Incentivizing Agentic Reasoning Capability with Outcome Supervision for KBQA

Composition-Grounded Instruction Synthesis for Visual Reasoning

NuTrea: Neural Tree Search for Context-guided Multi-hop KGQA

Avoiding Knowledge Edit Skipping in Multi-hop Question Answering with Guided Decomposition

Research on Multi-hop Inference Optimization of LLM Based on MQUAKE Framework

MoNaCo: More Natural and Complex Questions for Reasoning Across Dozens of Documents

M$^3$-Med: A Benchmark for Multi-lingual, Multi-modal, and Multi-hop Reasoning in Medical Instructional Video Understanding

Resource-Friendly Dynamic Enhancement Chain for Multi-Hop Question Answering

COMPKE: Complex Question Answering under Knowledge Editing