Goto

Collaborating Authors

 Problem Solving


Hierarchical Vision-Language Reasoning for Multimodal Multiple-Choice Question Answering

arXiv.org Artificial Intelligence

Multimodal Large Language Models (MLLMs) have demonstrated remarkable multimodal understanding capabilities in Visual Question Answering (VQA) tasks by integrating visual and textual features. However, under the challenging ten-choice question evaluation paradigm, existing methods still exhibit significant limitations when processing PDF documents with complex layouts and lengthy content. Notably, current mainstream models suffer from a strong bias toward English training data, resulting in suboptimal performance for Japanese and other language scenarios. To address these challenges, this paper proposes a novel Japanese PDF document understanding framework that combines multimodal hierarchical reasoning mechanisms with Colqwen-optimized retrieval methods, while innovatively introducing a semantic verification strategy through sub-question decomposition. Experimental results demonstrate that our framework not only significantly enhances the model's deep semantic parsing capability for complex documents, but also exhibits superior robustness in practical application scenarios.


Information Ecosystem Reengineering via Public Sector Knowledge Representation

arXiv.org Artificial Intelligence

Information Ecosystem Reengineering (IER) -- the technological reconditioning of information sources, services, and systems within a complex information ecosystem -- is a foundational challenge in the digital transformation of public sector services and smart governance platforms. From a semantic knowledge management perspective, IER becomes especially entangled due to the potentially infinite number of possibilities in its conceptualization, namely, as a result of manifoldness in the multi-level mix of perception, language and conceptual interlinkage implicit in all agents involved in such an effort. This paper proposes a novel approach -- Representation Disentanglement -- to disentangle these multiple layers of knowledge representation complexity hindering effective reengineering decision making. The approach is based on the theoretically grounded and implementationally robust ontology-driven conceptual modeling paradigm which has been widely adopted in systems analysis and (re)engineering. We argue that such a framework is essential to achieve explainability, traceability and semantic transparency in public sector knowledge representation and to support auditable decision workflows in governance ecosystems increasingly driven by Artificial Intelligence (AI) and data-centric architectures.


Beyond Individuals: Collective Predictive Coding for Memory, Attention, and the Emergence of Language

arXiv.org Artificial Intelligence

This commentary extends the discussion by Parr et al. on memory and attention beyond individual cognitive systems. From the perspective of the Collective Predictive Coding (CPC) hypothesis -- a framework for understanding these faculties and the emergence of language at the group level -- we introduce a hypothetical idea: that language, with its embedded distributional semantics, serves as a collectively formed external representation. CPC generalises the concepts of individual memory and attention to the collective level. This offers a new perspective on how shared linguistic structures, which may embrace collective world models learned through next-word prediction, emerge from and shape group-level cognition.


CIA+TA Risk Assessment for AI Reasoning Vulnerabilities

arXiv.org Artificial Intelligence

As AI systems increasingly influence critical decisions, they face threats that exploit reasoning mechanisms rather than technical infrastructure. We present a framework for cognitive cybersecurity, a systematic protection of AI reasoning processes from adversarial manipulation. Our contributions are threefold. First, we establish cognitive cybersecurity as a discipline complementing traditional cybersecurity and AI safety, addressing vulnerabilities where legitimate inputs corrupt reasoning while evading conventional controls. Second, we introduce the CIA+TA, extending traditional Confidentiality, Integrity, and Availability triad with Trust (epistemic validation) and Autonomy (human agency preservation), requirements unique to systems generating knowledge claims and mediating decisions. Third, we present a quantitative risk assessment methodology with empirically-derived coefficients, enabling organizations to measure cognitive security risks. We map our framework to OWASP LLM Top 10 and MITRE ATLAS, facilitating operational integration. Validation through previously published studies (151 human participants; 12,180 AI trials) reveals strong architecture dependence: identical defenses produce effects ranging from 96% reduction to 135% amplification of vulnerabilities. This necessitates pre-deployment Cognitive Penetration Testing as a governance requirement for trustworthy AI deployment.


DRP: Distilled Reasoning Pruning with Skill-aware Step Decomposition for Efficient Large Reasoning Models

arXiv.org Artificial Intelligence

While Large Reasoning Models (LRMs) have demonstrated success in complex reasoning tasks through long chain-of-thought (CoT) reasoning, their inference often involves excessively verbose reasoning traces, resulting in substantial inefficiency. To address this, we propose Distilled Reasoning Pruning (DRP), a hybrid framework that combines inference-time pruning with tuning-based distillation, two widely used strategies for efficient reasoning. DRP uses a teacher model to perform skill-aware step decomposition and content pruning, and then distills the pruned reasoning paths into a student model, enabling it to reason both efficiently and accurately. Across several challenging mathematical reasoning datasets, we find that models trained with DRP achieve substantial improvements in token efficiency without sacrificing accuracy. Specifically, DRP reduces average token usage on GSM8K from 917 to 328 while improving accuracy from 91.7% to 94.1%, and achieves a 43% token reduction on AIME with no performance drop. Further analysis shows that aligning the reasoning structure of training CoTs with the student's reasoning capacity is critical for effective knowledge transfer and performance gains.


Supplementary Materials A Constraint Explanation for Problem 2 Constraint 2b checks the loss of each sample and the derivation is shown as follows

Neural Information Processing Systems

Constraint 2b checks the loss of each sample and the derivation is shown as follows. Constraint 2d to 2i are mainly adopted from Bertsimas and Dunn [2019] on Chapter 8.2. Here we briefly explain the meaning and derivations of these constraints. Constraint 2g and 2h are set to ensure that if there's no split on Constraint 2i enforces the hierarchical structure of the tree. Algorithm 1 depicts the details of the Branch-and-bound scheme for training the optimal decision tree.


Computational Resolution of Hadamard Product Factorization for $4 \times 4$ Matrices

arXiv.org Artificial Intelligence

We computationally resolve an open problem concerning the expressibility of $4 \times 4$ full-rank matrices as Hadamard products of two rank-2 matrices. Through exhaustive search over $\mathbb{F}_2$, we identify 5,304 counterexamples among the 20,160 full-rank binary matrices (26.3\%). We verify that these counterexamples remain valid over $\mathbb{Z}$ through sign enumeration and provide strong numerical evidence for their validity over $\mathbb{R}$. Remarkably, our analysis reveals that matrix density (number of ones) is highly predictive of expressibility, achieving 95.7\% classification accuracy. Using modern machine learning techniques, we discover that expressible matrices lie on an approximately 10-dimensional variety within the 16-dimensional ambient space, despite the naive parameter count of 24 (12 parameters each for two $4 \times 4$ rank-2 matrices). This emergent low-dimensional structure suggests deep algebraic constraints governing Hadamard factorizability.


Dissecting Tool-Integrated Reasoning: An Empirical Study and Analysis

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have made significant strides in reasoning tasks through methods like chain-of-thought (CoT) reasoning. However, they often fall short in tasks requiring precise computations. Tool-Integrated Reasoning (TIR) has emerged as a solution by incorporating external tools into the reasoning process. Nevertheless, the generalization of TIR in improving the reasoning ability of LLM is still unclear. Additionally, whether TIR has improved the model's reasoning behavior and helped the model think remains to be studied. We introduce ReasonZoo, a comprehensive benchmark encompassing nine diverse reasoning categories, to evaluate the effectiveness of TIR across various domains. Additionally, we propose two novel metrics, Performance-Aware Cost (PAC) and Area Under the Performance-Cost Curve (AUC-PCC), to assess reasoning efficiency. Our empirical evaluation demonstrates that TIR-enabled models consistently outperform their non-TIR counterparts in both mathematical and non-mathematical tasks. Furthermore, TIR enhances reasoning efficiency, as evidenced by improved PAC and AUC-PCC, indicating reduced overthinking and more streamlined reasoning. These findings underscore the domain-general benefits of TIR and its potential to advance LLM capabilities in complex reasoning tasks.


End-to-End Agentic RAG System Training for Traceable Diagnostic Reasoning

arXiv.org Artificial Intelligence

Accurate diagnosis with medical large language models is hindered by knowledge gaps and hallucinations. Retrieval and tool-augmented methods help, but their impact is limited by weak use of external knowledge and poor feedback-reasoning traceability. To address these challenges, We introduce Deep-DxSearch, an agentic RAG system trained end-to-end with reinforcement learning (RL) that enables steer tracebale retrieval-augmented reasoning for medical diagnosis. In Deep-DxSearch, we first construct a large-scale medical retrieval corpus comprising patient records and reliable medical knowledge sources to support retrieval-aware reasoning across diagnostic scenarios. More crutially, we frame the LLM as the core agent and the retrieval corpus as its environment, using tailored rewards on format, retrieval, reasoning structure, and diagnostic accuracy, thereby evolving the agentic RAG policy from large-scale data through RL. Experiments demonstrate that our end-to-end agentic RL training framework consistently outperforms prompt-engineering and training-free RAG approaches across multiple data centers. After training, Deep-DxSearch achieves substantial gains in diagnostic accuracy, surpassing strong diagnostic baselines such as GPT-4o, DeepSeek-R1, and other medical-specific frameworks for both common and rare disease diagnosis under in-distribution and out-of-distribution settings. Moreover, ablation studies on reward design and retrieval corpus components confirm their critical roles, underscoring the uniqueness and effectiveness of our approach compared with traditional implementations. Finally, case studies and interpretability analyses highlight improvements in Deep-DxSearch's diagnostic policy, providing deeper insight into its performance gains and supporting clinicians in delivering more reliable and precise preliminary diagnoses. See https://github.com/MAGIC-AI4Med/Deep-DxSearch.


HHNAS-AM: Hierarchical Hybrid Neural Architecture Search using Adaptive Mutation Policies

arXiv.org Artificial Intelligence

Neural Architecture Search (NAS) has garnered significant research interest due to its capability to discover architectures superior to manually designed ones. Learning text representation is crucial for text classification and other language-related tasks. The NAS model used in text classification does not have a Hybrid hierarchical structure, and there is no restriction on the architecture structure, due to which the search space becomes very large and mostly redundant, so the existing RL models are not able to navigate the search space effectively. Also, doing a flat architecture search leads to an unorganised search space, which is difficult to traverse. For this purpose, we propose HHNAS-AM (Hierarchical Hybrid Neural Architecture Search with Adaptive Mutation Policies), a novel approach that efficiently explores diverse architectural configurations. We introduce a few architectural templates to search on which organise the search spaces, where search spaces are designed on the basis of domain-specific cues. Our method employs mutation strategies that dynamically adapt based on performance feedback from previous iterations using Q-learning, enabling a more effective and accelerated traversal of the search space. The proposed model is fully probabilistic, enabling effective exploration of the search space. We evaluate our approach on the database id (db_id) prediction task, where it consistently discovers high-performing architectures across multiple experiments. On the Spider dataset, our method achieves an 8% improvement in test accuracy over existing baselines.