Goto

Collaborating Authors

 inference strategy






Representation Calibration and Uncertainty Guidance for Class-Incremental Learning based on Vision Language Model

arXiv.org Artificial Intelligence

Abstract--Class-incremental learning requires a learning system to continually learn knowledge of new classes and meanwhile try to preserve previously learned knowledge of old classes. As current state-of-the-art methods based on Vision-Language Models (VLMs) still suffer from the issue of differentiating classes across learning tasks. Here a novel VLM-based continual learning framework for image classification is proposed. In this framework, task-specific adapters are added to the pre-trained and frozen image encoder to learn new knowledge, and a novel cross-task representation calibration strategy based on a mixture of light-weight projectors is used to help better separate all learned classes in a unified feature space, alleviating class confusion across tasks. In addition, a novel inference strategy guided by prediction uncertainty is developed to more accurately select the most appropriate image feature for class prediction. Extensive experiments on multiple datasets under various settings demonstrate the superior performance of our method compared to existing ones.


HEAD-QA v2: Expanding a Healthcare Benchmark for Reasoning

arXiv.org Artificial Intelligence

We introduce HEAD-QA v2, an expanded and updated version of a Spanish/English healthcare multiple-choice reasoning dataset originally released by Vilares and Gรณmez-Rodrรญguez (2019). The update responds to the growing need for high-quality datasets that capture the linguistic and conceptual complexity of healthcare reasoning. We extend the dataset to over 12,000 questions from ten years of Spanish professional exams, benchmark several open-source LLMs using prompting, RAG, and probability-based answer selection, and provide additional multilingual versions to support future work. Results indicate that performance is mainly driven by model scale and intrinsic reasoning ability, with complex inference strategies obtaining limited gains. Together, these results establish HEAD-QA v2 as a reliable resource for advancing research on biomedical reasoning and model improvement.


Accelerating Local AI on Consumer GPUs: A Hardware-Aware Dynamic Strategy for YOLOv10s

arXiv.org Artificial Intelligence

Abstract--As local AI grows in popularity, there is a critical gap between the benchmark performance of object detectors and their practical viability on consumer-grade hardware. While models like YOLOv10s promise real-time speeds, these metrics are typically achieved on high-power, desktop-class GPUs. This paper reveals that on resource-constrained systems, such as laptops with RTX 4060 GPUs, performance is not compute-bound but is instead dominated by system-level bottlenecks, as illustrated by a simple bottleneck test. T o overcome this hardware-level constraint, we introduce a Two-Pass Adaptive Inference algorithm, a model-independent approach that requires no architectural changes. This study mainly focuses on'adaptive' inference strategies and undertakes a comparative analysis of architectural early-exit and resolution-adaptive routing, highlighting their respective trade-offs within a unified evaluation framework. The system uses a fast, low-resolution pass and only escalates to a high-resolution model pass when detection confidence is low. On a 5000-image COCO dataset, our method achieves a 1.85x speedup over a PyT orch Early-Exit baseline, with a modest mAP loss of 5.51%. This work provides a practical and reproducible blueprint for deploying high-performance, real-time AI on consumer-grade devices by shifting the focus from pure model optimization to hardware-aware inference strategies that maximize throughput.


Neurophysiological Characteristics of Adaptive Reasoning for Creative Problem-Solving Strategy

arXiv.org Artificial Intelligence

Adaptive reasoning enables humans to flexibly adjust inference strategies when environmental rules or contexts change, yet its underlying neural dynamics remain unclear. This study investigated the neurophysiological mechanisms of adaptive reasoning using a card-sorting paradigm combined with electroencephalography and compared human performance with that of a multimodal large language model. Stimulus- and feedback-locked analyses revealed coordinated delta-theta-alpha dynamics: early delta-theta activity reflected exploratory monitoring and rule inference, whereas occipital alpha engagement indicated confirmatory stabilization of attention after successful rule identification. In contrast, the multimodal large language model exhibited only short-term feedback-driven adjustments without hierarchical rule abstraction or genuine adaptive reasoning. These findings identify the neural signatures of human adaptive reasoning and highlight the need for brain-inspired artificial intelligence that incorporates oscillatory feedback coordination for true context-sensitive adaptation.


DiffAdapt: Difficulty-Adaptive Reasoning for Token-Efficient LLM Inference

arXiv.org Artificial Intelligence

Recent reasoning Large Language Models (LLMs) demonstrate remarkable problem-solving abilities but often generate long thinking traces whose utility is unclear. Our work aims to improve their efficiency, enabling them to reach high performance without overthinking. First, we analyze the entropy of token probabilities in reasoning traces. Across three models, we observe a consistent U-shaped entropy pattern: high entropy on easy problems despite high accuracy, low entropy on problems with medium difficulty, and high entropy on hard problems reflecting uncertainty. Specifically, we notice 22--25\% entropy reduction from easy to medium difficulty regions, suggesting an {overthinking} phenomenon on easy instances. Building on these insights, we introduce \textbf{DiffAdapt}, a lightweight framework that selects Easy/Normal/Hard inference strategies per question based on their difficulty and reasoning trace entropy. Each inference strategy consists of a fixed prompt, temperature and maximum token length. In contrast to existing efficiency optimization methods, our approach does not fine-tune base LLM but a small probe that classifies LLM's final hidden state, allowing inexpensive adaptation. We comprehensively evaluate our method on five models and eight benchmarks. Our method achieves comparable or improved accuracy while reducing token usage by up to 22.4\%, establishing a practical path toward compute-efficient reasoning.


RLIE: Rule Generation with Logistic Regression, Iterative Refinement, and Evaluation for Large Language Models

arXiv.org Artificial Intelligence

Nowadays, Large Language Models (LLMs) are able to propose rules in natural language, overcoming constrains of a predefined predicate space inherent in traditional rule learning. However, existing methods using LLMs often overlook the combination effects of rules, and the potential of coupling LLMs with probabilistic rule learning to ensure robust inference is not fully explored. To address this gap, we introduce RLIE, a unified framework that integrates LLMs with probabilistic modeling to learn a set of probabilistic rules. The RLIE framework comprises four stages: (1) Rule generation, where a LLM proposes and filters candidate rules; (2) Logistic regression, which learns the probabilistic weights of the rules for global selection and calibration; (3) Iterative refinement, which continuously optimizes the rule set based on prediction errors; and (4) Evaluation, which compares the performance of the weighted rule set as a direct classifier against various methods of injecting the rules into an LLM. Generated rules are the evaluated with different inference strategies on multiple real-world datasets. While applying rules directly with corresponding weights brings us superior performance, prompting LLMs with rules, weights and classification results from the logistic model will surprising degrade the performance. This result aligns with the observation that LLMs excel at semantic generation and interpretation but are less reliable at fine-grained, controlled probabilistic integration. Our work investigates the potentials and limitations of using LLMs for inductive reasoning tasks, proposing a unified framework which integrates LLMs with classic probabilistic rule combination methods, paving the way for more reliable neuro-symbolic reasoning systems. In data-driven applications and scientific discovery, the goal is not merely to predict outcomes, but to construct a set of verifiable, reusable, and composable theories(Zhou et al., 2024; Y ang et al., 2024a; Minh et al., 2022). These theories can enable explainable, auditable decisions while driving the discovery of new knowledge and underlying structures(Y ang et al., 2023; 2024b). These theories can be expressed in formal, structural statements(Cohen et al., 1995; Cropper & Morel, 2021) or natural language hypotheses(Zhou et al., 2024), and they share a common characteristic: they are declarative, testable, and self-contained discriminative patterns that yield predictions verifiable by external evidence In this paper, we do not distinguish between the terms "rule" and "hypothesis", and will use "rule" throughout the text for consistency.