Training-Free Dual Hyperbolic Adapters for Better Cross-Modal Reasoning

Zhang, Yi, Cheng, Chun-Wun, He, Junyi, Yu, Ke, Tang, Yushun, Schönlieb, Carola-Bibiane, He, Zhihai, Aviles-Rivero, Angelica I.

Dec-10-2025–arXiv.org Artificial Intelligence

Abstract: Recent research in Vision-Language Models (VLMs) has significantly advanced our capabilities in cross-modal reasoning. However, existing methods either suffer performance degradation under domain changes or require substantial computational resources for fine-tuning in new domains. To address this issue, we develop a new adaptation method for large vision-language models, called Training-free Dual Hyperbolic Adapters (T-DHA). We characterize the vision-language relationships between semantic concepts, which typically have a hierarchical tree structure, in hyperbolic space instead of the traditional Euclidean space. We find that hyperbolic space, using the Poincaré ball model, is particularly effective for embedding hierarchical data structures, achieving significantly improved representation and discrimination power. Coupled with negative learning, it provides more accurate and robust classifications with fewer feature dimensions. Our extensive experimental results on various datasets demonstrate that the T-DHA method significantly outperforms existing state-of-the-art methods in few-shot image recognition and domain generalization tasks.

Large Vision-Language Models (VLMs), such as CLIP [1] and ALIGN [2], are trained on extensive image-text datasets using contrastive learning. These models excel at creating a unified vision-language embedding space by aligning visual and textual modalities, enabling their successful application across a wide range of downstream visual tasks, such as few-shot image recognition [3]-[5].
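As a rough illustration of the Poincaré ball model named in the abstract, the sketch below computes the standard geodesic distance between two points in the unit ball with curvature -1. It is not taken from the paper: the function name `poincare_distance` and the `eps` guard are assumptions made for this example, and T-DHA may use a different curvature or formulation.

```python
import math


def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between two points strictly inside the unit Poincare ball.

    Uses the textbook formula for curvature -1:
        d(u, v) = arcosh(1 + 2 * ||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))
    This is an illustrative sketch, not the paper's implementation.
    """
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # eps keeps the denominator away from zero for points very near the boundary.
    denom = max((1.0 - sq_u) * (1.0 - sq_v), eps)
    return math.acosh(1.0 + 2.0 * sq_diff / denom)


# Two pairs with the same Euclidean gap (0.1) but different positions in the ball.
near_origin = poincare_distance([0.0, 0.0], [0.1, 0.0])
near_boundary = poincare_distance([0.85, 0.0], [0.95, 0.0])
```

In this sketch the pair near the boundary comes out farther apart than the pair near the origin, even though both pairs are separated by the same Euclidean gap. That growth of distance toward the boundary is what leaves room for tree-like hierarchies, with general concepts near the center and fine-grained ones farther out.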

large language model, machine learning, pattern recognition


Country:
- Asia > China (0.28)
- Europe > Switzerland (0.28)

Genre:
- Research Report
  - New Finding (0.68)
  - Promising Solution (0.66)

Technology:
- Information Technology
  - Sensing and Signal Processing > Image Processing (1.00)
  - Artificial Intelligence
    - Vision (1.00)
    - Representation & Reasoning (1.00)
    - Natural Language
      - Text Processing (0.67)
      - Large Language Model (0.48)
    - Machine Learning
      - Pattern Recognition (0.55)
      - Neural Networks (0.46)
      - Statistical Learning (0.46)
