Training-Free Dual Hyperbolic Adapters for Better Cross-Modal Reasoning
Zhang, Yi, Cheng, Chun-Wun, He, Junyi, Yu, Ke, Tang, Yushun, Schönlieb, Carola-Bibiane, He, Zhihai, Aviles-Rivero, Angelica I.
–arXiv.org Artificial Intelligence
Abstract--Recent research in Vision-Language Models (VLMs) has significantly advanced our capabilities in cross-modal reasoning. However, existing methods suffer from performance degradation under domain changes or require substantial computational resources for fine-tuning in new domains. To address this issue, we develop a new adaptation method for large vision-language models, called Training-free Dual Hyperbolic Adapters (T-DHA). We characterize the vision-language relationship between semantic concepts, which typically has a hierarchical tree structure, in hyperbolic space instead of the traditional Euclidean space. We find that hyperbolic geometry is particularly effective for embedding hierarchical data structures using the Poincaré ball model, achieving significantly improved representation and discrimination power. Coupled with negative learning, it provides more accurate and robust classifications with fewer feature dimensions. Our extensive experimental results on various datasets demonstrate that the T-DHA method significantly outperforms existing state-of-the-art methods in few-shot image recognition and domain generalization tasks.

Large Vision-Language Models (VLMs), such as CLIP [1] and ALIGN [2], are trained on extensive image-text datasets using contrastive learning. These models excel at creating a unified vision-language embedding space by aligning visual and textual modalities, enabling their successful application across a wide range of downstream visual tasks, such as few-shot image recognition [3]-[5].
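As a minimal illustration of the Poincaré ball model mentioned in the abstract, the sketch below computes the standard geodesic distance between two points inside the unit ball, d(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))). It assumes curvature −1, and the function name `poincare_distance` is hypothetical. The excerpt does not say how T-DHA uses this distance, so this shows only the underlying geometry, not the authors' adapter.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit Poincaré ball.

    Assumes curvature -1 (an assumption; the excerpt does not state the
    curvature used by T-DHA).
    """
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # The denominator shrinks toward zero near the boundary, so distances
    # there grow much faster than their Euclidean counterparts.
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v))
    return math.acosh(arg)


if __name__ == "__main__":
    # Two point pairs: one near the origin, one near the boundary.
    print(poincare_distance((0.0, 0.0), (0.5, 0.0)))
    print(poincare_distance((0.9, 0.0), (-0.9, 0.0)))
```

Points close to the boundary are far apart in hyperbolic terms even when their Euclidean gap is small. This is what leaves room for the many leaves of a tree-like hierarchy.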
Dec-10-2025
- Country:
- Asia > China
- Beijing > Beijing (0.04)
- Guangdong Province > Shenzhen (0.04)
- Europe
- Switzerland > Zürich
- Zürich (0.14)
- United Kingdom > England
- Cambridgeshire > Cambridge (0.14)
- North America
- Canada > Newfoundland and Labrador
- Labrador (0.05)
- United States > Maine (0.04)
- Genre:
- Research Report
- New Finding (0.68)
- Promising Solution (0.66)
- Technology:
- Information Technology
- Artificial Intelligence
- Machine Learning
- Neural Networks (0.46)
- Pattern Recognition (0.55)
- Statistical Learning (0.46)
- Natural Language
- Large Language Model (0.48)
- Text Processing (0.67)
- Representation & Reasoning (1.00)
- Vision (1.00)
- Sensing and Signal Processing > Image Processing (1.00)