GraphAdapter: Tuning Vision-Language Models With Dual Knowledge Graph

Xin Li, Dongze Lian, Zhihe Lu, Jiawang Bai, Zhibo Chen, Xinchao Wang

Sep-24-2023–arXiv.org Artificial Intelligence

Adapter-style efficient transfer learning (ETL) has shown excellent performance in tuning vision-language models (VLMs) in the low-data regime, where only a few additional parameters are introduced to extract task-specific knowledge from the general and powerful representations of VLMs. However, most adapter-style works face two limitations: (i) they model task-specific knowledge with a single modality only; and (ii) they overlook the inter-class relationships in downstream tasks, which leads to sub-optimal solutions. To address these limitations, we propose an effective adapter-style tuning strategy, dubbed GraphAdapter, which implements the textual adapter by explicitly modeling the dual-modality structure knowledge (i.e., the correlation of different semantics/classes in the textual and visual modalities) with a dual knowledge graph. In particular, the dual knowledge graph consists of two sub-graphs, a textual knowledge sub-graph and a visual knowledge sub-graph, whose nodes represent the semantics/classes and whose edges represent their correlations in the respective modality. This enables the textual feature of each prompt to leverage the task-specific structure knowledge from both the textual and visual modalities, yielding a more effective classifier for downstream tasks. Extensive experimental results on 11 benchmark datasets reveal that our GraphAdapter significantly outperforms previous adapter-based methods. The code will be released at https://github.com/lixinustc/GraphAdapter
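The idea in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes that each sub-graph node is a per-class feature vector, that edges are cosine similarities turned into weights with a softmax, and that the adapted textual feature is a residual mix of the original feature with the messages from both sub-graphs. The function names and the mixing coefficients `alpha` and `beta` are hypothetical, and the learnable graph weights a real adapter would train are omitted.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(query, nodes):
    """One message-passing step from a sub-graph to the query node.

    Edge weights are softmaxed cosine similarities (an assumption);
    the message is the weighted sum of the unit-normalized node features.
    """
    unit_nodes = [normalize(n) for n in nodes]
    weights = softmax([dot(query, n) for n in unit_nodes])
    return [sum(w * n[i] for w, n in zip(weights, unit_nodes))
            for i in range(len(query))]


def graph_adapt(text_feat, text_nodes, visual_nodes, alpha=0.7, beta=0.5):
    """Adapt one textual feature with both sub-graphs (hypothetical helper).

    beta balances textual vs. visual messages; alpha keeps a residual
    share of the original textual feature.
    """
    q = normalize(text_feat)
    t_msg = aggregate(q, text_nodes)
    v_msg = aggregate(q, visual_nodes)
    fused = [beta * t + (1 - beta) * v for t, v in zip(t_msg, v_msg)]
    return normalize([alpha * a + (1 - alpha) * f for a, f in zip(q, fused)])


def predict(image_feat, text_feats, visual_nodes):
    """Classify an image by cosine similarity to the adapted textual features."""
    classifier = [graph_adapt(t, text_feats, visual_nodes) for t in text_feats]
    img = normalize(image_feat)
    scores = [dot(img, w) for w in classifier]
    return scores.index(max(scores))


if __name__ == "__main__":
    # Toy 3-class example with made-up 3-dimensional features.
    text_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    visual_nodes = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
    print(predict([0.8, 0.2, 0.0], text_feats, visual_nodes))
```

In this sketch the textual features of all classes double as the textual sub-graph nodes, while the visual nodes stand in for per-class visual statistics such as the mean feature of a few labeled images. Setting `alpha=1.0` disables the adapter and recovers the plain zero-shot classifier.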

graphadapter, knowledge, structure knowledge


Country:
- Europe
  - Switzerland > Zürich
    - Zürich (0.14)
  - Romania > Sud - Muntenia Development Region
    - Giurgiu County > Giurgiu (0.04)
- Asia
  - Singapore (0.04)
  - China (0.04)
  - Middle East > Israel
    - Tel Aviv District > Tel Aviv (0.04)

Genre:
- Research Report (0.82)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (1.00)
  - Machine Learning > Neural Networks (1.00)
  - Representation & Reasoning > Semantic Networks (0.83)
