Adapter-state Sharing CLIP for Parameter-efficient Multimodal Sarcasm Detection
Soumyadeep Jana, Sahil Danayak, Sanasam Ranbir Singh
arXiv.org Artificial Intelligence
ABSTRACT: The growing prevalence of multimodal image-text sarcasm on social media poses challenges for opinion mining systems. Existing approaches rely on full fine-tuning of large models, making them impractical in resource-constrained settings. While recent parameter-efficient fine-tuning (PEFT) methods offer promise, their off-the-shelf use underperforms on complex tasks like sarcasm detection. We propose AdS-CLIP (Adapter-state Sharing in CLIP), a lightweight framework built on CLIP that inserts adapters only in the upper layers, preserving low-level unimodal representations in the lower layers, and introduces a novel adapter-state sharing mechanism in which textual adapters guide visual ones to promote efficient cross-modal learning in the upper layers. Experiments on two public benchmarks demonstrate that AdS-CLIP outperforms both standard PEFT methods and existing multimodal baselines with significantly fewer trainable parameters.
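The abstract describes adapter-state sharing only at a high level. Below is a minimal, hypothetical sketch of how a bottleneck adapter with text-to-vision state sharing could look; the class names, dimensions, zero-initialization of the up-projection, and the exact point where the shared state is injected are all assumptions for illustration, not the paper's implementation.

```python
import numpy as np

class Adapter:
    """Bottleneck adapter: down-project, ReLU, up-project, residual add."""

    def __init__(self, d_model, d_bottleneck, rng):
        self.W_down = rng.normal(0.0, 0.02, (d_model, d_bottleneck))
        # Zero-init up-projection so the adapter initially acts as identity.
        self.W_up = np.zeros((d_bottleneck, d_model))

    def __call__(self, x, shared_state=None):
        h = np.maximum(x @ self.W_down, 0.0)  # bottleneck "state"
        if shared_state is not None:
            h = h + shared_state              # inject the other modality's adapter state
        return x + h @ self.W_up, h           # residual output, state to share

rng = np.random.default_rng(0)
d_model, d_bottleneck = 512, 64
text_adapter = Adapter(d_model, d_bottleneck, rng)
vision_adapter = Adapter(d_model, d_bottleneck, rng)

text_feat = rng.normal(size=(1, d_model))
img_feat = rng.normal(size=(1, d_model))

# In an upper CLIP layer, the textual adapter runs first; its bottleneck
# state then guides the visual adapter (one plausible reading of the method).
text_out, text_state = text_adapter(text_feat)
img_out, _ = vision_adapter(img_feat, shared_state=text_state)
```

Only the small `W_down`/`W_up` matrices would be trained, which is what keeps the parameter count low relative to full fine-tuning of CLIP.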
Oct-30-2025