Adapter-state Sharing CLIP for Parameter-efficient Multimodal Sarcasm Detection
Soumyadeep Jana, Sahil Danayak, Sanasam Ranbir Singh
arXiv.org Artificial Intelligence
ABSTRACT: The growing prevalence of multimodal image-text sarcasm on social media poses challenges for opinion mining systems. Existing approaches rely on full fine-tuning of large models, making them impractical in resource-constrained settings. While recent parameter-efficient fine-tuning (PEFT) methods offer promise, their off-the-shelf use underperforms on complex tasks like sarcasm detection. We propose AdS-CLIP (Adapter-state Sharing in CLIP), a lightweight framework built on CLIP that inserts adapters only in the upper layers, preserving low-level unimodal representations in the lower layers, and introduces a novel adapter-state sharing mechanism in which textual adapters guide visual ones to promote efficient cross-modal learning in the upper layers. Experiments on two public benchmarks demonstrate that AdS-CLIP outperforms both standard PEFT methods and existing multimodal baselines with significantly fewer trainable parameters.
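The abstract describes adapter-state sharing only at a high level. Below is a minimal, hypothetical sketch of how a bottleneck adapter with text-to-vision state sharing could look; the class names, dimensions, zero-initialization of the up-projection, and the exact point where the shared state is injected are all assumptions for illustration, not the paper's implementation.

```python
import numpy as np

class Adapter:
    """Bottleneck adapter: down-project, ReLU, up-project, residual add."""

    def __init__(self, d_model, d_bottleneck, rng):
        self.W_down = rng.normal(0.0, 0.02, (d_model, d_bottleneck))
        # Zero-init up-projection so the adapter initially acts as identity.
        self.W_up = np.zeros((d_bottleneck, d_model))

    def __call__(self, x, shared_state=None):
        h = np.maximum(x @ self.W_down, 0.0)  # bottleneck "state"
        if shared_state is not None:
            h = h + shared_state              # inject the other modality's adapter state
        return x + h @ self.W_up, h           # residual output, state to share

rng = np.random.default_rng(0)
d_model, d_bottleneck = 512, 64
text_adapter = Adapter(d_model, d_bottleneck, rng)
vision_adapter = Adapter(d_model, d_bottleneck, rng)

text_feat = rng.normal(size=(1, d_model))
img_feat = rng.normal(size=(1, d_model))

# In an upper CLIP layer, the textual adapter runs first; its bottleneck
# state then guides the visual adapter (one plausible reading of the method).
text_out, text_state = text_adapter(text_feat)
img_out, _ = vision_adapter(img_feat, shared_state=text_state)
```

Only the small `W_down`/`W_up` matrices would be trained, which is what keeps the parameter count low relative to full fine-tuning of CLIP.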
Oct-30-2025