TripletCLIP: Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives
Maitreya Patel

Oct-9-2025, 23:36:42 GMT–Neural Information Processing Systems

Contrastive Language-Image Pretraining (CLIP) models maximize the mutual information between textual and visual modalities to learn representations.
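The one-line description of CLIP above can be made concrete. Below is a minimal sketch, assuming the standard symmetric InfoNCE (contrastive cross-entropy) objective used by CLIP. In a batch of N matched pairs, each image must pick out its own caption among the N captions, and each caption must pick out its own image. Minimizing this loss maximizes a lower bound on the mutual information between the two modalities (roughly log N minus the loss), which is the usual reading of the claim above. The helper names, the fixed temperature of 0.07 (CLIP learns this value; 0.07 is its initialization), and the `extra_negative_texts` parameter are illustrative assumptions. The extra-negatives option only shows, generically, how additional captions enter the softmax denominator. It is not the TripletCLIP training recipe, which this entry does not describe.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def log_softmax_at(logits, idx):
    """Return log(softmax(logits)[idx]), subtracting the max for numerical stability."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return logits[idx] - log_z


def clip_loss(image_embs, text_embs, temperature=0.07, extra_negative_texts=None):
    """Symmetric InfoNCE loss over N matched (image, text) pairs.

    Row i of each list is a matched pair; every other item in the batch is a negative.
    extra_negative_texts (hypothetical) holds additional caption embeddings used only
    as extra negatives in the image-to-text direction.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    negs = [l2_normalize(v) for v in (extra_negative_texts or [])]
    n = len(imgs)

    # Image -> text: pick caption i among the N batch captions plus any extra negatives.
    i2t = 0.0
    for i in range(n):
        logits = [dot(imgs[i], t) / temperature for t in txts + negs]
        i2t -= log_softmax_at(logits, i)

    # Text -> image: pick image j among the N batch images.
    t2i = 0.0
    for j in range(n):
        logits = [dot(txts[j], im) / temperature for im in imgs]
        t2i -= log_softmax_at(logits, j)

    # Average over both directions and over the batch.
    return (i2t + t2i) / (2 * n)


images = [[1.0, 0.0], [0.0, 1.0]]
texts = [[0.9, 0.1], [0.1, 0.9]]
aligned = clip_loss(images, texts)
swapped = clip_loss(images, texts[::-1])  # every image paired with the wrong caption
print(aligned < swapped)  # True
```

Each extra negative adds a positive term to the softmax denominator, so a caption embedding that sits close to an image can only raise that image's loss. Pure Python keeps the sketch dependency-free. A practical implementation would compute the full similarity matrix as one matrix product and apply a batched cross-entropy in each direction.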

Keywords: caption, dataset, tripletclip (12 more not shown)

Country:
- North America > United States
  - Arizona (0.04)
  - Maryland
    - Baltimore County (0.04)
    - Baltimore (0.04)
- Europe > Switzerland
  - Zürich > Zürich (0.14)
- Africa > Central African Republic
  - Ombella-M'Poko > Bimbo (0.04)

Genre:
- Research Report
  - Experimental Study (0.93)
  - New Finding (0.93)

Industry:
- Media (0.46)

Technology:
- Information Technology
  - Sensing and Signal Processing > Image Processing (1.00)
  - Artificial Intelligence
    - Vision (1.00)
    - Representation & Reasoning (1.00)
    - Machine Learning > Neural Networks (0.67)
    - Natural Language
      - Large Language Model (0.70)
      - Text Processing (0.68)
