CLIP Embeddings for AI-Generated Image Detection: A Few-Shot Study with Lightweight Classifier
arXiv.org Artificial Intelligence
Verifying the authenticity of AI-generated images is a growing challenge on social media platforms. While vision-language models (VLMs) like CLIP excel at multimodal representation, their capacity for AI-generated image classification is underexplored because such labels are absent during pre-training. This work investigates whether CLIP embeddings inherently contain information indicative of AI generation. The proposed pipeline extracts visual embeddings using a frozen CLIP model, feeds these embeddings to lightweight networks, and fine-tunes only the final classifier. Experiments on the public CIFAKE benchmark show that the pipeline reaches 95% accuracy without language reasoning. Few-shot adaptation to a curated custom dataset using 20% of the data brings performance to 85%. A closed-source baseline (Gemini-2.0) achieves the best zero-shot accuracy yet fails on specific styles. Notably, certain image types, such as wide-angle photographs and oil paintings, pose significant challenges to classification. These results point to previously unexplored difficulties in classifying certain types of AI-generated images and raise more specific questions in this domain that merit further investigation.
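The idea of training only a small classifier on top of frozen embeddings can be illustrated with a minimal sketch. This is not the authors' implementation: the embeddings below are synthetic Gaussian vectors standing in for frozen CLIP image features, the classifier is a single logistic layer rather than the paper's lightweight networks, and every name and hyperparameter (`make_synthetic_embeddings`, `train_linear_probe`, the dimension of 16, the learning rate) is hypothetical.

```python
import math
import random


def make_synthetic_embeddings(n_per_class, dim, shift, rng):
    """Return (vector, label) pairs; a stand-in for frozen CLIP features.

    Assumption: label 1 ("AI-generated") and label 0 ("real") differ only
    by a mean shift. Real CLIP embeddings are not guaranteed to behave so.
    """
    data = []
    for label in (0, 1):
        center = shift if label == 1 else -shift
        for _ in range(n_per_class):
            data.append(([rng.gauss(center, 1.0) for _ in range(dim)], label))
    rng.shuffle(data)
    return data


def predict_proba(weights, bias, vec):
    """Numerically stable sigmoid of the linear score."""
    z = bias + sum(w * x for w, x in zip(weights, vec))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_linear_probe(data, dim, epochs=50, lr=0.1):
    """Fit a logistic-regression head with plain SGD.

    Only the head's weights are updated; the embeddings are fixed inputs,
    which mirrors keeping the encoder frozen.
    """
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        for vec, label in data:
            err = predict_proba(weights, bias, vec) - label
            for i in range(dim):
                weights[i] -= lr * err * vec[i]
            bias -= lr * err
    return weights, bias


def accuracy(weights, bias, data):
    """Fraction of examples classified correctly at a 0.5 threshold."""
    correct = sum(
        (predict_proba(weights, bias, vec) >= 0.5) == bool(label)
        for vec, label in data
    )
    return correct / len(data)


rng = random.Random(0)
train = make_synthetic_embeddings(100, 16, 0.75, rng)
test = make_synthetic_embeddings(100, 16, 0.75, rng)
weights, bias = train_linear_probe(train, dim=16)
test_acc = accuracy(weights, bias, test)
print(f"held-out accuracy on synthetic data: {test_acc:.2f}")
```

In a real setting, the synthetic vectors would be replaced by features from a frozen image encoder, and the accuracy printed here says nothing about CIFAKE or the reported 95% figure.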
May-19-2025
- Country:
- Asia > Middle East
- Jordan (0.04)
- Republic of Türkiye > Karaman Province
- Karaman (0.04)
- North America > United States
- New York > Monroe County > Rochester (0.04)
- Genre:
- Research Report > New Finding (0.47)
- Industry:
- Information Technology > Security & Privacy (0.32)
- Technology: