CroMe: Multimodal Fake News Detection using Cross-Modal Tri-Transformer and Metric Learning
Choi, Eunjee, Ahn, Junhyun, Piao, XinYu, Kim, Jong-Kook
arXiv.org Artificial Intelligence
Multimodal fake news detection has received increasing attention recently. Existing methods rely on independently encoded unimodal data and overlook the benefits of capturing intra-modality relationships and integrating inter-modal similarities. To address these issues, Cross-Modal Tri-Transformer and Metric Learning for Multimodal Fake News Detection (CroMe) is proposed. CroMe uses Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models (BLIP2) as its encoder to capture detailed text, image, and combined image-text representations. The metric learning module employs a proxy anchor method to capture intra-modality relationships, while the feature fusion module uses a Cross-Modal and Tri-Transformer to integrate the resulting features. The final fake news detector passes the fused features through a classifier to predict the authenticity of the content. Experiments on datasets show that CroMe excels in multimodal fake news detection.
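The abstract names a proxy anchor method for the metric learning module without giving details. As a rough illustration only, the sketch below implements the generic proxy-anchor loss over cosine similarities in plain Python. The function names, the two-class setup (0 = real, 1 = fake), and the hyperparameters `alpha` and `delta` are assumptions made for this example and are not taken from CroMe.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def proxy_anchor_loss(embeddings, labels, proxies, alpha=32.0, delta=0.1):
    """Generic proxy-anchor loss (illustrative, not the paper's code).

    embeddings: list of feature vectors, one per sample
    labels:     list of integer class ids, one per sample
    proxies:    list of one proxy vector per class (learnable in practice)
    alpha, delta: assumed scaling factor and margin
    """
    pos_term = 0.0
    neg_term = 0.0
    proxies_with_positives = 0

    for class_id, proxy in enumerate(proxies):
        pos_sum = 0.0
        neg_sum = 0.0
        has_positive = False
        for x, y in zip(embeddings, labels):
            s = cosine(x, proxy)
            if y == class_id:
                # Pull same-class samples toward their proxy.
                has_positive = True
                pos_sum += math.exp(-alpha * (s - delta))
            else:
                # Push other-class samples away from this proxy.
                neg_sum += math.exp(alpha * (s + delta))
        if has_positive:
            proxies_with_positives += 1
            pos_term += math.log1p(pos_sum)
        neg_term += math.log1p(neg_sum)

    # Average the positive term over proxies that have positives in the batch,
    # and the negative term over all proxies.
    if proxies_with_positives:
        pos_term /= proxies_with_positives
    return pos_term + neg_term / len(proxies)


# Toy batch: two classes (0 = real, 1 = fake), 2-D embeddings.
proxies = [[1.0, 0.0], [0.0, 1.0]]
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
good = proxy_anchor_loss(embeddings, [0, 0, 1, 1], proxies)
bad = proxy_anchor_loss(embeddings, [1, 1, 0, 0], proxies)
print(good < bad)
```

In the toy batch, embeddings that sit near their own class proxy yield a lower loss than the same embeddings with swapped labels, which is the behavior a metric learning module of this kind relies on. In an actual model the proxies would be trained jointly with the encoder features rather than fixed by hand.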
Jan-21-2025
- Country:
  - Oceania > Australia
  - North America > United States
    - Minnesota > Hennepin County > Minneapolis (0.04)
  - Europe
    - Greece (0.04)
    - France > Provence-Alpes-Côte d'Azur
      - Bouches-du-Rhône > Marseille (0.04)
  - Asia > China
    - Hong Kong (0.04)
- Genre:
  - Research Report (0.50)
- Technology: