Multimodal Misinformation Detection by Learning from Synthetic Data with Multimodal LLMs

Zeng, Fengzhu, Li, Wenqian, Gao, Wei, Pang, Yan

Sep-29-2024–arXiv.org Artificial Intelligence

Detecting multimodal misinformation, especially in the form of image-text pairs, is crucial. Obtaining large-scale, high-quality real-world fact-checking datasets for training detectors is costly, leading researchers to use synthetic datasets generated by AI technologies. However, the generalizability of detectors trained on synthetic data to real-world scenarios remains unclear due to the distribution gap. To address this, we propose learning from synthetic data for detecting real-world multimodal misinformation through two model-agnostic data selection methods that match synthetic and real-world data distributions. Experiments show that our method enhances the performance of a small MLLM (13B) on real-world fact-checking datasets, enabling it to even surpass GPT-4V.
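
The abstract does not say how the two data selection methods work, so the following is only a generic illustration of what "matching synthetic and real-world data distributions" could look like, not the authors' method. It is a minimal sketch that assumes each example has already been embedded as a vector, and it keeps the synthetic examples closest (by cosine similarity) to the centroid of a small real-world set. The function names, the centroid criterion, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def select_synthetic(synthetic, real, k):
    """Return indices of the k synthetic embeddings most similar to the real-data centroid.

    Assumption: similarity to the centroid is a stand-in for "distribution matching";
    the paper's actual selection criteria are not given in the abstract.
    """
    target = centroid(real)
    ranked = sorted(
        range(len(synthetic)),
        key=lambda i: cosine(synthetic[i], target),
        reverse=True,
    )
    return ranked[:k]


# Toy embeddings (hypothetical): real data clusters near the first axis.
real = [[1.0, 0.0], [0.9, 0.1]]
synthetic = [[0.0, 1.0], [1.0, 0.1], [0.5, 0.5], [-1.0, 0.0]]

selected = select_synthetic(synthetic, real, k=2)
print(selected)
```

Because the selection only looks at embeddings, a sketch like this is independent of the downstream detector, which is the sense in which such a filter could be called model-agnostic.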

dataset, detection, misinformation

Country:
- South America > Peru (0.04)
- North America
  - Dominican Republic (0.04)
  - United States
    - Virginia > Richmond (0.04)
    - Arizona (0.04)
    - Tennessee (0.04)
    - District of Columbia > Washington (0.04)
    - Hawaii > Honolulu County
      - Honolulu (0.04)
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
    - Mississippi > Hinds County
      - Jackson (0.04)
    - Washington > King County
      - Seattle (0.04)
    - California
      - Santa Clara County > Mountain View (0.04)
      - Los Angeles County > Los Angeles
        - Hollywood > West Hollywood (0.04)
    - New York > New York County
      - New York City (0.05)
  - Canada
    - Ontario > Toronto (0.04)
    - Quebec > Montreal (0.04)
    - British Columbia > Metro Vancouver Regional District
      - Vancouver (0.04)
- Europe
  - Russia (0.04)
  - Netherlands (0.04)
  - United Kingdom > England
    - Greater London > London (0.04)
  - Middle East > Republic of Türkiye
    - Istanbul Province > Istanbul (0.04)
  - Germany > Bavaria
    - Upper Bavaria > Munich (0.04)
  - France
    - Île-de-France > Paris
      - Paris (0.14)
    - Provence-Alpes-Côte d'Azur > Bouches-du-Rhône
      - Marseille (0.04)
- Asia
  - Singapore (0.04)
  - China > Hong Kong (0.04)
  - Russia (0.04)
  - Middle East > Republic of Türkiye
    - Istanbul Province > Istanbul (0.04)
- Africa > Rwanda
  - Kigali > Kigali (0.04)

Genre:
- Research Report (0.82)

Industry:
- Media > News (1.00)
- Leisure & Entertainment > Sports
  - Football (0.67)
- Government > Regional Government
  - North America Government > United States Government (0.46)

Technology:
- Information Technology
  - Communications > Social Media (0.93)
  - Artificial Intelligence
    - Representation & Reasoning (1.00)
    - Natural Language > Large Language Model (1.00)
    - Vision (0.94)
    - Machine Learning > Neural Networks
      - Deep Learning (0.94)
