On Learning Multi-Modal Forgery Representation for Diffusion Generated Video Detection

Mar-27-2025, 11:27:56 GMT – Neural Information Processing Systems

The large number of videos synthesized by diffusion models threatens information security and authenticity, increasing the demand for generated-content detection. However, existing video-level detection algorithms focus primarily on facial forgeries and often fail to identify diffusion-generated content with diverse semantics. To advance the field of video forensics, we propose an algorithm named Multi-Modal Detection (MM-Det) for detecting diffusion-generated videos. MM-Det draws on the strong perception and comprehension abilities of Large Multi-modal Models (LMMs) by generating a Multi-Modal Forgery Representation (MMFR) from the LMM's multi-modal space, which improves its ability to detect unseen forged content.

large language model, machine learning, natural language

Country:
- Asia > China (0.14)
- North America > Canada (0.14)

Genre:
- Research Report > Experimental Study (0.93)

Industry:
- Information Technology > Security & Privacy (1.00)

Technology:
- Information Technology
  - Artificial Intelligence
    - Machine Learning
      - Neural Networks (1.00)
      - Performance Analysis > Accuracy (0.46)
    - Natural Language > Large Language Model (0.95)
    - Representation & Reasoning (1.00)
    - Vision (1.00)
  - Communications > Social Media (0.94)
  - Security & Privacy (1.00)
  - Sensing and Signal Processing > Image Processing (1.00)
