On Learning Multi-Modal Forgery Representation for Diffusion Generated Video Detection
–Neural Information Processing Systems
Large numbers of synthesized videos from diffusion models pose threats to information security and authenticity, leading to an increasing demand for generated content detection. However, existing video-level detection algorithms primarily focus on detecting facial forgeries and often fail to identify diffusion-generated content with a diverse range of semantics. To advance the field of video forensics, we propose an algorithm named Multi-Modal Detection (MM-Det) for detecting diffusion-generated videos. MM-Det draws on the perceptual and comprehension abilities of Large Multi-modal Models (LMMs) by generating a Multi-Modal Forgery Representation (MMFR) from the LMM's multi-modal space, which improves the detector's ability to recognize unseen forgery content.
Mar-27-2025, 11:27:56 GMT
- Country:
  - Asia > China (0.14)
  - North America > Canada (0.14)
- Genre:
  - Research Report > Experimental Study (0.93)
- Industry:
  - Information Technology > Security & Privacy (1.00)
- Technology:
  - Information Technology
    - Artificial Intelligence
      - Machine Learning
        - Neural Networks (1.00)
        - Performance Analysis > Accuracy (0.46)
      - Natural Language > Large Language Model (0.95)
      - Representation & Reasoning (1.00)
      - Vision (1.00)
    - Communications > Social Media (0.94)
    - Security & Privacy (1.00)
    - Sensing and Signal Processing > Image Processing (1.00)