Beyond FVD: Enhanced Evaluation Metrics for Video Generation Quality
Luo, Ge Ya, Favero, Gian Mario, Luo, Zhi Hao, Jolicoeur-Martineau, Alexia, Pal, Christopher
–arXiv.org Artificial Intelligence
The Fréchet Video Distance (FVD) is a widely adopted metric for evaluating the distributional quality of generated videos. However, its effectiveness relies on critical assumptions. Our analysis reveals three significant limitations: (1) the non-Gaussianity of the Inflated 3D Convnet (I3D) feature space; (2) the insensitivity of I3D features to temporal distortions; (3) the impractical sample sizes required for reliable estimation. These findings undermine FVD's reliability and show that FVD falls short as a standalone metric for video generation evaluation. After extensive analysis of a wide range of metrics and backbone architectures, we propose JEDi, the JEPA Embedding Distance, which is based on features derived from a Joint Embedding Predictive Architecture and measured using Maximum Mean Discrepancy with a polynomial kernel. Our experiments on multiple open-source datasets show clear evidence that JEDi is a superior alternative to the widely used FVD metric: on average, it requires only 16% of the samples to reach its steady value while increasing alignment with human evaluation by 34%.
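To make the proposed measurement concrete, the following is a minimal sketch of an unbiased squared Maximum Mean Discrepancy estimate with a polynomial kernel, computed between two sets of feature vectors. It is not the authors' implementation: the abstract does not state the kernel parameters, so the degree of 3, the scale `gamma = 1/d`, and the offset `coef0 = 1` are assumptions borrowed from common practice, and the function names are hypothetical. Random arrays stand in for the JEPA embeddings of real and generated videos.

```python
import numpy as np


def polynomial_kernel(a, b, degree=3, gamma=None, coef0=1.0):
    """Polynomial kernel k(x, y) = (gamma * <x, y> + coef0) ** degree.

    The default parameters are an assumption, not taken from the paper.
    """
    if gamma is None:
        gamma = 1.0 / a.shape[1]  # scale by the feature dimension
    return (gamma * a @ b.T + coef0) ** degree


def mmd2_unbiased(x, y, **kernel_args):
    """Unbiased estimate of squared MMD between samples x (m, d) and y (n, d)."""
    m, n = len(x), len(y)
    k_xx = polynomial_kernel(x, x, **kernel_args)
    k_yy = polynomial_kernel(y, y, **kernel_args)
    k_xy = polynomial_kernel(x, y, **kernel_args)
    # Drop the diagonal terms k(x_i, x_i) so the within-set averages are unbiased.
    term_xx = (k_xx.sum() - np.trace(k_xx)) / (m * (m - 1))
    term_yy = (k_yy.sum() - np.trace(k_yy)) / (n * (n - 1))
    return term_xx + term_yy - 2.0 * k_xy.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Placeholder features; in practice these would be video embeddings.
    real = rng.normal(size=(200, 16))
    generated = rng.normal(loc=0.5, size=(200, 16))
    print("MMD^2 estimate:", mmd2_unbiased(real, generated))
```

Unlike a Fréchet distance, this estimate compares the two samples directly through kernel averages and does not fit a Gaussian to either feature set, which is relevant to the first limitation listed in the abstract.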
Oct-8-2024
- Country:
- North America > Canada > Quebec (0.14)
- Genre:
- Research Report > New Finding (1.00)
- Technology:
- Information Technology
- Artificial Intelligence
- Machine Learning
- Neural Networks (1.00)
- Statistical Learning (1.00)
- Representation & Reasoning (1.00)
- Vision (1.00)
- Data Science (1.00)
- Graphics (0.93)