A Survey of Automatic Evaluation Methods on Text, Visual and Speech Generations
Lan, Tian, Zhou, Yang-Hao, Ma, Zi-Ao, Sun, Fanshu, Sun, Rui-Qing, Luo, Junyu, Tu, Rong-Cheng, Huang, Heyan, Xu, Chen, Wu, Zhijing, Mao, Xian-Ling
arXiv.org Artificial Intelligence
Recent advances in deep learning have significantly enhanced generative AI capabilities across text, images, and audio. However, automatically evaluating the quality of these generated outputs presents ongoing challenges. Although numerous automatic evaluation methods exist, current research lacks a systematic framework that comprehensively organizes these methods across text, visual, and audio modalities. To address this issue, we present a comprehensive review and a unified taxonomy of automatic evaluation methods for generated content across all three modalities. We identify five fundamental paradigms that characterize existing evaluation approaches across these domains. Our analysis begins by examining evaluation methods for text generation, where techniques are most mature. We then extend this framework to image and audio generation, demonstrating its broad applicability. Finally, we discuss promising directions for future research in cross-modal evaluation methodologies.
Jun-13-2025
- Country:
  - Europe (1.00)
  - North America > United States
    - California (0.45)
    - Minnesota (0.27)
  - Asia
    - China (0.68)
    - Middle East (0.67)
- Genre:
  - Overview (1.00)
  - Research Report > New Finding (0.45)
- Industry:
  - Leisure & Entertainment (1.00)
  - Media > Music (0.67)
  - Information Technology > Security & Privacy (0.67)
  - Education > Assessment & Standards
    - Student Performance (0.45)
- Technology:
  - Information Technology > Artificial Intelligence
    - Vision (1.00)
    - Speech > Speech Recognition (1.00)
    - Representation & Reasoning (1.00)
    - Cognitive Science > Problem Solving (1.00)
    - Natural Language
      - Text Processing (1.00)
      - Large Language Model (1.00)
      - Chatbot (1.00)
      - Discourse & Dialogue (0.92)
    - Machine Learning > Neural Networks
      - Deep Learning > Generative AI (0.48)