Flex-Judge: Text-Only Reasoning Unleashes Zero-Shot Multimodal Evaluators
Neural Information Processing Systems
Human-generated reward signals are critical for aligning generative models with human preferences, guiding both training and inference-time evaluations. While large language models (LLMs) employed as proxy evaluators, i.e., LLM-as-a-Judge, significantly reduce the costs associated with manual annotations, they typically require extensive modality-specific training data and fail to generalize well across diverse multimodal tasks.
Jun-13-2026, 12:21:49 GMT