Flex-Judge: Text-Only Reasoning Unleashes Zero-Shot Multimodal Evaluators

Jun-13-2026, 12:21:49 GMT–Neural Information Processing Systems

Human-generated reward signals are critical for aligning generative models with human preferences, guiding both training and inference-time evaluations. While large language models (LLMs) employed as proxy evaluators, i.e., LLM-as-a-Judge, significantly reduce the costs associated with manual annotations, they typically require extensive modality-specific training data and fail to generalize well across diverse multimodal tasks.

large language model, natural language, proceedings
