Understanding AI Evaluation Patterns: How Different GPT Models Assess Vision-Language Descriptions