On the Implications of Verbose LLM Outputs: A Case Study in Translation Evaluation