Deconstructing Self-Bias in LLM-generated Translation Benchmarks