Becoming Experienced Judges: Selective Test-Time Learning for Evaluators