Refereeing the Referees: Evaluating Two-Sample Tests for Validating Generators in Precision Sciences