Robot Learning as an Empirical Science: Best Practices for Policy Evaluation