Safety by Measurement: A Systematic Literature Review of AI Safety Evaluation Methods