Assessing the Quality of AI-Generated Exams: A Large-Scale Field Study