Human-Calibrated Automated Testing and Validation of Generative Language Models