Model Criticism for Long-Form Text Generation