Diagnosing and Rectifying Vision Models using Language