Why are Visually-Grounded Language Models Bad at Image Classification?