Improving Visual Commonsense in Language Models via Multiple Image Generation