Understanding the Limits of Vision Language Models Through the Lens of the Binding Problem