Leveraging Many-To-Many Relationships for Defending Against Visual-Language Adversarial Attacks