Black Sheep in the Herd: Playing with Spuriously Correlated Attributes for Vision-Language Recognition