SPARO: Selective Attention for Robust and Compositional Transformer Encodings for Vision