Stitching Gaps: Fusing Situated Perceptual Knowledge with Vision Transformers for High-Level Image Classification