Decomposing and Interpreting Image Representations via Text in ViTs Beyond CLIP

Neural Information Processing Systems 

Extending this approach to other transformer-based image encoders presents several challenges.
