Inverting Visual Representations with Detection Transformers
Jan Rathjens, Shirin Reyhanian, David Kappel, Laurenz Wiskott
Understanding the mechanisms underlying deep neural networks in computer vision remains a fundamental challenge. While many prior approaches have focused on visualizing intermediate representations within deep neural networks, particularly convolutional neural networks, these techniques have yet to be thoroughly explored in transformer-based vision models. In this study, we train inverse models to reconstruct input images from intermediate layers within a Detection Transformer, showing that this approach is both feasible and efficient for transformer-based vision models. Through qualitative and quantitative evaluations of reconstructed images across model stages, we demonstrate critical properties of Detection Transformers, including contextual shape preservation, inter-layer correlation, and robustness to color perturbations, and illustrate how these characteristics emerge within the model's architecture. Our findings contribute to a deeper understanding of transformer-based vision models. The code for reproducing our experiments will be made available at github.com/wiskott-lab/inverse-detection-transformer.
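A minimal sketch of such an inversion setup, assuming the publicly available facebook/detr-resnet-50 checkpoint from HuggingFace transformers: a frozen DETR supplies intermediate encoder features, and a small transposed-convolution decoder (an illustrative architecture and illustrative hyperparameters, not necessarily the authors' configuration) is trained with a pixel-wise MSE loss to map the 1/32-resolution encoder token grid back to the input image.

```python
import torch
import torch.nn as nn
from transformers import DetrModel

# Frozen forward model: DETR with a ResNet-50 backbone (HuggingFace checkpoint).
detr = DetrModel.from_pretrained("facebook/detr-resnet-50").eval()
for p in detr.parameters():
    p.requires_grad_(False)

HIDDEN = detr.config.d_model  # 256 for detr-resnet-50


class InverseModel(nn.Module):
    """Maps the 1/32-resolution encoder token grid back to an RGB image."""

    def __init__(self, in_ch: int = HIDDEN):
        super().__init__()
        layers, ch = [], in_ch
        for _ in range(5):  # five x2 upsamplings: 1/32 -> full resolution
            layers += [
                nn.ConvTranspose2d(ch, ch // 2, kernel_size=4, stride=2, padding=1),
                nn.ReLU(inplace=True),
            ]
            ch //= 2
        layers.append(nn.Conv2d(ch, 3, kernel_size=3, padding=1))
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


inverse = InverseModel()
opt = torch.optim.Adam(inverse.parameters(), lr=1e-4)  # illustrative learning rate


def training_step(pixel_values: torch.Tensor) -> float:
    """One reconstruction step; pixel_values is (B, 3, H, W), H and W divisible by 32."""
    b, _, h, w = pixel_values.shape
    with torch.no_grad():
        out = detr(pixel_values=pixel_values)
        # (B, H/32 * W/32, HIDDEN) -> (B, HIDDEN, H/32, W/32)
        feats = out.encoder_last_hidden_state.transpose(1, 2)
        feats = feats.reshape(b, HIDDEN, h // 32, w // 32)
    recon = inverse(feats)
    loss = nn.functional.mse_loss(recon, pixel_values)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

The same recipe extends to other stages mentioned in the abstract (e.g., backbone features or decoder query embeddings) by swapping which intermediate output is fed to the inverse model and adjusting the decoder's input shape accordingly.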
arXiv.org Artificial Intelligence
Dec-9-2024