DropPos: Pre-Training Vision Transformers by Reconstructing Dropped Positions
Neural Information Processing Systems
To answer this question, we begin by revisiting the forward procedure of ViTs. A sequence of positional embeddings (PEs) [51] is added to the patch embeddings to preserve position information. Intuitively, simply discarding these PEs and asking the model to reconstruct the position of each patch naturally yields a location-aware pretext task.
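The core idea can be illustrated with a short sketch: drop the positional embeddings of a random subset of patches, replace them with a shared placeholder, and train a classification head to recover each affected patch's index. The following is a minimal PyTorch sketch, not the authors' implementation; names such as `DropPosPretext`, `drop_ratio`, and the placeholder parameter are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DropPosPretext(nn.Module):
    """Sketch of a position-reconstruction pretext task (assumed interface)."""

    def __init__(self, encoder: nn.Module, embed_dim: int, num_patches: int,
                 drop_ratio: float = 0.75):
        super().__init__()
        self.encoder = encoder                          # any ViT-style token encoder
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, embed_dim))
        self.mask_pos = nn.Parameter(torch.zeros(1, 1, embed_dim))  # placeholder for dropped PEs
        self.head = nn.Linear(embed_dim, num_patches)   # classify each patch's position index
        self.drop_ratio = drop_ratio

    def forward(self, patch_tokens: torch.Tensor) -> torch.Tensor:
        # patch_tokens: (B, N, D) patch embeddings carrying no positional information
        B, N, D = patch_tokens.shape
        # Randomly select the patches whose positional embedding is dropped.
        drop_mask = torch.rand(B, N, device=patch_tokens.device) < self.drop_ratio
        pe = self.pos_embed.expand(B, N, D)
        pe = torch.where(drop_mask.unsqueeze(-1), self.mask_pos.expand(B, N, D), pe)
        x = self.encoder(patch_tokens + pe)             # (B, N, D)
        logits = self.head(x)                           # (B, N, num_patches)
        # Each patch should predict its own index; supervise only the dropped positions.
        target = torch.arange(N, device=x.device).expand(B, N)
        return F.cross_entropy(logits[drop_mask], target[drop_mask])

# Example usage with a small generic Transformer encoder (illustrative sizes):
# enc = nn.TransformerEncoder(
#     nn.TransformerEncoderLayer(d_model=192, nhead=3, batch_first=True), num_layers=2)
# model = DropPosPretext(enc, embed_dim=192, num_patches=196)
# loss = model(torch.randn(2, 196, 192))
```

This sketch treats position reconstruction as an N-way classification over patch indices; the published method includes further refinements (e.g., keeping some PEs visible as anchors) that are omitted here for brevity.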