MSPE: Multi-Scale Patch Embedding Prompts Vision Transformers to Any Resolution

Neural Information Processing Systems 

The term resolution refers to the width and height of images input into neural networks.