You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection

Yuxin Fang 1, Bencheng Liao 1, Xinggang Wang 1, Jiemin Fang 2,1

Neural Information Processing Systems 

To answer this question, we present You Only Look at One Sequence (YOLOS), a series of object detection models based on the vanilla Vision Transformer with the fewest possible modifications, region priors, as well as inductive biases of the target task.