You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection
Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang

Neural Information Processing Systems

To answer this question, we present You Only Look at One Sequence (YOLOS), a series of object detection models based on the vanilla Vision Transformer with the fewest possible modifications, region priors, as well as inductive biases of the target task.




Supplementary Material for Self-Supervised Visual Representation Learning with Semantic Grouping
Xin Wen

Neural Information Processing Systems

There are two operations in our data augmentation pipeline that change the scale or layout of the image, i.e., random resized crop and random horizontal flip. This is followed by a resize operation (e.g., RoIAlign) to recover the intersected part to the original size and spatial layout. The total stride is 16 (FCN-16s [20]). Intuitively, each prototype can be viewed as the cluster center of a semantic class. During inference, we only take the teacher model parameterized by ξ.
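The two-view pipeline above relies on recovering the region shared by two random crops at a fixed resolution. A minimal sketch of that idea, assuming axis-aligned crop boxes and using a nearest-neighbour resize as a simple stand-in for RoIAlign (the helper names `intersect_box` and `crop_and_resize` are illustrative, not from the paper):

```python
import numpy as np

def intersect_box(box_a, box_b):
    """Axis-aligned intersection of two (x0, y0, x1, y1) boxes; None if empty."""
    x0, y0 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x1, y1 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    if x1 <= x0 or y1 <= y0:
        return None
    return (x0, y0, x1, y1)

def crop_and_resize(img, box, out_h, out_w):
    """Crop `box` from `img` and resize it to (out_h, out_w).

    Nearest-neighbour sampling here; RoIAlign would use bilinear
    interpolation at sub-pixel sample points instead.
    """
    x0, y0, x1, y1 = box
    ys = np.clip(np.round(np.linspace(y0, y1 - 1, out_h)).astype(int), 0, img.shape[0] - 1)
    xs = np.clip(np.round(np.linspace(x0, x1 - 1, out_w)).astype(int), 0, img.shape[1] - 1)
    return img[np.ix_(ys, xs)]

img = np.arange(64).reshape(8, 8)            # toy 8x8 "image"
view_a, view_b = (0, 0, 6, 6), (2, 2, 8, 8)  # two overlapping random crops
shared = intersect_box(view_a, view_b)       # -> (2, 2, 6, 6)
aligned = crop_and_resize(img, shared, 4, 4) # overlap recovered at a fixed size
```

In the actual method the same operation runs on dense feature maps (so the total stride of 16 determines the feature-space coordinates of the boxes), but the geometry of intersecting the two crop windows and resampling the overlap is the same.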