Summary: Few-Shot Object Detection with Fully Cross-Transformer
Object detection typically requires a large amount of labeled data and a deep CNN[3] architecture that processes this data to learn the model's parameters. Two popular object detection approaches, RCNN[5] and YOLO[4], fall into this category. However, real-world data generally follows a long-tail distribution, where only a small amount of data is available for the majority of categories. Even when the data is available, hand-labeling millions of images for training is a tedious task. An alternative approach is to build an architecture that can learn from a small amount of data and still perform well on unseen data.