Review for NeurIPS paper: RelationNet++: Bridging Visual Representations for Object Detection via Transformer Decoder

Neural Information Processing Systems 

Weaknesses: - While the overall performance is strong, the reviewer is not excited about the technical novelty. The good performance appears to come mainly from combining existing output modalities. Training CornerNet/CenterNet in an FPN structure is new, but this part is not well explained in the paper: What is the training loss for the point head? Is it the CornerNet-style focal loss or standard cross-entropy? How are different points assigned to different FPN levels?
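
To make the loss question concrete, the two candidates can be sketched roughly as follows. This is only an illustration of the penalty-reduced focal loss as described in the CornerNet paper (with its commonly cited alpha = 2, beta = 4) next to a plain per-pixel cross-entropy on hard labels. It is not taken from the submission; the function names and the flat-list heatmap layout are hypothetical.

```python
import math


def cornernet_focal_loss(pred, target, alpha=2.0, beta=4.0, eps=1e-12):
    """Penalty-reduced focal loss over flat lists of heatmap values.

    pred:   predicted probabilities in (0, 1)
    target: Gaussian-smoothed ground truth in [0, 1], exactly 1.0 at keypoints
    Assumption: normalised by the number of keypoints, as in CornerNet.
    """
    num_pos = sum(1 for y in target if y == 1.0)
    total = 0.0
    for p, y in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        if y == 1.0:
            # Positive location: down-weight already-confident predictions.
            total -= (1.0 - p) ** alpha * math.log(p)
        else:
            # Negative location: (1 - y)^beta softens the penalty near a keypoint.
            total -= (1.0 - y) ** beta * p ** alpha * math.log(1.0 - p)
    return total / max(num_pos, 1)


def binary_cross_entropy(pred, target, eps=1e-12):
    """Standard per-pixel cross-entropy; only exact keypoints count as positives."""
    total = 0.0
    for p, y in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)
        hard = 1.0 if y == 1.0 else 0.0
        total -= hard * math.log(p) + (1.0 - hard) * math.log(1.0 - p)
    return total / len(pred)
```

The two losses treat locations near a keypoint very differently (the focal variant barely penalises them, cross-entropy treats them as full negatives), which is why the paper should state which one is used for the point head.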