Review for NeurIPS paper: Rel3D: A Minimally Contrastive Benchmark for Grounding Spatial Relations in 3D

Neural Information Processing Systems 

Strengths: The authors have produced a moderately large 3D scene data set (about 10K scenes) organized as pairs of positive and negative examples of spatial relationships. The authors have thus taken care to give negative examples as much weight as positive ones. They also address language ambiguity: a spatial relationship in a given view may be defined either in the observer's frame of reference or in the object's own frame. The authors argue, and show with a small study, that 3D data has an advantage over purely 2D approaches for determining spatial relationships. They also show that their minimally contrastive examples make learning more sample-efficient.