Review for NeurIPS paper: Every View Counts: Cross-View Consistency in 3D Object Detection with Hybrid-Cylindrical-Spherical Voxelization

The paper proposes a method for LiDAR-based 3D object detection that exploits cross-view consistency between the bird's-eye view (BEV) and range view representations of a point cloud. The two views are fed to separate network branches trained with a loss function that includes a term encouraging consistency between the two representations. Evaluations demonstrate strong performance relative to baselines on nuScenes. The paper was reviewed by four knowledgeable referees, who read the author response and subsequently discussed the paper. The reviewers agree that the way the method exploits the bird's-eye and range views is interesting and elegant: in particular, the hybrid-cylindrical-spherical (HCS) voxel representation that enables feature extraction for both views, and the manner in which the method enforces consistency on the transformed feature representations.
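For concreteness only (the paper's exact formulation is not reproduced in this review), a cross-view consistency objective of the kind described above is typically a per-view detection loss plus a penalty on the disagreement between the two feature maps after one has been warped into the other's frame. The sketch below assumes a simple L2 penalty and hypothetical names (`feat_bev`, `feat_rv_warped`, `lam`); it is illustrative, not the authors' implementation:

```python
import numpy as np

def consistency_loss(feat_bev, feat_rv_warped, lam=0.5):
    """Assumed L2 consistency term between BEV features and range-view
    features already warped into the BEV frame (hypothetical formulation)."""
    return lam * float(np.mean((feat_bev - feat_rv_warped) ** 2))

def total_loss(det_loss_bev, det_loss_rv, feat_bev, feat_rv_warped, lam=0.5):
    # Total objective: per-view detection losses plus the cross-view
    # consistency penalty that couples the two branches during training.
    return det_loss_bev + det_loss_rv + consistency_loss(
        feat_bev, feat_rv_warped, lam
    )
```

When the two warped feature maps agree exactly, the consistency term vanishes and training reduces to the two independent detection losses.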