Review for NeurIPS paper: Forget About the LiDAR: Self-Supervised Depth Estimators with MED Probability Volumes

Neural Information Processing Systems 

Weaknesses: I have no major concerns, only remarks and suggestions for improvement. Although this is unambiguous in the experimental section, the abstract and introduction should clarify that the method is self-supervised from stereo pairs. There is a lot of confusion in the literature, because all monocular methods predict depth from a single image (by definition) but can be trained in different ways: from lidar supervision (full or partial), from stereo pairs (as is the case here), or from videos (a.k.a. the SfM setting). Some of the authors' critiques of related works (e.g., regarding dynamic objects) are only applicable to the SfM self-supervised scenario, because in stereo-based self-supervised learning the two images of a pair are captured at the same time. Furthermore, the SfM case requires estimating the camera's ego-motion, which vastly complicates the self-supervised learning task (which is why the comparison is not entirely fair, in my opinion).