Frequency-enhanced Data Augmentation for Vision-and-Language Navigation: Supplemental Material (Keji He)

Neural Information Processing Systems

Table 1 presents the impact of the different random seeds used for sampling the interference images. The experiments in the main manuscript use seed 1, which yields average performance. Figure 1 shows navigation examples in normal and high-frequency perturbed scenes. In the examples shown in Figure 4, both models exhibit similar textual attention. In Figure 6, according to the given instruction, the agent should turn left to enter the room corresponding to the second view.
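The seed ablation described above amounts to drawing the interference-image set with a fixed random seed so each run is reproducible. A minimal sketch of that sampling step (the helper name and arguments are hypothetical, not from the paper's code):

```python
import random

def sample_interference(image_ids, k, seed):
    """Hypothetical helper: deterministically sample k interference
    images from a candidate pool using a fixed random seed."""
    rng = random.Random(seed)
    return rng.sample(image_ids, k)

# Toy candidate pool of image indices.
ids = list(range(100))

# Seed 1 corresponds to the setting used in the main manuscript.
pick_seed1 = sample_interference(ids, 5, seed=1)

# The same seed reproduces the same interference set across runs.
assert pick_seed1 == sample_interference(ids, 5, seed=1)
```

Re-running the ablation then only requires varying `seed` while holding everything else fixed.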


AD-PT: Autonomous Driving Pre-Training with Large-scale Point Cloud Dataset

Neural Information Processing Systems

It is a long-term vision of the Autonomous Driving (AD) community that perception models can learn from a large-scale point cloud dataset and obtain unified representations that achieve promising results on different tasks and benchmarks. Previous works mainly focus on self-supervised pre-training pipelines that pre-train and fine-tune on the same benchmark, which makes it difficult to attain performance scalability and cross-dataset applicability for the pre-trained checkpoint. In this paper, for the first time, we commit to building a large-scale pre-training point-cloud dataset with diverse data distributions, while learning generalizable representations from this diverse pre-training dataset. We formulate point-cloud pre-training as a semi-supervised problem that leverages few-shot labeled and massive unlabeled point-cloud data to generate unified backbone representations that can be directly applied to many baseline models and benchmarks, decoupling the AD-related pre-training process from the downstream fine-tuning task. During backbone pre-training, by enhancing scene- and instance-level distribution diversity and exploiting the backbone's ability to learn from unknown instances, we achieve significant performance gains on a series of downstream perception benchmarks, including Waymo, nuScenes, and KITTI, under different baseline models such as PV-RCNN++, SECOND, and CenterPoint.
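The semi-supervised formulation described above combines a supervised loss on the few-shot labeled split with pseudo-labels on confident unlabeled samples, after which the pre-trained backbone is reused by different downstream heads. The following toy sketch illustrates only that training pattern and the pre-training/fine-tuning decoupling; the data, the linear "backbone", the 0.9 confidence threshold, and all variable names are illustrative assumptions, not the paper's actual AD-PT pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: 8-dim "point cloud" features; a small labeled split
# (few-shot) and a much larger unlabeled split.
X_labeled = rng.normal(size=(32, 8))
y_labeled = (X_labeled.sum(axis=1) > 0).astype(float)
X_unlabeled = rng.normal(size=(512, 8))

def forward(W, X):
    """Toy 'backbone': logistic scores from a single linear map."""
    return 1.0 / (1.0 + np.exp(-X @ W))

W = np.zeros(8)
for step in range(200):
    # Supervised gradient on the few-shot labeled data.
    p = forward(W, X_labeled)
    grad = X_labeled.T @ (p - y_labeled) / len(y_labeled)

    # Semi-supervised part: pseudo-label only confident unlabeled
    # samples (an assumed 0.9 / 0.1 confidence threshold).
    pu = forward(W, X_unlabeled)
    conf = (pu > 0.9) | (pu < 0.1)
    if conf.any():
        pseudo = (pu[conf] > 0.5).astype(float)
        grad += X_unlabeled[conf].T @ (pu[conf] - pseudo) / conf.sum()

    W -= 0.5 * grad

# "Decoupling": the same pre-trained backbone weights feed different
# downstream heads without re-running pre-training.
emb = X_labeled @ W            # shared backbone representation
head_a = (emb > 0).astype(int) # e.g. one task's decision rule
head_b = 2.0 * emb             # e.g. another head on the same features
```

The point of the sketch is the structure, not the model: the expensive loop runs once over labeled plus pseudo-labeled data, and every downstream head consumes the frozen representation `emb`.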