P2P: Tuning Pre-trained Image Models for Point Cloud Analysis with Point-to-Pixel Prompting Supplemental Material Ziyi Wang Jie Zhou Jiwen Lu
Neural Information Processing Systems
A.1 Experiments on Different Pre-trained Image Models

We conduct further experiments on point cloud classification with image models of different architectures and scales, ranging from the convolution-based ConvNeXt to the attention-based Vision Transformer and Swin Transformer. The image models are pre-trained on the ImageNet-22k [1] dataset. We report the image classification performance of the original image model fine-tuned on the ImageNet-1k dataset, the number of trainable parameters after Point-to-Pixel Prompting, and the classification accuracy on the ModelNet40 [11] and ScanObjectNN [9] datasets. From the quantitative results and accuracy curves in Table 1, we conclude that enlarging the scale of the same image model yields higher classification performance, which is consistent with the observations in image classification (see the backbone-swapping sketch after Section A.2).

A.2 Ablation Studies on Test View Choices

During training, the rotation angle θ is randomly selected from [−π, π] and ϕ is randomly selected from [−0.4π, −0.2π] to keep the objects standing upright in the images.
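To make the setup of Section A.1 concrete, the following is a minimal sketch (not the authors' code) of how different frozen image backbones could be plugged behind a Point-to-Pixel Prompting module and how the trainable-parameter count reported in Table 1 could be obtained. Backbone names follow timm conventions, and `PromptingHead` is a hypothetical stand-in for the actual prompting and classification layers.

```python
import torch
import torch.nn as nn
import timm


class PromptingHead(nn.Module):
    """Hypothetical trainable part: a placeholder point-to-pixel layer plus a classifier."""

    def __init__(self, feat_dim: int, num_classes: int = 40):
        super().__init__()
        self.colorizer = nn.Conv2d(3, 3, kernel_size=1)   # placeholder point-to-pixel prompting layer
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, proj_img: torch.Tensor, backbone: nn.Module) -> torch.Tensor:
        img = self.colorizer(proj_img)                     # prompt the projected image
        feat = backbone(img)                               # features from the frozen image model
        return self.classifier(feat)


def build_p2p(backbone_name: str, num_classes: int = 40):
    # num_classes=0 makes timm return pooled features instead of logits;
    # set pretrained=True to load the (assumed available) pre-trained weights.
    backbone = timm.create_model(backbone_name, pretrained=False, num_classes=0)
    for p in backbone.parameters():                        # freeze the image model
        p.requires_grad_(False)
    head = PromptingHead(backbone.num_features, num_classes)
    return backbone, head


def count_trainable(*modules: nn.Module) -> int:
    return sum(p.numel() for m in modules for p in m.parameters() if p.requires_grad)


if __name__ == "__main__":
    for name in ["convnext_base", "vit_base_patch16_224", "swin_base_patch4_window7_224"]:
        backbone, head = build_p2p(name)
        print(name, "trainable params:", count_trainable(backbone, head))
```

Because the backbone is frozen, only the prompting head contributes to the trainable-parameter count, which is why the count stays small even as the image model is scaled up.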
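The training-time view sampling of Section A.2 can be sketched as below. The angle convention (azimuth θ about the vertical axis, elevation ϕ about a horizontal axis) and the rotation composition are assumptions for illustration, not the authors' exact projection code.

```python
import math
import torch


def sample_train_view() -> tuple[float, float]:
    """Draw theta uniformly from [-pi, pi] and phi uniformly from [-0.4*pi, -0.2*pi]."""
    theta = (torch.rand(1).item() * 2.0 - 1.0) * math.pi
    phi = -(0.2 + 0.2 * torch.rand(1).item()) * math.pi
    return theta, phi


def rotation_from_view(theta: float, phi: float) -> torch.Tensor:
    """Compose a yaw (around z) and a pitch (around x) rotation; the convention is assumed."""
    ct, st = math.cos(theta), math.sin(theta)
    cp, sp = math.cos(phi), math.sin(phi)
    rot_z = torch.tensor([[ct, -st, 0.0],
                          [st,  ct, 0.0],
                          [0.0, 0.0, 1.0]])
    rot_x = torch.tensor([[1.0, 0.0, 0.0],
                          [0.0,  cp, -sp],
                          [0.0,  sp,  cp]])
    return rot_x @ rot_z


if __name__ == "__main__":
    points = torch.randn(1024, 3)                          # toy point cloud
    theta, phi = sample_train_view()
    rotated = points @ rotation_from_view(theta, phi).T    # rotate before projecting to an image
    print(rotated.shape)                                   # torch.Size([1024, 3])
```

Restricting ϕ to a narrow band is what keeps the rendered objects standing upright across randomly sampled training views.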