Label-Efficient Grasp Joint Prediction with Point-JEPA

Jed Guzelkabaagac, Boris Petrović

arXiv.org Artificial Intelligence 

Abstract--We study whether 3D self-supervised pretraining with Point-JEPA enables label-efficient grasp joint-angle prediction. Meshes are sampled to point clouds and tokenized; a ShapeNet-pretrained Point-JEPA encoder feeds a K=5 multi-hypothesis head trained with winner-takes-all and evaluated by top-logit selection. On a multi-finger hand dataset with strict object-level splits, Point-JEPA improves top-logit RMSE and Coverage@15 in low-label regimes (e.g., 26% lower RMSE at 25% data) and reaches parity at full supervision, suggesting JEPA-style pretraining is a practical lever for data-efficient grasp learning.

Self-supervised learning (SSL) for 3D data has largely progressed along three directions. On point clouds this includes point/voxel masked autoencoding; e.g., Voxel-MAE reconstructs masked voxels for sparse automotive LiDAR and improves downstream tasks with fewer labels [1]-[4].
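The training/evaluation split described above (winner-takes-all over K hypotheses during training, top-logit selection at test time) can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation; the function name, array shapes, and the use of per-joint MSE are assumptions.

```python
import numpy as np

def wta_step(preds, logits, target):
    """Illustrative winner-takes-all step over K grasp hypotheses.

    preds:  (K, J) predicted joint angles: K hypotheses, J joints
    logits: (K,)   confidence scores, one per hypothesis
    target: (J,)   ground-truth joint angles

    Training uses only the winner's error (gradients would flow to the
    best hypothesis); evaluation selects the hypothesis with the
    highest confidence logit, without access to the target.
    """
    per_hyp_mse = np.mean((preds - target) ** 2, axis=1)  # (K,)
    winner = int(np.argmin(per_hyp_mse))   # training: closest hypothesis
    top_logit = int(np.argmax(logits))     # evaluation: most confident
    return per_hyp_mse[winner], winner, top_logit

# Toy example with K=3 hypotheses over J=2 joints (hypothetical values).
preds = np.array([[0.1, 0.2], [1.0, 1.0], [0.0, 0.0]])
logits = np.array([0.2, 0.5, 0.1])
loss, winner, chosen = wta_step(preds, logits, np.array([0.0, 0.0]))
# Note that winner (index 2) and chosen (index 1) can disagree: the
# gap between them is what metrics like Coverage@15 vs. top-logit
# RMSE are designed to expose.
```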