Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving