Scalable 3D Captioning with Pretrained Models
Neural Information Processing Systems
We introduce Cap3D, an automatic approach for generating descriptive text for 3D objects. The approach uses pretrained models for image captioning and image-text alignment, together with a large language model (LLM), to consolidate captions from multiple views of a 3D asset, sidestepping the time-consuming and costly process of manual annotation.
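The multi-view consolidation described above can be illustrated with a minimal sketch. It is an assumption-laden outline, not the Cap3D implementation: the function `caption_object` and its parameters `propose`, `score`, and `consolidate` are hypothetical names standing in for an image captioning model, an image-text alignment model, and an LLM respectively. Keeping only the best-aligned caption per view is also an assumption about how the three components might be combined.

```python
from typing import Callable, List, Sequence, TypeVar

View = TypeVar("View")  # any representation of one rendered view (hypothetical)


def caption_object(
    views: Sequence[View],
    propose: Callable[[View], Sequence[str]],     # stand-in for an image captioner
    score: Callable[[View, str], float],          # stand-in for image-text alignment
    consolidate: Callable[[Sequence[str]], str],  # stand-in for an LLM summarizer
) -> str:
    """Produce one description of a 3D asset from captions of its views."""
    if not views:
        raise ValueError("need at least one rendered view")

    best_per_view: List[str] = []
    for view in views:
        candidates = propose(view)
        if not candidates:
            continue  # skip views for which no caption was proposed
        # Assumption: keep the candidate that aligns best with this view.
        best_per_view.append(max(candidates, key=lambda c: score(view, c)))

    if not best_per_view:
        raise ValueError("no captions were proposed for any view")

    # Merge the per-view captions into a single description.
    return consolidate(best_per_view)
```

Passing the three models in as plain callables keeps the sketch independent of any particular library, so each stage can be swapped or stubbed out for testing.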
Oct-9-2025, 11:05:19 GMT