Scalable 3D Captioning with Pretrained Models

Oct-9-2025, 11:05:19 GMT – Neural Information Processing Systems

We introduce Cap3D, an automatic approach for generating descriptive text for 3D objects. The approach uses pretrained models for image captioning and image-text alignment, together with a large language model (LLM), to consolidate captions from multiple views of a 3D asset, sidestepping the time-consuming and costly process of manual annotation.
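
The flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It makes three assumptions that the abstract does not state: each view yields several candidate captions, the alignment model picks one caption per view, and the LLM merges the per-view winners. The names `caption_object`, `propose`, `score`, and `consolidate` are hypothetical, and the `toy_*` functions are stand-ins for real pretrained models.

```python
from typing import Callable, List, Sequence, TypeVar

View = TypeVar("View")


def caption_object(
    views: Sequence[View],
    propose: Callable[[View], List[str]],      # stand-in for an image captioning model
    score: Callable[[View, str], float],       # stand-in for an image-text alignment model
    consolidate: Callable[[List[str]], str],   # stand-in for an LLM that merges captions
) -> str:
    """Caption one 3D asset from its rendered views (hypothetical helper)."""
    if not views:
        raise ValueError("need at least one rendered view")

    best_per_view: List[str] = []
    for view in views:
        candidates = propose(view)
        if not candidates:
            continue  # skip views for which no caption was proposed
        # Assumption: keep the candidate the alignment model rates highest.
        best_per_view.append(max(candidates, key=lambda c: score(view, c)))

    if not best_per_view:
        raise ValueError("no candidate captions were produced")
    return consolidate(best_per_view)


# Toy stand-ins so the sketch runs without any pretrained model.
def toy_propose(view: str) -> List[str]:
    fake_captions = {
        "front": ["a red chair", "a blue table"],
        "side": ["a chair with four legs", "a dog"],
    }
    return fake_captions.get(view, [])


def toy_score(view: str, caption: str) -> float:
    # Pretend the alignment model favors captions that mention "chair".
    return 1.0 if "chair" in caption else 0.0


def toy_consolidate(captions: List[str]) -> str:
    # Pretend the LLM simply joins the distinct captions in order.
    return "; ".join(dict.fromkeys(captions))


if __name__ == "__main__":
    print(caption_object(["front", "side"], toy_propose, toy_score, toy_consolidate))
```

Passing the three stages in as callables keeps the sketch independent of any particular captioning, alignment, or language model. A real pipeline would supply wrappers around whichever pretrained models it uses.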

caption, large language model, machine learning

Country:
- North America > United States > Michigan (0.04)

Technology:
- Information Technology
  - Sensing and Signal Processing > Image Processing (1.00)
  - Artificial Intelligence
    - Vision (1.00)
    - Natural Language > Large Language Model (1.00)
    - Machine Learning > Neural Networks
      - Deep Learning (1.00)
