ZERO: Industry-ready Vision Foundation Model with Multi-modal Prompts

Choi, Sangbum, Go, Kyeongryeol, Jang, Taewoong

Nov-10-2025–arXiv.org Artificial Intelligence

F oundation models have revolutionized AI, yet they struggle with zero-shot deployment in real-world industrial settings due to a lack of high-quality, domain-specific datasets. T o bridge this gap, Superb AI introduces ZERO, an industry-ready vision foundation model that leverages multi-modal prompting (textual and visual) for generalization without retraining. Trained on a compact yet representative 0.9 million annotated samples from a proprietary billion-scale industrial dataset, ZERO demonstrates competitive performance on academic benchmarks like LVIS-V al and significantly outperforms existing models across 37 diverse industrial datasets. Furthermore, ZERO achieved 2nd place in the CVPR 2025 Object Instance Detection Challenge and 4th place in the F oundational Few-shot Object Detection Challenge, highlighting its practical deployability and gen-eralizability with minimal adaptation and limited data. T o the best of our knowledge, ZERO is the first vision foundation model explicitly built for domain-specific, zero-shot industrial applications.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

Nov-10-2025

arXiv.org PDF

Add feedback

Country:
- Asia > South Korea (0.14)

Genre:
- Research Report > New Finding (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Representation & Reasoning (1.00)
  - Natural Language > Large Language Model (1.00)
  - Machine Learning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found