ZERO: Industry-ready Vision Foundation Model with Multi-modal Prompts
Choi, Sangbum; Go, Kyeongryeol; Jang, Taewoong
arXiv.org Artificial Intelligence
Foundation models have revolutionized AI, yet they struggle with zero-shot deployment in real-world industrial settings because high-quality, domain-specific datasets are scarce. To bridge this gap, Superb AI introduces ZERO, an industry-ready vision foundation model that uses multi-modal (textual and visual) prompting to generalize without retraining. Trained on a compact yet representative set of 0.9 million annotated samples drawn from a proprietary billion-scale industrial dataset, ZERO achieves competitive performance on academic benchmarks such as LVIS-Val and significantly outperforms existing models across 37 diverse industrial datasets. ZERO also placed 2nd in the CVPR 2025 Object Instance Detection Challenge and 4th in the Foundational Few-shot Object Detection Challenge, highlighting its practical deployability and generalizability with minimal adaptation and limited data. To the best of our knowledge, ZERO is the first vision foundation model explicitly built for domain-specific, zero-shot industrial applications.
Nov-10-2025
- Country:
- Asia > South Korea > Seoul > Seoul (0.04)
- Genre:
- Research Report > New Finding (0.46)
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning (1.00)
- Natural Language > Large Language Model (1.00)
- Representation & Reasoning (1.00)
- Vision (1.00)