Generative AI for Industrial Contour Detection: A Language-Guided Vision System
Gong, Liang; Tommy; Wang; Chaker, Sara; Dong, Yanchen; Bousetouane, Fouad; Morton, Brenden; Mendez, Mark
arXiv.org Artificial Intelligence
In this work, we present a language-guided generative vision system for remnant contour detection in manufacturing, designed to achieve CAD-level precision. The system is organized into three phases: (1) data preprocessing, (2) contour generation using a conditional GAN, and (3) multimodal contour refinement through vision-language modeling, in which standardized prompts are crafted in a human-in-the-loop process and applied through image-text guided synthesis. On proprietary FabTrack datasets, the proposed system improved contour fidelity, enhancing edge continuity and geometric alignment while reducing manual tracing. For the refinement phase, we benchmarked several vision-language models (VLMs), including Google's Gemini 2.0 Flash, OpenAI's GPT-image-1 integrated within a VLM-guided workflow, and open-source baselines. Under standardized conditions, GPT-image-1 consistently outperformed Gemini 2.0 Flash in both structural accuracy and perceptual quality. These findings demonstrate the promise of VLM-guided generative workflows for advancing industrial computer vision (CV) beyond the limitations of classical pipelines and moving contour detection closer to CAD-level precision.
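The three-phase organization described in the abstract can be pictured as a chain of pluggable stages. The following is a minimal sketch only, not the authors' implementation: the class `ContourPipeline` and the functions `normalize`, `threshold_stub`, and `identity_refiner` are hypothetical names introduced for illustration, the thresholding step merely stands in for a trained conditional GAN, and the refiner stands in for a call to a vision-language model.

```python
from dataclasses import dataclass
from typing import Callable, List

# Assumption: a grayscale image is represented as nested lists of numbers.
Image = List[List[float]]


@dataclass
class ContourPipeline:
    """Hypothetical three-phase pipeline: preprocess -> generate -> refine."""
    preprocess: Callable[[Image], Image]   # phase 1: data preprocessing
    generate: Callable[[Image], Image]     # phase 2: contour generation (cGAN in the paper)
    refine: Callable[[Image, str], Image]  # phase 3: prompt-guided refinement (VLM in the paper)
    prompt: str                            # standardized prompt used for refinement

    def run(self, image: Image) -> Image:
        cleaned = self.preprocess(image)
        draft = self.generate(cleaned)
        return self.refine(draft, self.prompt)


def normalize(image: Image) -> Image:
    """Rescale pixel values to [0, 1]; a constant image maps to all zeros."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero on constant images
    return [[(p - lo) / span for p in row] for row in image]


def threshold_stub(image: Image) -> Image:
    """Placeholder for the generator: binarize at 0.5 (NOT a GAN)."""
    return [[1.0 if p >= 0.5 else 0.0 for p in row] for row in image]


def identity_refiner(image: Image, prompt: str) -> Image:
    """Placeholder for the VLM step: returns the draft unchanged."""
    return image


if __name__ == "__main__":
    pipeline = ContourPipeline(
        preprocess=normalize,
        generate=threshold_stub,
        refine=identity_refiner,
        prompt="Trace the outer contour of the remnant.",  # illustrative prompt
    )
    print(pipeline.run([[0, 128], [255, 64]]))
```

Keeping each phase behind a plain callable means a different generator or refinement model can be swapped in without changing the surrounding code, which mirrors the kind of side-by-side comparison of refinement models the abstract reports.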
Sep-3-2025