Binary Verification for Zero-Shot Vision
arXiv.org Artificial Intelligence
We propose a training-free, binary verification workflow for zero-shot vision with off-the-shelf VLMs. It comprises two steps: (i) quantization, which turns the open-ended query into a multiple-choice question (MCQ) with a small, explicit list of unambiguous candidates; and (ii) binarization, which asks one True/False question per candidate and resolves deterministically: if exactly one is True, select it; otherwise, revert to an MCQ over the remaining plausible candidates. We evaluate the workflow on referring expression grounding (REC), spatial reasoning (Spatial-Map, Spatial-Grid, Spatial-Maze), and BLINK-Jigsaw. Relative to answering open-ended queries directly, quantization to MCQ yields large gains, and True/False binarization provides a consistent additional boost. Across all tasks, the same workflow produces significant improvements, indicating generality. Our theory formalizes how open-ended vision queries can be quantized to MCQs and further binarized into True/False verifications, establishing a hardness ladder (T/F MCQ K-way). A simple analysis explains why Boolean resolution boosts accuracy. Together, these components yield a simple and unified workflow that emphasizes inference-time design over task-specific training. It offers a practical, drop-in path to stronger zero-shot vision with today's VLMs.
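The binarization step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `ask_true_false` and `ask_mcq` are hypothetical callables standing in for VLM queries, and the reading of "remaining plausible candidates" (the candidates verified True if there are several, otherwise the full list) is an assumption, since the abstract does not spell it out.

```python
from typing import Callable, List, Sequence


def resolve_by_binary_verification(
    candidates: Sequence[str],
    ask_true_false: Callable[[str], bool],      # hypothetical: one T/F VLM query per candidate
    ask_mcq: Callable[[Sequence[str]], str],    # hypothetical: one MCQ VLM query over a list
) -> str:
    """Pick one candidate using per-candidate True/False checks with an MCQ fallback."""
    if not candidates:
        raise ValueError("need at least one candidate")

    # Binarization: ask one True/False question per candidate.
    accepted: List[str] = [c for c in candidates if ask_true_false(c)]

    # Deterministic resolution: exactly one True answer wins outright.
    if len(accepted) == 1:
        return accepted[0]

    # Fallback (assumed interpretation): if several candidates were accepted,
    # run the MCQ over those; if none were, run it over the full list.
    remaining = accepted if accepted else list(candidates)
    return ask_mcq(remaining)
```

In practice the two callables would wrap prompts to a VLM together with the image; here they are left abstract so the resolution logic can be read and tested on its own.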
Nov-17-2025
- Country:
- North America > United States > New Jersey > Hudson County > Hoboken (0.04)
- Genre:
- Research Report (0.64)
- Workflow (0.97)
- Technology: