UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface
Neural Information Processing Systems
Generalist models have achieved remarkable success in both language and vision-language tasks, showcasing the potential of unified modeling. However, effectively integrating fine-grained perception tasks like detection and segmentation into these models remains a significant challenge. This is primarily because these tasks often rely heavily on task-specific designs and architectures that can complicate the modeling process. To address this challenge, we present UFO, a framework that Unifies Fine-grained visual perception tasks through an Open-ended language interface.
Jun-18-2026, 17:40:46 GMT