Towards General Purpose Vision Systems
Gupta, Tanmay; Kamath, Amita; Kembhavi, Aniruddha; Hoiem, Derek
arXiv.org Artificial Intelligence
A special purpose learning system assumes knowledge of admissible tasks at design time. Adapting such a system to unforeseen tasks requires architecture manipulation such as adding an output head for each new task or dataset. In this work, we propose a task-agnostic vision-language system that accepts an image and a natural language task description and outputs bounding boxes, confidences, and text. The system supports a wide range of vision tasks such as classification, localization, question answering, captioning, and more. We evaluate the system's ability to learn multiple skills simultaneously, to perform tasks with novel skill-concept combinations, and to learn new skills efficiently and without forgetting.
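The interface described in the abstract (one image plus one natural language task description in; bounding boxes, confidences, and text out) can be illustrated with a minimal Python sketch. Everything named here is hypothetical and is not the authors' code: `Prediction`, `ToyGeneralPurposeSystem`, the `(x_min, y_min, x_max, y_max)` box convention, and the example task strings are assumptions made for illustration, and the stub returns placeholder values instead of running a model. The point is only that every task goes through the same call signature, so supporting a new task does not require a new output head.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical box convention: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class Prediction:
    """Single output format shared by every task."""
    boxes: List[Box]
    confidences: List[float]
    text: str


class ToyGeneralPurposeSystem:
    """Placeholder stand-in, NOT the paper's model.

    It only demonstrates the task-agnostic signature: the task is passed
    as a natural language string, not selected through a per-task head.
    """

    def predict(self, image: Sequence[Sequence[int]], task: str) -> Prediction:
        height = len(image)
        width = len(image[0]) if height else 0
        # Placeholder output: one box covering the whole image, zero
        # confidence, and text that echoes the task description.
        return Prediction(
            boxes=[(0.0, 0.0, float(width), float(height))],
            confidences=[0.0],
            text=f"(no model attached) task received: {task}",
        )


if __name__ == "__main__":
    system = ToyGeneralPurposeSystem()
    image = [[0] * 4 for _ in range(3)]  # 3 rows x 4 columns of dummy pixels
    # Hypothetical task descriptions; the call is identical for each one.
    for task in ("What is this object?", "Locate the dog.", "Describe the image."):
        result = system.predict(image, task)
        print(task, "->", result.boxes, result.confidences, result.text)
```

In a real system, a task such as captioning would mostly use the `text` field and a task such as localization would mostly use `boxes` and `confidences`, but the caller would not need to know that in advance.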
Apr-1-2021
- Country:
  - North America > United States (0.14)
- Genre:
  - Research Report (0.64)
- Industry:
  - Leisure & Entertainment > Sports (0.46)
  - Transportation > Ground > Road (0.46)
- Technology: