Allen AI & UW Propose Unified-IO: A High-Performance, Task-Agnostic Model for CV, NLP, and Multi-Modal Tasks
Building a general-purpose unified model that can solve diverse tasks across different modalities while maintaining high performance is a long-standing challenge in the machine learning research community. The conventional approach is to place task-specialized heads on top of a shared architectural backbone, but such models require expert knowledge to design a head for each task, and because the heads share no parameters with new tasks, their transfer-learning capabilities are limited. In the new paper Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks, a research team from the Allen Institute for AI and the University of Washington introduces UNIFIED-IO, a neural model with no task- or modality-specific branches that achieves competitive performance across a wide variety of computer vision (CV), natural language processing (NLP), and multi-modal benchmark tasks without fine-tuning. The researchers set out to build a unified neural architecture that ML practitioners with little or no knowledge of the underlying machinery could use to train models for new NLP and CV tasks efficiently and effectively. To support a variety of modalities (images, language, boxes, binary masks, segmentation, etc.), such a model must represent all of them in a shared space.
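This summary does not say how the shared space is built. As a rough illustration only, one common way to put a non-text modality such as bounding boxes into the same space as language is to quantize each coordinate into a fixed number of bins and give every bin its own token ID placed after the text vocabulary. The sketch below assumes this scheme; the bin count, the vocabulary size, and the function names are hypothetical choices made for the example, not details taken from the paper.

```python
from typing import List, Sequence, Tuple

# Assumed values for illustration only (not taken from the paper).
NUM_BINS = 1000          # number of discrete location bins per coordinate
TEXT_VOCAB_SIZE = 32000  # location tokens are placed after the text tokens


def box_to_tokens(
    box: Tuple[float, float, float, float],
    image_size: Tuple[int, int],
    num_bins: int = NUM_BINS,
    offset: int = TEXT_VOCAB_SIZE,
) -> List[int]:
    """Map a pixel-space box (x_min, y_min, x_max, y_max) to four token IDs."""
    x_min, y_min, x_max, y_max = box
    width, height = image_size
    # Normalize each coordinate to [0, 1] so the tokens do not depend on image size.
    normalized = (x_min / width, y_min / height, x_max / width, y_max / height)
    tokens = []
    for value in normalized:
        clipped = min(max(value, 0.0), 1.0)
        # A value of exactly 1.0 would land in bin `num_bins`, so clamp it.
        bin_index = min(int(clipped * num_bins), num_bins - 1)
        tokens.append(offset + bin_index)
    return tokens


def tokens_to_box(
    tokens: Sequence[int],
    image_size: Tuple[int, int],
    num_bins: int = NUM_BINS,
    offset: int = TEXT_VOCAB_SIZE,
) -> Tuple[float, float, float, float]:
    """Invert box_to_tokens, returning the center of each bin in pixels."""
    width, height = image_size
    centers = [((t - offset) + 0.5) / num_bins for t in tokens]
    return (
        centers[0] * width,
        centers[1] * height,
        centers[2] * width,
        centers[3] * height,
    )


if __name__ == "__main__":
    tokens = box_to_tokens((0, 0, 320, 240), (640, 480))
    print(tokens)
    print(tokens_to_box(tokens, (640, 480)))
```

Under this scheme a box becomes a short sequence of ordinary token IDs, so a single sequence model can read or emit it alongside text without a dedicated detection head. The round trip is lossy: each recovered coordinate is accurate only to within one bin width.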
Oct-19-2022, 21:18:59 GMT
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning > Neural Networks (0.54)
- Natural Language (1.00)
- Vision (1.00)