TOIST: Task Oriented Instance Segmentation Transformer with Noun-Pronoun Distillation Pengfei Li1 Xiaoxue Chen 1
–Neural Information Processing Systems
Current referring expression comprehension algorithms can effectively detect or segment objects indicated by nouns, but how to understand verb reference is still under-explored. As such, we study the challenging problem of task oriented detection, which aims to find objects that best afford an action indicated by verbs like sit comfortably on. Towards a finer localization that better serves downstream applications like robot interaction, we extend the problem into task oriented instance segmentation. A unique requirement of this task is to select preferred candidates among possible alternatives. Thus we resort to the transformer architecture which naturally models pair-wise query relationships with attention, leading to the TOIST method. In order to leverage pre-trained noun referring expression comprehension models and the fact that we can access privileged noun ground truth during training, a novel noun-pronoun distillation framework is proposed. Noun prototypes are generated in an unsupervised manner and contextual pronoun features are trained to select prototypes. As such, the network remains noun-agnostic during inference. We evaluate TOIST on the large-scale task oriented dataset COCO-Tasks and achieve +10.9% higher mAP
Neural Information Processing Systems
Mar-25-2025, 11:41:59 GMT
- Genre:
- Overview (0.46)
- Research Report (0.68)
- Technology:
- Information Technology
- Artificial Intelligence
- Machine Learning
- Neural Networks > Deep Learning (1.00)
- Statistical Learning (0.93)
- Natural Language (1.00)
- Representation & Reasoning (1.00)
- Robots (1.00)
- Vision (1.00)
- Machine Learning
- Sensing and Signal Processing > Image Processing (1.00)
- Artificial Intelligence
- Information Technology