TOIST: Task Oriented Instance Segmentation Transformer with Noun-Pronoun Distillation Supplementary Material

Feb-9-2026, 17:35:38 GMT–Neural Information Processing Systems

As mentioned in Section 3 (formulation) of the main paper, in an input image, it is possible that no objects or multiple objects afford a specific task. As a reminder, we use the whole verb-pronoun (or verb-noun) description as the token span. With probability 0.5, an image is cropped to a random size, where each side is between 384 and 1333 pixels. Both the student and teacher TOIST models are initialized with the model pre-trained by [4]. In an image, the most suitable objects (one or more) for solving the task are selected and their bounding boxes are taken as ground truth labels for detection.
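The random-crop step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `sample_random_crop`, its parameters, the uniform choice of crop position, and the clamping of the side range to the image size are all assumptions, since the text only states the probability (0.5) and the side range (384 to 1333 pixels).

```python
import random


def sample_random_crop(width, height, p=0.5, min_side=384, max_side=1333, rng=random):
    """Return a crop box (left, top, crop_w, crop_h), or None if no crop is applied.

    Hypothetical helper: only p=0.5 and the 384..1333 side range come from the text.
    """
    # With probability 1 - p the image is left uncropped.
    if rng.random() >= p:
        return None
    # Assumption: clamp the side range to the image so the crop always fits.
    crop_w = rng.randint(min(min_side, width), min(max_side, width))
    crop_h = rng.randint(min(min_side, height), min(max_side, height))
    # Assumption: the crop position is drawn uniformly over valid offsets.
    left = rng.randint(0, width - crop_w)
    top = rng.randint(0, height - crop_h)
    return left, top, crop_w, crop_h


if __name__ == "__main__":
    box = sample_random_crop(1000, 600)
    print("no crop" if box is None else f"crop box: {box}")
```

In a full pipeline the ground-truth bounding boxes and masks would have to be cropped with the same box; that bookkeeping is omitted here.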

artificial intelligence, detection, specified classes in each task, (17 more...)


Technology:
- Information Technology
  - Sensing and Signal Processing > Image Processing (0.54)
  - Artificial Intelligence > Vision (0.51)
