Text2Model: Model Induction for Zero-shot Generalization Using Task Descriptions

Amosy, Ohad, Volk, Tomer, Ben-David, Eyal, Reichart, Roi, Chechik, Gal

arXiv.org Artificial Intelligence 

We study the problem of generating a training-free, task-dependent visual classifier from text descriptions without visual samples. This Text-to-Model (T2M) problem is closely related to zero-shot learning, but unlike previous work, a T2M model infers a model tailored to a task, taking into account all classes in the task. We analyze the symmetries of T2M and characterize the equivariance and invariance properties of the corresponding models. In light of these properties, we design an architecture based on hypernetworks that, given a set of new class descriptions, predicts the weights of an object recognition model that classifies images from those zero-shot classes. We demonstrate the benefits of our approach compared to zero-shot learning from text descriptions in image and point-cloud classification, using various types of text descriptions, from single words to rich text descriptions.

The dominant paradigm for obtaining predictive models in machine learning is inductive training, often using massive labeled datasets. In contrast, people employ other techniques to obtain predictive models. Specifically, they create task-specific discriminative models based on language instructions, such as "separate soft toys from hard ones" or "collect the furry toy animals" (Markman, 1990). This contrast between machine and human learning is striking, but until now, progress in teaching machines to obtain task-specific discriminative models from natural language descriptions has been limited. Language-based classification has been studied for the closely related, yet different, task of zero-shot learning from text or attributes (ZSL) (Frome et al., 2013; Lampert et al., 2013). In these approaches, images of an unseen concept are categorized by finding the class whose descriptor is closest to the image in a space shared by images and class descriptors.
The issue is that in this family of approaches, the learned representation and the kNN classifier that it induces are fixed after training and are not tuned to the classification task given at inference time.
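
The limitation described above can be illustrated with a minimal sketch of nearest-descriptor classification in a shared space. Everything here is an assumption made for illustration: the three-dimensional embeddings are hand-picked rather than produced by any trained encoder, cosine similarity stands in for whatever distance a given ZSL method uses, and the names `cosine` and `nearest_descriptor` are hypothetical. It is not the architecture proposed in the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_descriptor(image_emb, class_embs):
    """Return the class whose descriptor embedding is most similar to the image."""
    return max(class_embs, key=lambda name: cosine(image_emb, class_embs[name]))

# Hypothetical, hand-picked embeddings in a shared image/text space.
class_embs = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.1, 0.9, 0.0],
    "car": [0.0, 0.1, 0.9],
}
image_emb = [0.8, 0.3, 0.1]

# Two different tasks that both contain the class "cat".
task_a = {name: class_embs[name] for name in ("cat", "dog", "car")}
task_b = {name: class_embs[name] for name in ("cat", "car")}

# The score of the image against "cat" is identical in both tasks:
# the representation does not adapt to the other classes in the task.
score_a = cosine(image_emb, task_a["cat"])
score_b = cosine(image_emb, task_b["cat"])

prediction_a = nearest_descriptor(image_emb, task_a)
prediction_b = nearest_descriptor(image_emb, task_b)
```

In this sketch each class is scored independently of the other classes in the task, which is the property the T2M formulation is meant to move away from by predicting a classifier from the whole set of class descriptions.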
