Text2Model: Model Induction for Zero-shot Generalization Using Task Descriptions

Amosy, Ohad, Volk, Tomer, Ben-David, Eyal, Reichart, Roi, Chechik, Gal

Oct-27-2022–arXiv.org Artificial Intelligence

We study the problem of generating a training-free, task-dependent visual classifier from text descriptions, without visual samples. This Text-to-Model (T2M) problem is closely related to zero-shot learning, but unlike previous work, a T2M model infers a model tailored to a task, taking into account all classes in the task. We analyze the symmetries of T2M and characterize the equivariance and invariance properties of the corresponding models. In light of these properties, we design an architecture based on hypernetworks that, given a set of new class descriptions, predicts the weights of an object recognition model that classifies images from those zero-shot classes. We demonstrate the benefits of our approach compared to zero-shot learning from text descriptions in image and point-cloud classification, using various types of text descriptions, from single words to rich text descriptions.

The dominant paradigm for obtaining predictive models in machine learning is inductive training, often using massive labeled datasets. In contrast, people employ other techniques to obtain predictive models. Specifically, they create task-specific discriminative models based on language instructions, such as "separate soft toys from hard ones" or "collect the furry toy animals" (Markman, 1990). This contrast between machine and human learning is striking, but until now, progress in teaching machines to obtain task-specific discriminative models from natural language descriptions has been limited. Language-based classification has been studied for the closely related, yet different, task of zero-shot learning from text or attributes (ZSL) (Frome et al., 2013; Lampert et al., 2013). In these approaches, images and class descriptors are mapped to a shared space; images of an unseen concept can then be categorized by finding the class whose descriptor is closest to the image in the shared space. The issue is that in this family of approaches the learned representation (and the kNN classifier that it induces) is fixed after training and is not tuned to a classification task given at inference time.
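The contrast between a fixed nearest-descriptor classifier and a classifier induced from the whole set of class descriptions can be illustrated with a toy sketch. This is not the authors' architecture: the function names (`nearest_descriptor`, `induce_weights`, `classify`), the two-dimensional embeddings, and the mean-pooling rule with scalars `a` and `b` are all assumptions made for illustration. The sketch only shows one simple way a weight generator can depend on every class in the task while remaining permutation-equivariant, so that reordering the descriptions reorders the predicted weights and leaves the predicted class unchanged.

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def argmax(scores: List[float]) -> int:
    """Index of the largest score (first one on ties)."""
    return max(range(len(scores)), key=scores.__getitem__)


def nearest_descriptor(image_emb: Vector, class_embs: List[Vector]) -> int:
    """Fixed ZSL-style baseline: pick the class whose descriptor embedding
    has the highest dot-product similarity with the image embedding."""
    return argmax([dot(image_emb, d) for d in class_embs])


def induce_weights(class_embs: List[Vector], a: float = 1.0, b: float = -1.0) -> List[Vector]:
    """Toy stand-in for a hypernetwork (hypothetical rule, not from the paper).

    Each class weight vector is a * (own descriptor) + b * (mean of all
    descriptors), so it depends on the full task. Because the mean is
    order-independent, the map is permutation-equivariant."""
    n = len(class_embs)
    dim = len(class_embs[0])
    mean = [sum(d[k] for d in class_embs) / n for k in range(dim)]
    return [[a * d[k] + b * mean[k] for k in range(dim)] for d in class_embs]


def classify(image_emb: Vector, weights: List[Vector]) -> int:
    """Linear classifier using the induced per-class weight vectors."""
    return argmax([dot(image_emb, w) for w in weights])


# Made-up embeddings standing in for encoded class descriptions and an image.
names = ["cat", "dog", "car"]
class_embs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
image_emb = [0.9, 0.2]

weights = induce_weights(class_embs)
predicted = names[classify(image_emb, weights)]
```

In this sketch the induced weights change whenever the set of classes changes, whereas `nearest_descriptor` always scores against the same fixed embeddings. A learned model would replace the hand-written pooling rule with trained equivariant layers.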



