Hyperbolic Image-Text Representations

Karan Desai, Maximilian Nickel, Tanmay Rajpurohit, Justin Johnson, Ramakrishna Vedantam

Jun-5-2023, arXiv.org Artificial Intelligence

Visual and linguistic concepts naturally organize themselves in a hierarchy, where a textual concept such as "dog" entails all images that contain dogs. Although this hierarchy is intuitive, current large-scale vision and language models such as CLIP do not explicitly capture it. We propose MERU, a contrastive model that yields hyperbolic representations of images and text. Hyperbolic spaces have geometric properties well suited to embedding tree-like data, so MERU can better capture the underlying hierarchy in image-text datasets. Our results show that MERU learns a highly interpretable and structured representation space while remaining competitive with CLIP on standard multi-modal tasks such as image classification and image-text retrieval.
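To make the idea of a hyperbolic representation concrete, here is a minimal sketch of how a Euclidean encoder output could be lifted into hyperbolic space and compared by geodesic distance. It assumes the Lorentz (hyperboloid) model with curvature -c, which the abstract itself does not specify. The function names `exp_map_origin`, `time_component`, and `lorentz_distance` are hypothetical and chosen for illustration only; this is not the authors' implementation.

```python
import math

def exp_map_origin(v, c=1.0):
    """Map a Euclidean (tangent) vector v onto the hyperboloid of curvature -c.

    Only the space components are returned; the time component is
    recoverable from them (see time_component).
    """
    norm = math.sqrt(sum(a * a for a in v))
    if norm == 0.0:
        return [0.0 for _ in v]
    scale = math.sinh(math.sqrt(c) * norm) / (math.sqrt(c) * norm)
    return [scale * a for a in v]

def time_component(x_space, c=1.0):
    """Time component implied by the constraint -t^2 + ||x||^2 = -1/c."""
    return math.sqrt(1.0 / c + sum(a * a for a in x_space))

def lorentz_distance(x_space, y_space, c=1.0):
    """Geodesic distance between two points given by their space components."""
    # Lorentzian inner product: <x, y>_L = <x_space, y_space> - x_time * y_time
    inner = sum(a * b for a, b in zip(x_space, y_space))
    inner -= time_component(x_space, c) * time_component(y_space, c)
    # Clamp to the domain of acosh to guard against rounding error.
    return math.acosh(max(1.0, -c * inner)) / math.sqrt(c)

# Hypothetical embeddings: a generic concept near the origin, a specific one far out.
generic = exp_map_origin([0.1, 0.0])
specific = exp_map_origin([2.0, 0.5])
print(lorentz_distance(generic, specific))
```

Under this assumed model, the distance from the origin to a lifted point equals the Euclidean norm of the input vector, so more generic concepts can sit near the origin while more specific ones sit farther out, which is the tree-like layout the abstract alludes to.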

artificial intelligence, machine learning, natural language

Country:
- Pacific Ocean > North Pacific Ocean
  - San Francisco Bay > Golden Gate (0.04)
- North America
  - United States
    - Arizona (0.04)
    - Michigan (0.04)
    - North Dakota > Billings County (0.04)
    - Alaska
      - Prince of Wales-Hyder Census Area > Craig (0.04)
      - Juneau City and Borough > Juneau (0.04)
    - Nevada > Clark County
      - Las Vegas (0.04)
    - Hawaii > Honolulu County
      - Honolulu (0.04)
    - New York
      - New York County > New York City (0.14)
      - Richmond County > New York City (0.04)
      - Queens County > New York City (0.04)
      - Kings County > New York City (0.04)
      - Bronx County > New York City (0.04)
    - Illinois > Cook County
      - Chicago (0.04)
    - Texas > Galveston County
      - Galveston (0.04)
    - California
      - San Francisco County > San Francisco (0.14)
      - Alameda County > Oakland (0.04)
    - Ohio > Greene County
      - Fairborn (0.04)
  - Mexico > Jalisco
    - Tlaquepaque (0.04)
  - Canada
    - Ontario > Toronto (0.04)
    - Newfoundland and Labrador > Labrador (0.04)
    - Alberta (0.04)
- Europe
  - Austria > Vienna (0.14)
  - Norway (0.04)
  - Poland (0.04)
  - Denmark (0.04)
  - Bulgaria (0.04)
  - Netherlands > North Holland
    - Amsterdam (0.04)
  - United Kingdom > England
    - Kent > Dover (0.04)
    - Isle of Wight (0.04)
    - Cumbria (0.04)
  - Greece > Attica
    - Athens (0.04)
  - Finland > Uusimaa
    - Helsinki (0.04)
  - Slovenia > Coastal-Karst
    - Municipality of Koper > Koper (0.04)
- Asia
  - Japan (0.04)
  - Malaysia (0.04)
  - Indonesia > Bali (0.04)
  - India > NCT
    - New Delhi (0.04)
- Africa
  - South Africa (0.04)
  - Uganda (0.04)
  - Kenya > Lamu County
    - Lamu (0.04)

Genre:
- Research Report > New Finding (0.86)

Industry:
- Leisure & Entertainment (1.00)
- Consumer Products & Services (1.00)
- Energy (0.67)
- Transportation (0.67)
- Media > Photography (0.46)

Technology:
- Information Technology
  - Sensing and Signal Processing > Image Processing (1.00)
  - Communications > Social Media (0.93)
  - Artificial Intelligence
    - Vision (1.00)
    - Representation & Reasoning (1.00)
    - Natural Language (1.00)
    - Machine Learning
      - Neural Networks > Deep Learning (1.00)
      - Statistical Learning (0.93)
