Embedding Geometries of Contrastive Language-Image Pre-Training
Chou, Jason Chuan-Chih, Alam, Nahid
arXiv.org Artificial Intelligence
Since the publication of CLIP, using the InfoNCE loss for contrastive pre-training has become a widely popular way to bridge two or more modalities. Despite this wide adoption, CLIP's original design choices of L2 normalization and cosine-similarity logits have rarely been revisited. We systematically experimented with alternative geometries and softmax logits for language-image pre-training and found that variants with an intuitive Euclidean geometry, Euclidean CLIP (EuCLIP), match or exceed the performance of CLIP and support hierarchical relationships at least as well as the more complicated hyperbolic alternative.
Sep-19-2024
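To make the contrast in the abstract concrete, here is a minimal sketch (not the authors' code) of a symmetric InfoNCE loss computed with two different logit functions. `cosine_logits` follows CLIP's original choice: both embeddings are L2-normalized and the logit is a scaled cosine similarity (in CLIP the scale is a learned temperature). `euclidean_logits` is an assumed Euclidean variant that skips normalization and uses the scaled negative Euclidean distance as the logit. The exact EuCLIP formulation in the paper (for example, whether the distance is squared or how embeddings are normalized) may differ. All function names here are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def l2_normalize(v: List[float]) -> List[float]:
    """Project a vector onto the unit hypersphere (CLIP-style L2 normalization)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_logits(images: Matrix, texts: Matrix, scale: float) -> Matrix:
    """CLIP-style logits: scale * cosine similarity of L2-normalized embeddings."""
    imgs = [l2_normalize(v) for v in images]
    txts = [l2_normalize(v) for v in texts]
    return [[scale * sum(a * b for a, b in zip(i, t)) for t in txts] for i in imgs]


def euclidean_logits(images: Matrix, texts: Matrix, scale: float) -> Matrix:
    """ASSUMED Euclidean variant: no normalization, logit = -scale * ||i - t||.

    This is one plausible Euclidean logit, not necessarily EuCLIP's exact form.
    """
    return [[-scale * math.dist(i, t) for t in texts] for i in images]


def _cross_entropy_diagonal(logits: Matrix) -> float:
    """Mean of -log softmax(row)[k] over rows; matched pairs sit on the diagonal."""
    total = 0.0
    for k, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[k]
    return total / len(logits)


def info_nce(logits: Matrix) -> float:
    """Symmetric InfoNCE: average of image->text (rows) and text->image (columns)."""
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (_cross_entropy_diagonal(logits) + _cross_entropy_diagonal(transposed))


if __name__ == "__main__":
    # Toy batch: image k is paired with text k.
    images = [[1.0, 0.0], [0.0, 1.0]]
    texts = [[0.9, 0.1], [0.1, 0.9]]
    print("cosine loss:   ", info_nce(cosine_logits(images, texts, scale=10.0)))
    print("euclidean loss:", info_nce(euclidean_logits(images, texts, scale=10.0)))
```

One geometric difference this exposes: rescaling an embedding leaves the cosine logits unchanged, because normalization discards the norm, whereas the Euclidean logits change as the point moves. Under the Euclidean choice the embedding norm therefore remains available to the model.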