EBind: a practical approach to space binding
Broadbent, Jim, Cohen, Felix, Hvilshøj, Frederik, Landau, Eric, Sasoglu, Eren
arXiv.org Artificial Intelligence
We simplify space binding by focusing on two core components: a single encoder per modality and high-quality data. This enables training state-of-the-art models on a single GPU in a few hours rather than multiple days. We present EBind, an Easy, data-centric, and parameter-efficient method to Bind the embedding spaces of multiple contrastive models. We demonstrate that a simple 1.8B-parameter image-text-video-audio-3D model can outperform models 4 to 17x its size. The key to achieving this is a carefully curated dataset built from three complementary data sources: i) 6.7M fully automated multimodal quintuples sourced via SOTA retrieval models, ii) 1M diverse, semi-automated triples annotated by humans as negative, partial, or positive matches, and iii) 3.4M pre-existing captioned data items. We use 13 different evaluations to demonstrate the value of each data source. Due to limitations of existing benchmarks, we further introduce the first high-quality, consensus-annotated zero-shot classification benchmark between audio and point clouds (PCs). In contrast to related work, we will open-source our code, model weights, and datasets.
Nov-19-2025
- Country:
- Asia
- Singapore (0.04)
- South Korea > Seoul
- Seoul (0.04)
- Europe
- Austria > Vienna (0.14)
- France > Île-de-France
- Greece (0.04)
- United Kingdom > England
- Greater London > London (0.04)
- North America
- Canada
- British Columbia > Metro Vancouver Regional District
- Vancouver (0.04)
- Quebec > Montreal (0.04)
- United States
- Georgia > Fulton County
- Atlanta (0.04)
- Hawaii > Honolulu County
- Honolulu (0.04)
- Kansas > Sheridan County (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- Massachusetts > Suffolk County
- Boston (0.04)
- Rhode Island (0.04)
- Tennessee > Davidson County
- Nashville (0.04)
- Oceania
- Australia
- Queensland > Brisbane (0.04)
- Victoria > Melbourne (0.04)
- New Zealand > North Island
- Auckland Region > Auckland (0.04)
- Genre:
- Research Report (0.70)
- Industry:
- Information Technology (0.46)
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning > Neural Networks (0.93)
- Natural Language > Large Language Model (0.90)
- Representation & Reasoning (1.00)
- Vision (1.00)