Learning Task-Agnostic Representations through Multi-Teacher Distillation

Formont, Philippe, Darrin, Maxime, Karimian, Banafsheh, Cheung, Jackie CK, Granger, Eric, Ayed, Ismail Ben, Shateri, Mohammadhadi, Piantanida, Pablo

Oct-22-2025–arXiv.org Artificial Intelligence

Casting complex inputs into tractable representations is a critical step across various fields. Diverse embedding models emerge from differences in architectures, loss functions, input modalities and datasets, each capturing unique aspects of the input. Multi-teacher distillation leverages this diversity to enrich representations but often remains tailored to specific tasks. In this paper, we introduce a task-agnostic framework based on a "majority vote" objective function. We demonstrate that this function is bounded by the mutual information between the student's and the teachers' embeddings, leading to a task-agnostic distillation loss that eliminates dependence on task-specific labels or prior knowledge. Our evaluations across text, vision, and molecular modeling show that our method effectively leverages teacher diversity, yielding representations that perform better on a wide range of downstream tasks such as classification, clustering, and regression. Additionally, we train and release state-of-the-art embedding models, enhancing downstream performance in various modalities.
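The abstract does not say how the mutual information term is estimated, so the following is only a minimal sketch of one standard route, not the authors' implementation. It assumes frozen teachers, so that the entropy of each teacher embedding is constant and maximizing the mutual information between student and teacher embeddings reduces to minimizing a conditional entropy. That conditional entropy is upper-bounded by the negative log-likelihood of the teacher embedding under any conditional model; here the model is assumed to be a diagonal Gaussian whose mean is a linear map of the student embedding. The names `gaussian_nll`, `linear`, `multi_teacher_loss`, and the `heads` dictionaries are hypothetical and chosen for illustration.

```python
import math


def gaussian_nll(target, mean, log_var):
    """Negative log-likelihood of `target` under a diagonal Gaussian."""
    nll = 0.0
    for t, m, lv in zip(target, mean, log_var):
        nll += 0.5 * (lv + (t - m) ** 2 / math.exp(lv) + math.log(2 * math.pi))
    return nll


def linear(weights, bias, x):
    """Apply a (hypothetical) per-teacher linear head: W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def multi_teacher_loss(student_emb, teacher_embs, heads):
    """Sum over teachers of -log q_k(teacher_k | student).

    Assumption: each head holds "W", "b" (mean predictor) and "log_var"
    (per-dimension log-variance). No task labels are used anywhere.
    """
    total = 0.0
    for t_emb, head in zip(teacher_embs, heads):
        mean = linear(head["W"], head["b"], student_emb)
        total += gaussian_nll(t_emb, mean, head["log_var"])
    return total


# Usage: one student embedding, two teachers with different output sizes.
student = [1.0, 2.0]
teachers = [[1.0, 2.0], [3.0]]
heads = [
    {"W": [[1.0, 0.0], [0.0, 1.0]], "b": [0.0, 0.0], "log_var": [0.0, 0.0]},
    {"W": [[1.0, 1.0]], "b": [0.0], "log_var": [0.0]},
]
loss = multi_teacher_loss(student, teachers, heads)
```

In practice the student encoder and the heads would be trained jointly by gradient descent on this loss over unlabeled inputs; the per-teacher heads let teachers with different embedding dimensions share one student.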

distillation, machine learning, natural language

Country:
- Europe (0.67)
- North America
  - United States (0.92)
  - Canada > Quebec (0.28)
- Asia > Middle East
  - UAE (0.28)

Genre:
- Research Report
  - New Finding (1.00)
  - Experimental Study (1.00)

Industry:
- Education (1.00)
- Health & Medicine
  - Therapeutic Area (0.68)
  - Pharmaceuticals & Biotechnology (0.67)

Technology:
- Information Technology
  - Communications (1.00)
  - Artificial Intelligence
    - Vision (1.00)
    - Representation & Reasoning (1.00)
    - Natural Language (1.00)
    - Machine Learning
      - Neural Networks > Deep Learning (0.67)
      - Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
