

OTDD Is a True Distance

Neural Information Processing Systems

Proposition 4.1 is a direct extension of the following well-known lower bound for the 2-Wasserstein distance (Gelbrich's bound): for any two distributions α and β on R^d with means μ_α, μ_β and covariance matrices Σ_α, Σ_β,

    W_2^2(α, β) ≥ ||μ_α - μ_β||_2^2 + tr(Σ_α + Σ_β - 2 (Σ_α^{1/2} Σ_β Σ_α^{1/2})^{1/2}),

where the trace term is the squared Bures distance between the covariance matrices. In the notation of Section 3, Lemma B.1 applies this bound to each pair of label-conditional distributions over feature-label pairs. Gelbrich's bound holds with equality when both distributions are Gaussian. We next analyze step (i) individually for the two OTDD versions. Information about all the datasets used, including references, is provided in Table 1.
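For reference, Gelbrich's bound is cheap to evaluate numerically from the first two moments alone. Below is a minimal sketch (function names are mine, not from the paper); it computes the right-hand side of the inequality, which equals W_2^2 exactly when both distributions are Gaussian:

```python
import numpy as np

def psd_sqrt(a):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T

def gelbrich_lower_bound(mu1, cov1, mu2, cov2):
    """Lower bound on W_2^2 between any two distributions with the given
    means and covariances; tight when both distributions are Gaussian."""
    root = psd_sqrt(cov1)
    cross = psd_sqrt(root @ cov2 @ root)        # (Σ1^{1/2} Σ2 Σ1^{1/2})^{1/2}
    bures_sq = np.trace(cov1 + cov2 - 2.0 * cross)
    return float(np.sum((mu1 - mu2) ** 2) + bures_sq)
```

For diagonal covariances the Bures term reduces to the sum of squared differences of per-coordinate standard deviations, which gives an easy sanity check.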






Review for NeurIPS paper: Geometric Dataset Distances via Optimal Transport

Neural Information Processing Systems

Additional Feedback: POST REBUTTAL: After reading the authors' response, I increased my score by 1. I believe the general idea of using conditional distributions to compare datasets with no prior training or modeling assumptions is interesting and could lead to promising future research. Here is why I still think this is not a clear accept, and I hope these remarks will be addressed in the final version: 1) The experiments conducted in the paper were very clear and well illustrated. I expect that the naive methods (i), (ii), (iii) discussed in the rebuttal will be included as quantitative baselines in transfer learning and the other applications, rather than just comparing the values of OTDD across methods (Fig. 1 of the rebuttal), which is not informative; the order of magnitude of a distance says nothing about its discriminative power. Could this be explained by the large dimension of MNIST making Bures too costly to compute? Would you agree that Sinkhorn is better than OT-N for large d, and the opposite for small d? My main concern is that while these results are promising, no baseline was provided to quantify the performance gain of OTDD.


Lightspeed Geometric Dataset Distance via Sliced Optimal Transport

Nguyen, Khai, Nguyen, Hai, Pham, Tuan, Ho, Nhat

arXiv.org Machine Learning

Dataset distances provide a powerful framework for comparing datasets based on their underlying structures, distributions, or content. These measures are essential in applications where understanding the relationships between datasets drives decision-making, such as assessing data quality, detecting distributional shifts, or quantifying biases. They play a critical role in machine learning workflows, enabling tasks like domain adaptation, transfer learning, continual learning, and fairness evaluation. Additionally, dataset distances are valuable in emerging areas such as synthetic data evaluation, 3D shape comparison, and federated learning, where comparing heterogeneous data distributions is fundamental. By capturing meaningful similarities and differences between datasets, these measures facilitate data-driven insights, enhance model robustness, and support novel applications across diverse fields. A common approach to comparing datasets relies on proxies, such as analyzing the learning curves of a predefined model [28, 16] or examining its optimal parameters [1, 22] on a given task. Another strategy involves making strong assumptions about the similarity or co-occurrence of labels between datasets [47]. However, these methods often lack theoretical guarantees, are heavily dependent on the choice of the probe model, and require training the model to completion (e.g., to identify optimal parameters) for each dataset under comparison. To address these limitations, model-agnostic approaches have been developed.
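The "lightspeed" primitive in the title, sliced optimal transport, replaces a d-dimensional OT problem with many one-dimensional ones, which are solvable by sorting. The sketch below is a generic Monte Carlo sliced 2-Wasserstein estimator between equal-size point clouds, not the paper's label-aware construction; the function name and defaults are mine:

```python
import numpy as np

def sliced_wasserstein2(x, y, n_projections=128, seed=0):
    """Monte Carlo estimate of the sliced 2-Wasserstein distance between
    two equal-size point clouds x, y of shape (n, d)."""
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    # random unit directions on the sphere
    theta = rng.normal(size=(n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # 1-D OT between sorted projections is just a coordinate-wise match
    px = np.sort(x @ theta.T, axis=0)
    py = np.sort(y @ theta.T, axis=0)
    return float(np.sqrt(np.mean((px - py) ** 2)))
```

Each projection costs O(n log n) for the sort, so the whole estimate runs in O(L n log n) for L projections, versus the cubic-in-n cost of exact OT solvers.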


What explains the success of cross-modal fine-tuning with ORCA?

García-de-Herreros, Paloma, Gautam, Vagrant, Slusallek, Philipp, Klakow, Dietrich, Mosbach, Marius

arXiv.org Artificial Intelligence

ORCA (Shen et al., 2023) is a recent technique for cross-modal fine-tuning, i.e., applying pre-trained transformer models to modalities beyond their training data. The technique consists primarily of training an embedder and fine-tuning the embedder and model. Despite its high performance on a variety of downstream tasks, we do not understand precisely how each of these components contributes to ORCA's success. Therefore, we run a series of ablations and find that embedder training does not help 2D tasks at all, contrary to what the original paper posits. In 1D tasks, some amount of embedder training is necessary but more is not better. In 4 out of 6 datasets we experiment with, it is model fine-tuning that makes the biggest difference. Through our ablations and baselines, we contribute a better understanding of the individual components of ORCA.


Cross-Modal Fine-Tuning: Align then Refine

Shen, Junhong, Li, Liam, Dery, Lucio M., Staten, Corey, Khodak, Mikhail, Neubig, Graham, Talwalkar, Ameet

arXiv.org Artificial Intelligence

Fine-tuning large-scale pretrained models has led to tremendous progress in well-studied modalities such as vision and NLP. However, similar gains have not been observed in many other modalities due to a lack of relevant pretrained models. In this work, we propose ORCA, a general cross-modal fine-tuning framework that extends the applicability of a single large-scale pretrained model to diverse modalities. ORCA adapts to a target task via an align-then-refine workflow: given the target input, ORCA first learns an embedding network that aligns the embedded feature distribution with the pretraining modality. The pretrained model is then fine-tuned on the embedded data to exploit the knowledge shared across modalities. Through extensive experiments, we show that ORCA obtains state-of-the-art results on 3 benchmarks containing over 60 datasets from 12 modalities, outperforming a wide range of hand-designed, AutoML, general-purpose, and task-specific methods. We highlight the importance of data alignment via a series of ablation studies and demonstrate ORCA's utility in data-limited regimes.


Wasserstein Task Embedding for Measuring Task Similarities

Liu, Xinran, Bai, Yikun, Lu, Yuzhe, Soltoggio, Andrea, Kolouri, Soheil

arXiv.org Artificial Intelligence

Measuring similarities between different tasks is critical in a broad spectrum of machine learning problems, including transfer, multi-task, continual, and meta-learning. Most current approaches to measuring task similarities are architecture-dependent: 1) relying on pre-trained models, or 2) training networks on tasks and using forward transfer as a proxy for task similarity. In this paper, we leverage optimal transport theory and define a novel task embedding for supervised classification that is model-agnostic, training-free, and capable of handling (partially) disjoint label sets. In short, given a dataset with ground-truth labels, we perform a label embedding through multi-dimensional scaling and concatenate dataset samples with their corresponding label embeddings. Then, we define the distance between two datasets as the 2-Wasserstein distance between their augmented samples. Lastly, we leverage the 2-Wasserstein embedding framework to embed tasks into a vector space in which the Euclidean distance between the embedded points approximates the proposed 2-Wasserstein distance between tasks. We show that the proposed embedding leads to a significantly faster comparison of tasks compared to related approaches like the Optimal Transport Dataset Distance (OTDD). Furthermore, we demonstrate the effectiveness of our proposed embedding through various numerical experiments and show statistically significant correlations between our proposed distance and the forward and backward transfer between tasks.
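The pipeline the abstract describes (embed labels with MDS, concatenate label coordinates onto samples, then compare datasets with a 2-Wasserstein distance) can be sketched roughly as below. This is a minimal illustration under my own assumptions, not the authors' code: the pairwise label distance matrix is taken as given, and the final distance uses an exact optimal assignment between equal-size clouds rather than the paper's Wasserstein embedding.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def classical_mds(dist, k=2):
    """Classical MDS: coordinates whose Euclidean distances approximate
    the given symmetric pairwise distance matrix."""
    n = dist.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (dist ** 2) @ j                  # double centering
    w, v = np.linalg.eigh(b)
    idx = np.argsort(w)[::-1][:k]                   # top-k eigenpairs
    return v[:, idx] * np.sqrt(np.clip(w[idx], 0.0, None))

def augment(features, labels, label_coords):
    """Concatenate each sample with the MDS embedding of its label."""
    return np.hstack([features, label_coords[labels]])

def w2_empirical(x, y):
    """Exact 2-Wasserstein distance between equal-size point clouds,
    via the optimal one-to-one assignment."""
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    rows, cols = linear_sum_assignment(cost)
    return float(np.sqrt(cost[rows, cols].mean()))
```

The assignment-based `w2_empirical` is O(n^3) and only for illustration; the paper's point is precisely that embedding tasks into a vector space avoids solving such a problem for every pair of datasets.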