Review for NeurIPS paper: Geometric Dataset Distances via Optimal Transport
Additional Feedback:

POST REBUTTAL: After reading the authors' response, I increased my score by 1. I believe the general idea of using conditional distributions to compare datasets with no prior training or modeling assumptions is interesting and could lead to promising future research. Here is why I still think this is not a clear accept, and I hope these remarks will be addressed in the final version:

1) The experiments conducted in the paper were very clear and well illustrated. However, I expect the naive methods (i), (ii), (iii) discussed in the rebuttal to be included as quantitative baselines in the transfer-learning and other applications, and not merely as a comparison of OTDD values across methods (Fig. 1 of the rebuttal), which is not informative: the order of magnitude of a distance says nothing about its discriminative power. Could this be explained by the fact that the dimension of MNIST is large, making Bures too costly to compute? Would you agree that Sinkhorn is preferable to exact OT (OT-N) for large d, and the opposite for small d? My main concern is that, while these results are promising, no baseline was provided to quantify the performance gain of OTDD.
Geometric Dataset Distances via Optimal Transport
The notion of task similarity is at the core of various machine learning paradigms, such as domain adaptation and meta-learning. Current methods to quantify it are often heuristic, make strong assumptions on the label sets across the tasks, and many are architecture-dependent, relying on task-specific optimal parameters (e.g., require training a model on each dataset). In this work we propose an alternative notion of distance between datasets that (i) is model-agnostic, (ii) does not involve training, (iii) can compare datasets even if their label sets are completely disjoint and (iv) has solid theoretical footing. This distance relies on optimal transport, which provides it with rich geometry awareness, interpretable correspondences and well-understood properties. Our results show that this novel distance provides meaningful comparison of datasets, and correlates well with transfer learning hardness across various experimental settings and datasets.
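The optimal-transport machinery underlying such a dataset distance can be illustrated with a minimal entropic (Sinkhorn) solver. This is only a sketch over plain feature clouds with uniform weights, not the paper's full OTDD (which additionally lifts labels to label distributions inside the ground cost); the function name and settings here are illustrative assumptions:

```python
import numpy as np

def sinkhorn_distance(X, Y, eps=1.0, n_iter=200):
    """Entropic-regularized OT cost between two point clouds (uniform weights).

    Illustrative sketch only; not the paper's OTDD, which also compares
    the label distributions associated with each point.
    """
    n, m = len(X), len(Y)
    # Pairwise squared-Euclidean ground cost.
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    K = np.exp(-C / eps)                    # Gibbs kernel
    a = np.full(n, 1.0 / n)                 # uniform source marginal
    b = np.full(m, 1.0 / m)                 # uniform target marginal
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iter):                 # Sinkhorn fixed-point iterations
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]         # transport plan (approx. marginals a, b)
    return float((P * C).sum())

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(50, 2))
Y = rng.normal(3.0, 1.0, size=(50, 2))      # cloud shifted by (3, 3)
d_xy = sinkhorn_distance(X, Y)              # large: clouds are far apart
d_xx = sinkhorn_distance(X, X)              # small: cloud vs. itself
```

For larger cost ranges or smaller `eps`, a log-domain implementation is needed to avoid underflow in the Gibbs kernel; the plain form above suffices for this toy example.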
Alvarez-Melis, David, Fusi, Nicolò