NOMAD Projection

Brandon Duderstadt, Zach Nussbaum, Laurens van der Maaten

arXiv.org Artificial Intelligence 

The rapid adoption of generative AI has driven an explosion in the size of datasets consumed and produced by AI models. Traditional methods for unstructured data visualization, such as t-SNE and UMAP, have not kept up with the pace of dataset scaling. This presents a significant challenge for AI explainability, which relies on methods such as t-SNE and UMAP for exploratory data analysis. In this paper, we introduce Negative Or Mean Affinity Discrimination (NOMAD) Projection, the first method for unstructured data visualization via nonlinear dimensionality reduction that can run on multiple GPUs at train time. We provide theory that situates NOMAD Projection as an approximate upper bound on the InfoNC-t-SNE loss, and empirical results that demonstrate NOMAD Projection's superior performance and speed profile compared to existing state-of-the-art methods. We demonstrate the scalability of NOMAD Projection by computing the first complete data map of Multilingual Wikipedia.

CVPR 2025 Tutorial - Identifying Structure in Data: All you need to know about Dimensionality Reduction, Clustering, and More

1. Introduction

The discovery of neural scaling laws has resulted in an explosion in the size of datasets consumed and produced by AI models [11] [9]. Traditional algorithms for unstructured data visualization, such as t-SNE [14] and UMAP [15], have not kept up with the pace of dataset scaling. This presents a significant challenge for data-centric AI explainability, which relies upon methods like t-SNE and UMAP for exploratory data analysis.
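The abstract positions NOMAD Projection relative to the InfoNC-t-SNE loss. For readers unfamiliar with that objective, the following is a minimal NumPy sketch of its per-edge form, assuming the standard contrastive formulation: a Cauchy (Student-t) similarity q(a, b) = 1 / (1 + ||a - b||^2) between embedded points, one positive neighbor j for point i, and m sampled negatives k, giving the loss -log(q_ij / (q_ij + sum_k q_ik)). The function names are hypothetical. This sketch shows only the baseline loss and is not NOMAD Projection itself, whose mean-affinity approximation is not specified in this section.

```python
import numpy as np


def cauchy_similarity(a, b):
    """Student-t (Cauchy) kernel used by t-SNE: 1 / (1 + ||a - b||^2).

    Works on single points or broadcasts over a batch along the last axis.
    """
    d2 = np.sum((a - b) ** 2, axis=-1)
    return 1.0 / (1.0 + d2)


def infonc_tsne_loss(y, i, j, negatives):
    """InfoNCE-style t-SNE loss for one positive edge (i, j).

    y         : (n, d) array of low-dimensional embeddings
    i, j      : indices of the anchor point and its high-dimensional neighbor
    negatives : (m,) integer array of sampled negative indices
    (Assumed formulation; names are illustrative only.)
    """
    q_pos = cauchy_similarity(y[i], y[j])            # scalar similarity of the positive pair
    q_neg = cauchy_similarity(y[i], y[negatives])    # (m,) similarities to the negatives
    # The positive must be discriminated from the negatives: -log softmax-style ratio.
    return -np.log(q_pos / (q_pos + q_neg.sum()))


if __name__ == "__main__":
    # Worked case: the positive coincides with the anchor (q = 1) and two
    # negatives sit at unit distance (q = 0.5 each), so the loss is log 2.
    y = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
    print(infonc_tsne_loss(y, 0, 1, np.array([2, 3])))  # ~0.6931 (log 2)
```

Every term in the denominator is a similarity to a single negative sample. This per-negative sum is the quantity that the method's name, "Negative Or Mean Affinity Discrimination", suggests may be traded for a mean affinity. The details of that substitution are left to the paper's method section.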