AITopics | umap

Collaborating Authors

umap

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

MEDAL: Manifold Embedding Distillation via Autoencoder Learning

Chang, Irene, Zikry, Tarek M., Allen, Genevera I.

arXiv.org Machine LearningMay-26-2026

Low-dimensional embeddings are widely used as visual summaries of high-dimensional data and to enable downstream scientific discoveries. Yet, popular nonlinear dimension reduction methods, such as t-SNE and UMAP, are often selected based on visual appeal alone and without rigorous quantitative validation. A major reason is that manifold embeddings typically do not provide an out-of-sample map nor an inverse back to the original feature space; this makes held-out validation, the gold standard in supervised learning, all but impossible. To address these challenges, we develop a novel framework, MEDAL (Manifold Embedding Distillation via Autoencoder Learning), which distills a fitted manifold embedding into a reusable encoder--decoder model. MEDAL trains a constrained autoencoder whose bottleneck exactly matches any teacher embedding while the decoder reconstructs the original input; this yields an explicit map for new samples, an approximate inverse, and a pointwise reconstruction-based measure of distortion in the manifold space. This converts static manifold embeddings into models that can be evaluated on held-out data, enabling quantitative validation including comparing different dimension reduction methods as well as hyperparameter tuning. Across multiple benchmark and scientific case studies, we show that MEDAL enables held-out validation to determine optimal manifold embeddings and hyperparameters, reveals biologically coherent regions that are difficult to preserve in two dimensional embeddings, and detects distribution shift when new samples are mapped into a fixed reference manifold. MEDAL provides a general validation wrapper to any existing dimension reduction technique that will improve the rigor and

artificial intelligence, machine learning, reconstruction error, (18 more...)

arXiv.org Machine Learning

2605.24244

Country: North America > United States (0.67)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

On UMAP's True Loss Function

Neural Information Processing SystemsApr-25-2026, 07:43:04 GMT

UMAP has supplanted t-SNE as state-of-the-art for visualizing high-dimensional datasets in many disciplines, but the reason for its success is not well understood. In this work, we investigate UMAP's sampling based optimization scheme in detail. We derive UMAP's true loss function in closed form and find that it differs from the published one in a dataset size dependent way. As a consequence, we show that UMAP does not aim to reproduce its theoretically motivated high-dimensional UMAP similarities. Instead, it tries to reproduce similarities that only encode the knearest neighbor graph, thereby challenging the previous understanding of UMAP's effectiveness. Alternatively, we consider the implicit balancing of attraction and repulsion due to the negative sampling to be key to UMAP's success. We corroborate our theoretical findings on toy and single cell RNA sequencing data.

artificial intelligence, machine learning, similarity, (16 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

b3967a0e938dc2a6340e258630febd5a-AuthorFeedback.pdf

Neural Information Processing SystemsFeb-13-2026, 16:58:06 GMT

dataset, dimension, ground truth, (10 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.32)

Add feedback

NavigatingtheEffectofParametrization forDimensionalityReduction

Neural Information Processing SystemsFeb-8-2026, 13:46:31 GMT

Parametric dimensionality reduction methods have gained prominence for their ability togeneralize tounseen datasets, anadvantage that traditional approaches typically lack. Despite their growing popularity, there remains a prevalent misconception among practitioners about the equivalence in performance between parametric and non-parametric methods. Here, we showthat these methods are not equivalent - parametric methods retain global structure but lose significant localdetails.

algorithm, artificial intelligence, machine learning, (19 more...)

Neural Information Processing Systems

Country:

North America > United States (0.14)
Europe > United Kingdom > England > Greater London > London (0.04)

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

2de5d16682c3c35007e4e92982f1a2ba-Supplemental.pdf

Neural Information Processing SystemsFeb-8-2026, 01:46:40 GMT

While the toy and scRNA-seq datasets do not have a split, we used the training and test set of CIFAR-10 jointly for the unsupervised UMAP dimensionreduction.

artificial intelligence, inductive learning, machine learning, (20 more...)

Neural Information Processing Systems

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Industry: Health & Medicine (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.34)

Add feedback

On UMAP's True Loss Function

Neural Information Processing SystemsDec-23-2025, 22:52:04 GMT

UMAP has supplanted $t$-SNE as state-of-the-art for visualizing high-dimensional datasets in many disciplines, but the reason for its success is not well understood. In this work, we investigate UMAP's sampling based optimization scheme in detail. We derive UMAP's true loss function in closed form and find that it differs from the published one in a dataset size dependent way. As a consequence, we show that UMAP does not aim to reproduce its theoretically motivated high-dimensional UMAP similarities. Instead, it tries to reproduce similarities that only encode the $k$ nearest neighbor graph, thereby challenging the previous understanding of UMAP's effectiveness. Alternatively, we consider the implicit balancing of attraction and repulsion due to the negative sampling to be key to UMAP's success. We corroborate our theoretical findings on toy and single cell RNA sequencing data.

name change, true loss function, umap, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.41)

Add feedback

Consensus dimension reduction via multi-view learning

An, Bingxue, Tang, Tiffany M.

arXiv.org Machine LearningDec-19-2025

Dimension reduction methods are a fundamental class of techniques in data analysis, which aim to find a lower-dimensional representation of higher-dimensional data while preserving as much of the original information as possible. These methods are extensively used in practice, including in exploratory data analyses to visualize data--arguably, one of the first and most vital steps in any data analysis (Ray et al., 2021). Notably, in genomics, dimension reduction methods are ubiquitously applied to visualize high-dimensional single-cell RNA sequencing data in two dimensions (Becht et al., 2019). Beyond visualization, dimension reduction methods are also frequently employed to mitigate the curse of dimensionality (Bellman, 1957), engineer new features to improve downstream tasks like prediction (e.g., Massy, 1965), and enable scientific discovery in unsupervised learning settings (Chang et al., 2025). For example, many researchers have used dimension reduction in conjunction with clustering to discover new cell types and cell states (Wu et al., 2021), new cancer subtypes (Northcott et al., 2017), and other substantively-meaningful structure in a variety of domains (Bergen et al., 2019; Traven et al., 2017). Given the widespread use and need for dimension reduction methods, numerous dimension reduction techniques have been developed. Popular techniques include but are not limited to principal component analysis (PCA) (Pearson, 1901; Hotelling, 1933), multidimensional scaling (MDS) (Torgerson, 1952; Kruskal, 1964a), Isomap (Tenenbaum et al., 2000), locally linear embedding (LLE) (Roweis and Saul, 2000), t-distributed stochastic neighbor embedding (t-SNE) (van der 1

dataset, dimension reduction method, reduction method, (14 more...)

arXiv.org Machine Learning

2512.15802

Country:

Europe > Italy > Sardinia (0.04)
Europe > Italy > Apulia (0.04)
Europe > Italy > Liguria (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.94)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.69)
Health & Medicine > Therapeutic Area > Oncology (0.65)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (1.00)

Add feedback

Probabilistic Foundations of Fuzzy Simplicial Sets for Nonlinear Dimensionality Reduction

Keck, Janis, Barth, Lukas Silvester, Fatemeh, null, Fahimi, null, Joharinad, Parvaneh, Jost, Jürgen

arXiv.org Machine LearningDec-4-2025

Fuzzy simplicial sets have become an object of interest in dimensionality reduction and manifold learning, most prominently through their role in UMAP. However, their definition through tools from algebraic topology without a clear probabilistic interpretation detaches them from commonly used theoretical frameworks in those areas. In this work we introduce a framework that explains fuzzy simplicial sets as marginals of probability measures on simplicial sets. In particular, this perspective shows that the fuzzy weights of UMAP arise from a generative model that samples Vietoris-Rips filtrations at random scales, yielding cumulative distribution functions of pairwise distances. More generally, the framework connects fuzzy simplicial sets to probabilistic models on the face poset, clarifies the relation between Kullback-Leibler divergence and fuzzy cross-entropy in this setting, and recovers standard t-norms and t-conorms via Boolean operations on the underlying simplicial sets. We then show how new embedding methods may be derived from this framework and illustrate this on an example where we generalize UMAP using Čech filtrations with triplet sampling. In summary, this probabilistic viewpoint provides a unified probabilistic theoretical foundation for fuzzy simplicial sets, clarifies the role of UMAP within this framework, and enables the systematic derivation of new dimensionality reduction methods.

fuzzy simplicial, simplice, simplicial, (16 more...)

arXiv.org Machine Learning

2512.03899

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Germany > Saxony > Leipzig (0.04)
North America > United States > Rocky Mountains (0.04)
(4 more...)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.81)

Add feedback

Filters

Collaborating Authors

umap

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

MEDAL: Manifold Embedding Distillation via Autoencoder Learning

2de5d16682c3c35007e4e92982f1a2ba-Supplemental.pdf

On UMAP's True Loss Function

b3967a0e938dc2a6340e258630febd5a-AuthorFeedback.pdf

NavigatingtheEffectofParametrization forDimensionalityReduction

2de5d16682c3c35007e4e92982f1a2ba-Supplemental.pdf

2de5d16682c3c35007e4e92982f1a2ba-Paper.pdf

On UMAP's True Loss Function

Consensus dimension reduction via multi-view learning

Probabilistic Foundations of Fuzzy Simplicial Sets for Nonlinear Dimensionality Reduction