AITopics | Statistical Learning

On UMAP's True Loss Function

Neural Information Processing Systems | Apr-25-2026, 07:43:04 GMT

UMAP has supplanted t-SNE as the state of the art for visualizing high-dimensional datasets in many disciplines, but the reason for its success is not well understood. In this work, we investigate UMAP's sampling-based optimization scheme in detail. We derive UMAP's true loss function in closed form and find that it differs from the published one in a way that depends on the dataset size. As a consequence, we show that UMAP does not aim to reproduce its theoretically motivated high-dimensional UMAP similarities. Instead, it tries to reproduce similarities that only encode the k-nearest neighbor graph, thereby challenging the previous understanding of UMAP's effectiveness. Alternatively, we consider the implicit balancing of attraction and repulsion due to the negative sampling to be key to UMAP's success. We corroborate our theoretical findings on toy and single-cell RNA sequencing data.

artificial intelligence, machine learning, similarity, (16 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

16e4be78e61a3897665fa01504e9f452-Paper-Conference.pdf

Neural Information Processing Systems | Apr-25-2026, 07:42:53 GMT

artificial intelligence, data mining, machine learning, (17 more...)

Country: North America > United States (0.68)

Industry: Leisure & Entertainment (0.93)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)
(2 more...)

2dace78f80bc92e6d7493423d729448e-Supplemental.pdf

Neural Information Processing Systems | Apr-25-2026, 07:42:46 GMT

artificial intelligence, machine learning, natural language, (19 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

2d95666e2649fcfc6e3af75e09f5adb9-Paper.pdf

Neural Information Processing Systems | Apr-25-2026, 07:26:04 GMT

artificial intelligence, gcn, machine learning, (17 more...)

Country: Asia > China (0.29)

Genre:

Research Report (0.46)
Workflow (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)

description of our method

Neural Information Processing Systems | Apr-25-2026, 07:25:58 GMT

Algorithm 2: Procedure for estimating the weights
1: procedure ESTIMATEWEIGHTS(Teacher, Student, V, D)
2:   ▷ V is the validation dataset and D is the teacher-labeled dataset
3:   U, k d12 p |V|e
4:   for every (x, y) ∈ V do
5:     X ← (Confidence(Teacher(x)), Confidence(Student(x)))
6:     if arg max(Teacher(x)) = arg max(y) then:
7:       (p, distortion) ← (0, 1)
8:     else:

B.1 The student's test-accuracy-trajectory

In this section we provide extended experimental results that show the student's test accuracy over the training trajectory corresponding to the experiments mentioned in Section 3.1. Notice that in the vast majority of cases our method significantly outperforms the conventional approach almost throughout the training process.

The student's test accuracy over the training trajectory using hard-distillation corresponding to the experiments of Figure 4. See Section 3.1.2.

The student's test accuracy over the training trajectory corresponding to the experiments of Figure 5. See Section 3.1.2.

The student's test accuracy over the training trajectory corresponding to the experiments of Figure 7. See Section 3.1.3.

The student's test accuracy over the training trajectory using hard-distillation (first row) and soft-distillation (second row) corresponding to the experiments of Figure 8. See Section 3.1.4.

Indeed, it is known (see e.g.

artificial intelligence, machine learning, objective, (16 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.49)

16c5b4102a6b6eb061e502ce6736ad8a-Paper-Conference.pdf

Neural Information Processing Systems | Apr-25-2026, 07:25:15 GMT

artificial intelligence, machine learning, statistics, (17 more...)

Country: Europe (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)

Adaptive Distribution Calibration for Few-Shot Learning with Hierarchical Optimal Transport

Neural Information Processing Systems | Apr-25-2026, 07:24:53 GMT

Few-shot classification aims to learn a classifier that recognizes classes unseen during training, where the learned model can easily overfit to the biased distribution formed by only a few training examples. A recent solution to this problem is to calibrate the distribution of these few-sample classes by transferring statistics from base classes with sufficient examples, where the key question is how to choose the transfer weights from base classes to novel classes. However, principled approaches for learning the transfer weights have not been carefully studied. To this end, we propose a novel distribution calibration method that learns an adaptive weight matrix between novel samples and base classes, built upon a hierarchical Optimal Transport (H-OT) framework. By minimizing the high-level OT distance between novel samples and base classes, we can view the learned transport plan as the adaptive weight information for transferring the statistics of base classes. Learning the cost function between a base class and a novel class in the high-level OT leads to the introduction of the low-level OT, which considers the weights of all the data samples in the base class. Experiments on standard benchmarks demonstrate that our proposed plug-and-play model outperforms competing approaches and exhibits the desired cross-domain generalization ability, demonstrating the effectiveness of the learned adaptive weights.
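
The idea of reading a transport plan as transfer weights can be illustrated with a minimal sketch. It makes several assumptions that the abstract does not state: it uses entropic regularization with Sinkhorn iterations, uniform marginals, and a plain Euclidean cost to the base-class means as a stand-in for the learned low-level OT cost. The names `sinkhorn_plan` and `transfer_weights` are hypothetical and are not taken from the paper.

```python
import numpy as np


def sinkhorn_plan(cost, a, b, reg=0.5, n_iters=500):
    """Entropic OT plan between marginals a (rows) and b (columns)."""
    K = np.exp(-cost / reg)            # Gibbs kernel of the cost matrix
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)              # rescale to match column marginals
        u = a / (K @ v)                # rescale to match row marginals
    return u[:, None] * K * v[None, :]


def transfer_weights(novel, base_means, reg=0.5):
    """Row-normalized transport plan, read as per-sample weights over base classes.

    Assumption: the cost is the Euclidean distance to each base-class mean,
    a placeholder for the low-level OT distance described in the abstract.
    """
    cost = np.linalg.norm(novel[:, None, :] - base_means[None, :, :], axis=-1)
    cost = cost / max(cost.max(), 1e-12)   # scale costs into [0, 1]
    a = np.full(novel.shape[0], 1.0 / novel.shape[0])
    b = np.full(base_means.shape[0], 1.0 / base_means.shape[0])
    plan = sinkhorn_plan(cost, a, b, reg=reg)
    return plan / plan.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
novel = rng.normal(size=(3, 4))        # 3 novel samples, 4 features
base_means = rng.normal(size=(5, 4))   # 5 base-class means
weights = transfer_weights(novel, base_means)
calibrated_means = weights @ base_means   # weighted base statistics per novel sample
```

Each row of `weights` sums to one, so `calibrated_means` is a convex combination of base-class means for each novel sample. How the transferred statistics are actually combined with the novel samples is not specified in the abstract.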

artificial intelligence, base class, machine learning, (16 more...)

Country:

North America > United States (1.00)
Europe (1.00)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

2e163450c1ae3167832971e6da29f38d-Supplemental-Conference.pdf

Neural Information Processing Systems | Apr-25-2026, 07:24:40 GMT

artificial intelligence, correspondence, machine learning, (11 more...)

Country: Asia > China (0.14)

Technology:

Information Technology > Artificial Intelligence > Vision (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Details

Neural Information Processing Systems | Apr-25-2026, 07:23:57 GMT

Training is stalled while the replay buffer is smaller than the minibatch size, i.e., while |B| < M. Algorithms 3 and 4 show the critic network update and the update of the actor network and uncertainty parameter sampler, respectively. Although we write the gradient-based update in the form of a mini-batch stochastic gradient update for simplicity, we employ an adaptive approach such as Adam [16]. The update of p_k follows an exponential moving average with momentum 1/T_last, where T_last is the number of steps spent in the last episode (T_last is set to 1000 for the first episode). The reason behind this design choice is as follows. A short episode means that a bad uncertainty parameter ω was used in the last episode.
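
The two rules in this excerpt can be sketched as follows. The excerpt does not say which quantity the moving average tracks or how the momentum enters the update, so the sketch assumes the common form in which the new value receives weight 1/T_last. The names `can_update`, `ema_update`, and `observation` are hypothetical.

```python
def can_update(buffer_size, minibatch_size):
    """Training proceeds only once the replay buffer holds at least one minibatch."""
    return buffer_size >= minibatch_size


def ema_update(p_k, observation, t_last=1000):
    """Exponential moving average step with momentum 1 / t_last.

    Assumption: the new observation is weighted by the momentum and the
    previous value by (1 - momentum). t_last defaults to 1000, the value
    the excerpt gives for the first episode.
    """
    momentum = 1.0 / t_last
    return (1.0 - momentum) * p_k + momentum * observation


# A short last episode (small t_last) moves p_k much further in one step.
slow = ema_update(0.5, 1.0, t_last=1000)
fast = ema_update(0.5, 1.0, t_last=10)
```

Under this assumed form, a short episode produces a large momentum, so the estimate reacts quickly after an episode that ended early.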

artificial intelligence, machine learning, worst-case performance, (16 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)
