AITopics | Overview

Semi-supervised classification, in which unlabeled data are abundant but labeled data are scarce, arises often in machine learning applications. We address this challenge for high-dimensional data by leveraging the manifold and cluster assumptions. Based on the Fermat distance, a density-sensitive metric that naturally encodes the cluster assumption, we propose a weighted $k$-nearest-neighbor ($k$-NN) classifier and classifiers induced by multidimensional scaling (MDS). Embedding the data with MDS into a large target dimension allows linear classifiers to be applied effectively to complex manifold data. Theoretically, we derive a sharp lower bound on the expected excess risk within clusters and prove that the weighted $k$-NN classifier using the true Fermat distance is minimax optimal. Furthermore, we explicitly quantify the utility of unlabeled data by showing that the error from estimating the Fermat distance decays exponentially in the pooled sample size, a much faster rate than related rates in the literature. Extensive experiments on synthetic and real datasets show that our approaches perform competitively with, or better than, state-of-the-art graph-based semi-supervised classifiers.
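A minimal sketch of the two ingredients named above, under stated assumptions: the sample Fermat distance is taken here as the shortest-path length in the complete graph whose edge $(i, j)$ costs $\lVert x_i - x_j \rVert^{\alpha}$, with any normalizing constant omitted because it does not change nearest-neighbor rankings. The inverse-distance vote in `weighted_knn_predict` is a hypothetical weighting chosen for illustration, not the paper's weighting scheme, and both function names are illustrative.

```python
import math
from typing import Dict, Hashable, List, Sequence


def sample_fermat_distances(points: Sequence[Sequence[float]], alpha: float = 2.0) -> List[List[float]]:
    """All-pairs sample Fermat distance (unnormalized sketch).

    Edge (i, j) costs ||x_i - x_j|| ** alpha; the distance is the shortest-path
    length in this complete graph. With alpha > 1, many short hops through a
    dense region are cheaper than one long jump across a low-density gap.
    """
    n = len(points)
    d = [[math.dist(points[i], points[j]) ** alpha for j in range(n)] for i in range(n)]
    # Floyd-Warshall all-pairs shortest paths: O(n^3), fine for a toy example.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def weighted_knn_predict(dist: List[List[float]], labeled: Sequence[int],
                         labels: Dict[int, Hashable], query: int, k: int = 3) -> Hashable:
    """Weighted k-NN vote over labeled points under a precomputed distance matrix.

    The inverse-distance weight is a hypothetical choice for illustration.
    """
    neighbors = sorted(labeled, key=lambda i: dist[query][i])[:k]
    votes: Dict[Hashable, float] = {}
    for i in neighbors:
        weight = 1.0 / (1e-12 + dist[query][i])
        votes[labels[i]] = votes.get(labels[i], 0.0) + weight
    return max(votes, key=votes.get)


# Two 1-D clusters separated by a gap; only one labeled point per cluster.
points = [(0.0,), (0.1,), (0.2,), (0.3,), (5.0,), (5.1,), (5.2,), (5.3,)]
D = sample_fermat_distances(points, alpha=2.0)
print(weighted_knn_predict(D, labeled=[0, 4], labels={0: "A", 4: "B"}, query=3, k=2))  # prints A
```

All eight points, labeled or not, enter the shortest-path computation, which is one concrete way unlabeled data can help: they supply the short hops that make within-cluster distances small.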

artificial intelligence, classifier, machine learning, (19 more...)

arXiv.org Machine Learning

2604.23573

Genre:

Overview (0.67)
Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.46)

Towards Better Evaluation for Dynamic Link Prediction

Neural Information Processing Systems Apr-27-2026, 23:07:47 GMT

Despite recent successes in learning from static graphs, learning from time-evolving graphs remains an open challenge. In this work, we design new, more stringent evaluation procedures for link prediction on dynamic graphs that reflect real-world considerations, so that the strengths and weaknesses of methods can be compared more reliably. First, we introduce two visualization techniques for understanding recurring patterns of edges over time and show that many edges reoccur at later time steps. Based on this observation, we propose EdgeBank, a pure memorization baseline. EdgeBank performs surprisingly well across multiple settings, which indicates that the negative edges used in current evaluation are easy to distinguish. To sample more challenging negative edges, we introduce two novel negative sampling strategies that improve robustness and better match real-world applications. Lastly, we introduce six new dynamic graph datasets from a diverse set of domains missing from current benchmarks, providing new challenges and opportunities for future research. Our code repository is available at https://github.com/fpour/DGB.git.
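To make the memorization idea concrete, here is a minimal sketch of an EdgeBank-style predictor, assuming it simply answers "positive" for any edge already observed. The optional `window` parameter (an assumed, hypothetical name) illustrates a time-window variant that forgets edges not seen recently; the paper's exact variants and its handling of edge direction may differ, and this sketch treats `(u, v)` as directed.

```python
from typing import Dict, Hashable, Iterable, Optional, Tuple


class EdgeBank:
    """Memorization baseline sketch: predict 1 for a previously seen edge, else 0."""

    def __init__(self, window: Optional[float] = None) -> None:
        # window=None: unlimited memory. Otherwise an edge counts only if it was
        # last seen at most `window` time units before the query time.
        self.window = window
        self.last_seen: Dict[Tuple[Hashable, Hashable], float] = {}

    def update(self, edges: Iterable[Tuple[Hashable, Hashable, float]]) -> None:
        # Record the most recent timestamp of each observed (u, v) edge.
        for u, v, t in edges:
            prev = self.last_seen.get((u, v))
            self.last_seen[(u, v)] = t if prev is None else max(prev, t)

    def predict(self, u: Hashable, v: Hashable, t: float) -> int:
        seen = self.last_seen.get((u, v))
        if seen is None:
            return 0
        if self.window is not None and t - seen > self.window:
            return 0
        return 1


bank = EdgeBank()
bank.update([("a", "b", 1.0), ("b", "c", 2.0)])
print(bank.predict("a", "b", 3.0), bank.predict("a", "c", 3.0))  # prints 1 0
```

A negative edge drawn uniformly at random is rarely in the bank, so this baseline rejects it almost for free; that is why such negatives make evaluation easy, and why harder negatives drawn from previously observed edges would expose the weakness of pure memorization.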

artificial intelligence, data mining, machine learning, (20 more...)

Neural Information Processing Systems

Country:

North America > United States (0.28)
North America > Canada (0.28)

Genre:

Research Report (0.93)
Overview (0.93)

Industry: Education > Educational Setting (0.68)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Appendix

Neural Information Processing Systems Apr-27-2026, 16:21:51 GMT

We provide concrete rules below for the two competition tracks that make up DataComp: filtering and BYOD (bring your own data). Additionally, we provide a checklist that encourages participants to specify their design decisions, allowing more granular comparison between submissions.

A.1 Filtering track rules

Participants can enter submissions for one or more scales: small, medium, large, or xlarge, which correspond to the raw number of image-text pairs in CommonPool to be filtered. After choosing a scale, participants generate a list of uids, where each uid refers to a CommonPool sample. The list of uids is used to recover the selected image-text pairs from the pool, which are then used for downstream CLIP training.
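The filtering-track workflow (score or select samples, emit their uids, recover the pairs) can be sketched as follows. This is an illustration under stated assumptions: the in-memory record layout, the `clip_score` field, and the helpers `filter_pool` and `recover_pairs` are all hypothetical, and the sketch does not reproduce DataComp's actual file formats or tooling.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Dict[str, object]  # hypothetical record: uid, image, text, plus any metadata


def filter_pool(pool: Sequence[Sample], score: Callable[[Sample], float],
                keep_fraction: float) -> List[str]:
    """Return the uids of the highest-scoring `keep_fraction` of the pool."""
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    n_keep = max(1, int(len(pool) * keep_fraction))
    ranked = sorted(pool, key=score, reverse=True)
    return [str(sample["uid"]) for sample in ranked[:n_keep]]


def recover_pairs(pool: Sequence[Sample], uids: Sequence[str]) -> List[Tuple[object, object]]:
    """Look up the (image, text) pairs selected by a uid list."""
    wanted = set(uids)
    return [(s["image"], s["text"]) for s in pool if s["uid"] in wanted]


pool = [
    {"uid": "u0", "image": "img0.jpg", "text": "a dog on a beach", "clip_score": 0.31},
    {"uid": "u1", "image": "img1.jpg", "text": "asdf", "clip_score": 0.12},
    {"uid": "u2", "image": "img2.jpg", "text": "a red car", "clip_score": 0.28},
    {"uid": "u3", "image": "img3.jpg", "text": "click here", "clip_score": 0.09},
]
uids = filter_pool(pool, score=lambda s: s["clip_score"], keep_fraction=0.5)
print(uids)  # prints ['u0', 'u2']
print(recover_pairs(pool, uids))
```

Keeping the submission as a plain uid list separates the filtering decision from the downstream CLIP training, which is what lets different filtering strategies be compared at a fixed scale.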

artificial intelligence, machine learning, natural language, (24 more...)

Neural Information Processing Systems

Country:

Asia (0.46)
North America > United States (0.45)

Genre:

Overview (0.68)
Research Report > New Finding (0.45)

Industry:

Law (1.00)
Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
(7 more...)

Supplementary Material for Kernel Identification Through Transformers (A. Background: Self-Attention)

Neural Information Processing Systems Apr-26-2026, 00:24:08 GMT

Since the attention mechanism is rarely used in the Gaussian process (GP) literature, we provide a brief review of the topic in this section. Below we follow the description of attention given by Vaswani et al. [8], including its extensions to self-attention and multi-head self-attention. The dot-product attention mechanism [8] takes as input a set of queries, keys, and values. The queries and keys have dimension Dz, and the values have dimension Dv, which may differ from Dz. Dot-product attention computes weights from the dot products between queries and keys and uses them to form, for each query, a weighted linear combination of the input values.
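A minimal pure-Python sketch of the computation just described, assuming the scaled form of Vaswani et al., i.e. $\mathrm{softmax}(QK^\top/\sqrt{D_z})\,V$, applied row by row. It shows that queries and keys share dimension Dz while the values (and hence the outputs) have dimension Dv. Self-attention would obtain Q, K, and V from the same input set via learned projections, which are omitted here; the function names are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    # Subtract the max for numerical stability; the result is unchanged.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def dot_product_attention(Q: Matrix, K: Matrix, V: Matrix, scale: bool = True) -> Matrix:
    """Q: n_q x Dz, K: n_k x Dz, V: n_k x Dv. Returns n_q x Dv."""
    d_z = len(Q[0])
    d_v = len(V[0])
    out: Matrix = []
    for q in Q:
        # Similarity of this query to every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in K]
        if scale:
            scores = [s / math.sqrt(d_z) for s in scores]
        weights = softmax(scores)  # non-negative, sums to 1
        # Weighted linear combination of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(d_v)])
    return out


Q = [[1.0, 1.0]]                          # Dz = 2
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]    # Dv = 3 differs from Dz
print(dot_product_attention(Q, K, V))     # prints [[0.5, 0.5, 0.0]]
```

Because the query is equally similar to both keys, the two values receive equal weight; multi-head attention would run several such maps with separate projections and concatenate the results.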

artificial intelligence, machine learning, survey article, (18 more...)

Neural Information Processing Systems

Genre: Overview (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

564127c03caab942e503ee6f810f54fd-Paper.pdf

Neural Information Processing Systems Apr-26-2026, 00:06:00 GMT

artificial intelligence, machine learning, survey article, (18 more...)

Neural Information Processing Systems

Genre:

Research Report (0.68)
Overview (0.46)

Industry: Transportation (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Communications > Networks (0.68)

5291822d0636dc429e80e953c58b6a76-Paper.pdf

Neural Information Processing Systems Apr-25-2026, 22:19:08 GMT

artificial intelligence, machine learning, trajectory, (15 more...)

Neural Information Processing Systems

Country: North America (0.28)

Genre: Overview (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Data Science (0.68)

45c166d697d65080d54501403b433256-Paper.pdf

Neural Information Processing Systems Apr-25-2026, 16:21:44 GMT

artificial intelligence, machine learning, survey article, (19 more...)

Neural Information Processing Systems

Country: Asia (0.28)

Genre:

Overview (0.68)
Research Report > New Finding (0.46)

Industry: Telecommunications (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.71)
Information Technology > Data Science (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Contrastive Lift 2D Panoptic / Instance Segmenter

Neural Information Processing Systems Apr-25-2026, 14:54:22 GMT

Instance segmentation in 3D is a challenging task due to the lack of large-scale annotated datasets. In this paper, we show that this task can be addressed effectively by instead leveraging pre-trained 2D instance segmentation models. We propose a novel approach that lifts 2D segments to 3D and fuses them by means of a neural field representation, which encourages multi-view consistency across frames. The core of our approach is a slow-fast clustering objective function, which is scalable and well suited to scenes with a large number of objects. Unlike previous approaches, our method requires neither an upper bound on the number of objects nor object tracking across frames. To demonstrate the scalability of slow-fast clustering, we create a new semi-realistic dataset, Messy Rooms, which features up to 500 objects per scene. Our approach outperforms the state of the art on challenging scenes from the ScanNet, Hypersim, and Replica datasets, as well as on our newly created Messy Rooms dataset, demonstrating the effectiveness and scalability of our slow-fast clustering method.

artificial intelligence, machine learning, survey article, (16 more...)

Neural Information Processing Systems

Country: