Overview
High-dimensional Semi-supervised Classification via the Fermat Distance
Semi-supervised classification, where unlabeled data are massive but labeled data are limited, often arises in machine learning applications. We address this challenge under high-dimensional data by leveraging the manifold and cluster assumptions. Based on the Fermat distance, a density-sensitive metric that naturally encodes the cluster assumption, we propose the weighted $k$-nearest neighbors (NN) classifier and multidimensional scaling (MDS)-induced classifiers. The use of MDS with a large target dimension allows the effective application of linear classifiers to complex manifold data. Theoretically, we derive a sharp lower bound for the expected excess risk within clusters and prove that the weighted $k$-NN classifier utilizing the true Fermat distance is minimax optimal. Furthermore, we explicitly quantify the utility of unlabeled data by showing that the error arising from estimating the Fermat distance decays exponentially with the pooled sample size. Such a rate is much faster than the related rates in the literature. Extensive experiments on synthetic and real datasets demonstrate competitive or superior performance of our approaches compared to state-of-the-art graph-based semi-supervised classifiers.
Towards Better Evaluation for Dynamic Link Prediction
Despite the prevalence of recent success in learning from static graphs, learning from time-evolving graphs remains an open challenge. In this work, we design new, more stringent evaluation procedures for link prediction specific to dynamic graphs, which reflect real-world considerations, to better compare the strengths and weaknesses of methods. First, we create two visualization techniques to understand the reoccurring patterns of edges over time and show that many edges reoccur at later time steps. Based on this observation, we propose a pure memorization-based baseline called EdgeBank. EdgeBank achieves surprisingly strong performance across multiple settings which highlights that the negative edges used in the current evaluation are easy. To sample more challenging negative edges, we introduce two novel negative sampling strategies that improve robustness and better match real-world applications. Lastly, we introduce six new dynamic graph datasets from a diverse set of domains missing from current benchmarks, providing new challenges and opportunities for future research. Our code repository is accessible at https://github.com/fpour/DGB.git.
Appendix
We provide concrete rules below for the two competition tracks that comprise DATACOMP: filtering and BYOD . Additionally, we provide a checklist, which encourages participants to specify design decisions, which allows for more granular comparison between submissions. A.1 Filtering track rules Participants can enter submissions for one or many different scales: small, medium, large or xlarge, which represent the raw number of image-text pairs in CommonPool that should be filtered. After choosing a scale, participants generate a list of uids, where each uid refers to a COMMONPOOL sample. The list of uids is used to recover image-text pairs from the pool, which is used for downstream CLIP training.
Supplementary Material for Kernel Identification Through Transformers ABackground: Self-Attention
Since the attention mechanism is rarely used within the GP literature, we provide a brief review of the topic in this section. Below we follow the description of attention as given by Vaswani et al. [8], including extensions to self-attention and multi-head self-attention. The dot-product attention mechanism [8] takes as input a set of queries, keys and values. The queries and keys have dimension Dz and the values have dimension Dv which may differ from Dz. The operation of dot-product attention then generates weights from the queries and keys which are used to produce a linear mapping of the input values.
retnemge S ecnatsn I / citpona P D2 tfi L evitsartno C
Instance segmentation in 3D is a challenging task due to the lack of large-scale annotated datasets. In this paper, we show that this task can be addressed effectively by leveraging instead 2D pre-trained models for instance segmentation. We propose a novel approach to lift 2D segments to 3D and fuse them by means of a neural field representation, which encourages multi-view consistency across frames. The core of our approach is a slow-fast clustering objective function, which is scalable and well-suited for scenes with a large number of objects. Unlike previous approaches, our method does not require an upper bound on the number of objects or object tracking across frames. To demonstrate the scalability of the slow-fast clustering, we create a new semi-realistic dataset called the Messy Rooms dataset, which features scenes with up to 500 objects per scene. Our approach outperforms the state-of-the-art on challenging scenes from the ScanNet, Hypersim, and Replica datasets, as well as on our newly created Messy Rooms dataset, demonstrating the effectiveness and scalability of our slow-fast clustering method.