Nearest Neighbor Methods
Sparse High-Dimensional Isotonic Regression
We consider the problem of estimating an unknown coordinate-wise monotone function given noisy measurements, known as the isotonic regression problem. Often, only a small subset of the features affects the output. This motivates the sparse isotonic regression setting, which we consider here. We provide an upper bound on the expected VC entropy of the space of sparse coordinate-wise monotone functions, and identify the regime of statistical consistency of our estimator. We also propose a linear program to recover the active coordinates, and provide theoretical recovery guarantees. We close with experiments on cancer classification, and show that our method significantly outperforms several standard methods.
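As a toy illustration of the sparse setting only (not the paper's estimator or its linear-program support recovery), the sketch below assumes sparsity one and simply selects the coordinate whose one-dimensional isotonic fit achieves the lowest training error; the data and variable names are hypothetical.

```python
# Toy sketch: sparse (s = 1) coordinate-wise monotone regression by picking
# the single coordinate with the best 1-D isotonic fit. Not the paper's method.
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.uniform(size=(n, d))
y = 2.0 * X[:, 2] ** 2 + 0.1 * rng.normal(size=n)  # only coordinate 2 is active

best = None
for j in range(d):
    iso = IsotonicRegression(out_of_bounds="clip").fit(X[:, j], y)
    mse = np.mean((iso.predict(X[:, j]) - y) ** 2)
    if best is None or mse < best[0]:
        best = (mse, j, iso)

mse, j_hat, model = best
print(f"recovered active coordinate: {j_hat} (training MSE {mse:.4f})")
```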
Meta-Neighborhoods
Making adaptive predictions based on the input is an important ability for general artificial intelligence. In this work, we take a step in this direction and propose a semi-parametric method, Meta-Neighborhoods, where predictions are made adaptively to the neighborhood of the input. We show that Meta-Neighborhoods is a generalization of k-nearest-neighbors. Due to the simpler manifold structure around a local neighborhood, Meta-Neighborhoods represents the predictive distribution p(y | x) more accurately. To reduce memory and computation overhead, we propose induced neighborhoods that summarize the training data into a much smaller dictionary. A meta-learning-based training mechanism is then used to jointly learn the induced neighborhoods and the model. Extensive studies demonstrate the superiority of our method.
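To make the dictionary idea concrete, here is a minimal NumPy sketch under the simplifying assumption that predictions are a softmax-weighted average over a small set of induced (key, value) pairs; the paper's meta-learned training loop is omitted and all names and data are illustrative.

```python
# Illustrative sketch of induced neighborhoods: predict from a small learned
# dictionary of (key, value) pairs instead of the full training set.
import numpy as np

rng = np.random.default_rng(0)
keys = rng.normal(size=(16, 2))           # 16 induced neighbors in 2-D
values = np.sin(keys[:, 0]) + keys[:, 1]  # synthetic targets for the dictionary

def predict(x, temperature=1.0):
    """Softmax-weighted average over dictionary entries near x."""
    d2 = np.sum((keys - x) ** 2, axis=1)
    w = np.exp(-d2 / temperature)
    w /= w.sum()
    return np.dot(w, values)

print(predict(np.array([0.5, -0.2])))
```

As the temperature shrinks, the prediction approaches 1-NN over the dictionary, which mirrors the claim that Meta-Neighborhoods generalizes k-nearest-neighbors.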
On UMAP's True Loss Function
UMAP has supplanted t-SNE as the state of the art for visualizing high-dimensional datasets in many disciplines, but the reason for its success is not well understood. In this work, we investigate UMAP's sampling-based optimization scheme in detail. We derive UMAP's true loss function in closed form and find that it differs from the published one in a way that depends on the dataset size. As a consequence, we show that UMAP does not aim to reproduce its theoretically motivated high-dimensional UMAP similarities. Instead, it tries to reproduce similarities that only encode the k-nearest-neighbor graph, thereby challenging the previous understanding of UMAP's effectiveness. We argue that the implicit balancing of attraction and repulsion due to negative sampling is instead key to UMAP's success. We corroborate our theoretical findings on toy data and single-cell RNA sequencing data.
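The following schematic sketch illustrates the kind of sampled attraction/repulsion update the abstract refers to; it uses simplified gradients (no fitted a/b curve, no learning-rate decay), so it illustrates negative sampling in general rather than UMAP's exact optimizer.

```python
# Schematic sketch of one epoch of sampled optimization: attraction along
# edges of a k-NN graph, repulsion against randomly drawn negative samples.
import numpy as np

rng = np.random.default_rng(0)
n, dim, n_neg, lr = 100, 2, 5, 0.05
Y = rng.normal(scale=1e-2, size=(n, dim))     # low-dimensional embedding
edges = [(i, (i + 1) % n) for i in range(n)]  # toy k-NN graph (a ring)

for i, j in edges:                            # one pass over sampled edges
    # attraction: pull the embedding of j toward i
    Y[j] += lr * (Y[i] - Y[j])
    # repulsion: push j away from a few random negative samples
    for k in rng.integers(0, n, size=n_neg):
        diff = Y[j] - Y[k]
        Y[j] += lr * diff / (1e-3 + np.sum(diff ** 2))
```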
Fast Iterative and Task-Specific Imputation with Online Learning
Rahul Bordoloi, Clémence Réda, Saptarshi Bej
Missing feature values are a significant hurdle for downstream machine-learning tasks such as classification and regression. However, they are pervasive in many real-life use cases, for instance in drug discovery research. Moreover, imputation methods can be time-consuming and offer few guarantees on imputation quality, especially for not-missing-at-random mechanisms. We propose an imputation approach named F3I, based on the iterative improvement of a K-nearest-neighbor imputation that learns the weights for each neighbor of a data point by optimizing the likelihood of the imputed points. This algorithm can also be trained jointly with a downstream task on the imputed values. We provide a theoretical analysis of the imputation quality achieved by F3I for several types of missing mechanisms. We also demonstrate the performance of F3I on both synthetic data sets and real-life drug repurposing and handwritten-digit recognition data.
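For orientation, here is a minimal sketch of plain iterative k-NN imputation with uniform neighbor weights, i.e., the starting point that F3I improves on by learning per-neighbor weights (and optionally training jointly with a downstream task); everything below is illustrative, not the F3I algorithm itself.

```python
# Minimal sketch of iterative k-NN imputation with uniform neighbor weights.
import numpy as np

def iterative_knn_impute(X, k=5, n_iter=10):
    X = X.copy()
    missing = np.isnan(X)
    X[missing] = np.nanmean(X, axis=0)[np.where(missing)[1]]  # column-mean init
    for _ in range(n_iter):
        # pairwise distances on the current (fully filled-in) matrix
        d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)
        nbrs = np.argsort(d, axis=1)[:, :k]        # k nearest neighbors per row
        X_new = X.copy()
        for i, j in zip(*np.where(missing)):       # re-impute each missing cell
            X_new[i, j] = X[nbrs[i], j].mean()
        X = X_new
    return X

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
X[rng.random(X.shape) < 0.1] = np.nan              # ~10% entries missing
print(iterative_knn_impute(X)[:3])
```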
Supplementary Material: Dynamic Results
The different cases represent various material property configurations. In Figure 8 we show the temporal evolution of different scenarios, (a) to (d) for the initially straight bending rod and (e) to (f) for the elastic helix. In (b), we modify N ∈ {10, 20, 40, 60}. In (c), we modify ℓ ∈ {1.0 m, 2.0 m, 3.0 m, 4.0 m, 5.0 m, 6.0 m}. In (d), we modify Y ∈ {2.0 × 10 …}. The default parameters of the elastic helix are HR = 0.5 m (helix radius), HH = 0.5 m (helix height), HW = 2.5 (winding number), and T = 1.0 × 10 … In (f), we modify N ∈ {30, 60, 80, 100}.
Cross-Scale Internal Graph Neural Network for Image Super-Resolution
Shangchen Zhou
Non-local self-similarity in natural images has been well studied as an effective prior in image restoration. However, for single image super-resolution (SISR), most existing deep non-local methods (e.g., non-local neural networks) only exploit similar patches within the same scale of the low-resolution (LR) input image. Consequently, the restoration is limited to same-scale information and neglects potential high-resolution (HR) cues from other scales. In this paper, we exploit the cross-scale patch recurrence property of natural images, i.e., that similar patches tend to recur many times across different scales, with a novel cross-scale internal graph neural network (IGNN). Specifically, we dynamically construct a cross-scale graph by searching the k nearest neighboring patches in the downsampled LR image for each query patch in the LR image.
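A simplified sketch of the cross-scale neighbor search (only the graph construction, not IGNN's feature aggregation) might look as follows; the patch size, stride, and naive downsampling are assumptions made for illustration.

```python
# Sketch of cross-scale neighbor search: for each query patch in the LR image,
# find its k nearest patches in a 2x-downsampled copy of the same image.
import numpy as np

def patches(img, p, stride):
    h, w = img.shape
    out, coords = [], []
    for i in range(0, h - p + 1, stride):
        for j in range(0, w - p + 1, stride):
            out.append(img[i:i + p, j:j + p].ravel())
            coords.append((i, j))
    return np.array(out), coords

rng = np.random.default_rng(0)
lr = rng.random((64, 64))
lr_down = lr[::2, ::2]                        # naive 2x downsampling

q, q_xy = patches(lr, p=8, stride=8)          # query patches in the LR image
c, c_xy = patches(lr_down, p=8, stride=4)     # candidate patches at the coarser scale

d = np.linalg.norm(q[:, None, :] - c[None, :, :], axis=-1)
knn = np.argsort(d, axis=1)[:, :5]            # k = 5 cross-scale neighbors per query
print(q_xy[0], [c_xy[i] for i in knn[0]])
```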
Statistical Guarantees of Distributed Nearest Neighbor Classification
Nearest neighbor is a popular nonparametric method for classification and regression with many appealing properties. In the big data era, the sheer volume and spatial/temporal disparity of big data may prohibit centrally processing and storing the data. This imposes a considerable hurdle for nearest neighbor predictions, since the entire training data must be memorized. One effective way to overcome this issue is the distributed learning framework. Through majority voting, the distributed nearest neighbor classifier achieves the same rate of convergence as its oracle version in terms of regret, up to a multiplicative constant that depends solely on the data dimension. This multiplicative difference can be eliminated by replacing majority voting with a weighted voting scheme. In addition, we provide sharp theoretical upper bounds on the number of subsamples needed for the distributed nearest neighbor classifier to reach the optimal convergence rate. It is interesting to note that the weighted voting scheme allows a larger number of subsamples than majority voting. Our findings are supported by numerical studies.
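The setup can be sketched as follows: the training set is split across s subsamples, each fits a local k-NN classifier, and predictions are combined by majority vote or by averaging local class probabilities (one simple stand-in for a weighted voting scheme; the paper's exact weights may differ). All data and parameters below are illustrative.

```python
# Sketch of distributed k-NN: local classifiers on s subsamples, combined by
# majority voting or by averaging local class probabilities.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(3000, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
x_test = np.array([[0.3, -0.1]])

s = 10                                            # number of subsamples/machines
parts = np.array_split(rng.permutation(len(X)), s)
local_models = [KNeighborsClassifier(n_neighbors=5).fit(X[idx], y[idx]) for idx in parts]

votes = np.array([m.predict(x_test)[0] for m in local_models])
proba = np.mean([m.predict_proba(x_test)[0] for m in local_models], axis=0)

print("majority vote:", np.bincount(votes).argmax())
print("probability-averaged (weighted) vote:", proba.argmax())
```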
Geometric Order Learning for Rank Estimation
A novel approach to rank estimation, called geometric order learning (GOL), is proposed in this paper. First, we construct an embedding space, in which the direction and distance between objects represent order and metric relations between their ranks, by enforcing two geometric constraints: the order constraint compels objects to be sorted according to their ranks, while the metric constraint makes the distance between objects reflect their rank difference. Then, we perform the simple k nearest neighbor (k-NN) search in the embedding space to estimate the rank of a test object. Moreover, to assess the quality of embedding spaces for rank estimation, we propose a metric called discriminative ratio for ranking (DRR). Extensive experiments on facial age estimation, historical color image (HCI) classification, and aesthetic score regression demonstrate that GOL constructs effective embedding spaces and thus yields excellent rank estimation performances.
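Only the inference step is sketched below: assuming an embedding network already trained with the order and metric constraints (not shown here), the rank of a test object is estimated by averaging the ranks of its k nearest reference embeddings; all data are synthetic placeholders.

```python
# Sketch of k-NN rank estimation in a (pretrained, hypothetical) embedding space.
import numpy as np

rng = np.random.default_rng(0)
ref_emb = rng.normal(size=(500, 16))       # hypothetical reference embeddings
ref_rank = rng.integers(0, 70, size=500)   # e.g. ages 0-69 for the references

def estimate_rank(z, k=7):
    d = np.linalg.norm(ref_emb - z, axis=1)
    nbrs = np.argsort(d)[:k]               # k-NN search in the embedding space
    return ref_rank[nbrs].mean()           # simple average of neighbor ranks

print(estimate_rank(rng.normal(size=16)))
```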
Reviews: Finite-Sample Analysis of Fixed-k Nearest Neighbor Density Functional Estimators
There are some issues with the conditions imposed on distributions in this paper. In particular, regarding the local lower bound hypothesis: in the proof of Lemma 2, epsilon may depend on x, so p^* will be smaller than stated. As an example, if p(x) goes to 0 as x^a when x goes to 0, then p^*(x) goes to 0 as x^{a+1}. AFTER REBUTTAL: 1) The rebuttal here is helpful, but there still seem to be some issues with the local lower bound hypothesis. The authors claim that it is very mild in light of Lemma 2, but Lemma 2 does not apply well to the case p(x) ~ x^a. I think that they should at least include what they have written in the rebuttal.
Finite-Sample Analysis of Fixed-k Nearest Neighbor Density Functional Estimators
Shashank Singh, Barnabas Poczos
We provide a finite-sample analysis of a general framework for using k-nearest-neighbor statistics to estimate functionals of a nonparametric continuous probability density, including entropies and divergences. Rather than plugging a consistent density estimate (which requires k → ∞ as the sample size n → ∞) into the functional of interest, the estimators we consider fix k and perform a bias correction. This is more efficient computationally and, as we show, in certain cases statistically, leading to faster convergence rates. Our framework unifies several previous estimators, and for most of them ours are the first finite-sample guarantees.
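As one concrete member of this family, the sketch below implements a fixed-k entropy estimator of the Kozachenko-Leonenko type, whose digamma term plays the role of the bias correction described above; the paper treats a broader class of functionals, so this is only the differential-entropy special case.

```python
# Fixed-k (Kozachenko-Leonenko style) differential entropy estimator with a
# digamma-based bias correction; k stays fixed as the sample size grows.
import numpy as np
from scipy.special import digamma, gammaln
from scipy.spatial import cKDTree

def kl_entropy(X, k=3):
    n, d = X.shape
    tree = cKDTree(X)
    # distance to the k-th nearest neighbor of each point (index 0 is the point itself)
    eps = tree.query(X, k=k + 1)[0][:, -1]
    log_cd = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)  # log volume of the unit d-ball
    return digamma(n) - digamma(k) + log_cd + d * np.mean(np.log(eps))

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))   # true entropy = log(2*pi*e) ≈ 2.838 nats
print(kl_entropy(X))
```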