Inductive Learning
Reviews: Reverse KL-Divergence Training of Prior Networks: Improved Uncertainty and Adversarial Robustness
Detecting inputs that are outside the distribution of training examples, including adversarial inputs, is an important problem; reviewers and the area chair agree that this paper makes a useful algorithmic contribution towards solving this problem. The argument that reverse KL is conceptually correct, while forward KL as used previously is conceptually wrong, is significant. Training with reverse KL is a simple and compelling idea that practitioners can try easily. For these reasons the paper is being accepted so that the community can benefit from it quickly, despite the fact that reviewers have identified ways in which the writing of the paper, and the empirical evaluation, need improvement. The authors are encouraged to improve the final version.
A Comprehensive Survey on Imbalanced Data Learning
Gao, Xinyi, Xie, Dongting, Zhang, Yihang, Wang, Zhengren, He, Conghui, Yin, Hongzhi, Zhang, Wentao
With the expansion of data availability, machine learning (ML) has achieved remarkable breakthroughs in both academia and industry. However, imbalanced data distributions are prevalent in various types of raw data and severely hinder the performance of ML by biasing the decision-making processes. To deepen the understanding of imbalanced data and facilitate related research and applications, this survey systematically analyzes various real-world data formats and organizes existing research for different data formats into four distinct categories: data re-balancing, feature representation, training strategy, and ensemble learning. This structured analysis helps researchers comprehensively understand the pervasive nature of imbalance across diverse data formats, thereby paving a clearer path toward achieving specific research goals. We provide an overview of relevant open-source libraries, spotlight current challenges, and offer novel insights aimed at fostering future advancements in this critical area of study.
Robust Graph-Based Semi-Supervised Learning via $p$-Conductances
Robertson, Sawyer Jack, Holtz, Chester, Wan, Zhengchao, Mishne, Gal, Cloninger, Alexander
We study the problem of semi-supervised learning on graphs in the regime where data labels are scarce or possibly corrupted. We propose an approach called $p$-conductance learning that generalizes the $p$-Laplace and Poisson learning methods by introducing an objective reminiscent of $p$-Laplacian regularization and an affine relaxation of the label constraints. This leads to a family of probability measure mincut programs that balance sparse edge removal with accurate distribution separation. Our theoretical analysis connects these programs to well-known variational and probabilistic problems on graphs (including randomized cuts, effective resistance, and Wasserstein distance) and provides motivation for robustness when labels are diffused via the heat kernel. Computationally, we develop a semismooth Newton-conjugate gradient algorithm and extend it to incorporate class-size estimates when converting the continuous solutions into label assignments. Empirical results on computer vision and citation datasets demonstrate that our approach achieves state-of-the-art accuracy in low label-rate, corrupted-label, and partial-label regimes.
Review for NeurIPS paper: Patch2Self: Denoising Diffusion MRI with Self-Supervised Learning
One key difference between the proposed approach and the Marchenko-Pastur baseline is the patch-based nature of the former. Is the smoother appearance of the images in figure 2 and the less noisy tractograms in figure 3 simply because the patch-based approach introduces more smoothing? That seems very likely to me. A potentially big advantage of Marchenko-Pastur is that it does not smooth and so preserves detail. This is not tested in the qualitative evaluations of figures 2 and 3 or mentioned anywhere in the text.
Review for NeurIPS paper: Patch2Self: Denoising Diffusion MRI with Self-Supervised Learning
Although the novelty of the proposed method might be considered marginal with respect to the machine learning community, the contribution to the application field is relevant. The availability of the code represents an added value from the perspective of open science. The authors provided satisfactory answers in the rebuttal.
Review for NeurIPS paper: Inductive Quantum Embedding
The paper presents an extension of the quantum embeddings of (Gang et al., 2019) -- embeddings that allow for logical expressions to be evaluated. The main contributions are to allow for inductive learning of quantum embeddings, and the design of an algorithm that is significantly faster than the previous one. The experiments show promising results on a fine-grained classification task. The reviewers agreed that the paper presents a solid contribution and that the rebuttal answered the reviewers' concerns.
Review for NeurIPS paper: Large-Scale Methods for Distributionally Robust Optimization
Summary and Contributions: The paper studies the use of batch stochastic gradient methods to solve large scale DRO problems. In these scenarios, we face two problems: (1) Stochastic gradient estimates of DRO problems are biased; (2) Due to the size of large-scale problems, the convergence rate of the methods used to tackle them should not depend on either the number of parameters d or number of training examples N. The authors tackled problem (1) by defining a surrogate objective for which the gradient estimates are unbiased. Then, by carefully bounding the difference between the true and surrogate objectives as a function of the batch size n, the authors are able to give optimality bounds for the true cost by optimizing the surrogate cost, using a large enough batch size. Moreover, for some classes of robust risks, the authors also bound the variance of the gradient estimates. This allows them to use an accelerated version of the stochastic gradient method which achieves tighter convergence bounds.
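The bias described in point (1) can be seen on a toy example. The sketch below assumes CVaR as the robust risk and assumes the surrogate objective is the expected mini-batch robust loss; the review does not spell out either choice, and the function names are hypothetical. It enumerates every batch of a given size (drawn with replacement) and compares the expected batch CVaR with the full-data CVaR.

```python
from itertools import product


def cvar(losses, alpha):
    """CVaR at level alpha: average of the worst alpha-fraction of the losses."""
    n = len(losses)
    budget = alpha * n  # number of samples' worth of mass placed on the worst losses
    total = 0.0
    for loss in sorted(losses, reverse=True):
        weight = min(1.0, budget)  # the last included sample may count fractionally
        total += weight * loss
        budget -= weight
        if budget <= 0:
            break
    return total / (alpha * n)


def expected_batch_cvar(losses, alpha, batch_size):
    """Exact expectation of the mini-batch CVaR over all batches drawn with replacement.

    This plays the role of the surrogate objective (an assumption of this sketch).
    """
    batches = list(product(losses, repeat=batch_size))
    return sum(cvar(batch, alpha) for batch in batches) / len(batches)


losses = [1.0, 2.0, 3.0, 4.0]
full = cvar(losses, alpha=0.5)
for n in (1, 2, 4):
    # The gap between the surrogate and the full objective depends on the batch size n.
    print(n, expected_batch_cvar(losses, 0.5, n), full)
```

On this toy data the full-data CVaR is 3.5, while the expected batch CVaR is 2.5, 3.125, and about 3.26 for batch sizes 1, 2, and 4. The mini-batch estimate is therefore biased, and the gap narrows as the batch grows, which is the kind of batch-size-dependent bound the review describes.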
Review for NeurIPS paper: FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence
Four knowledgeable reviewers support acceptance for the contributions. Reviewers find that i) the proposed algorithm is simple and efficient; ii) the empirical evaluation is very carefully designed, with an extensive ablation study; iii) the analysis of augmentation strategy and sharpening also provides good insights. Therefore, I also recommend acceptance. However, please consider revising your paper to address all the concerns and comments from the reviewers.
Supervised Learning with Tensor Networks
Tensor networks are approximations of high-order tensors which are efficient to work with and have been very successful for physics and mathematics applications. We demonstrate how algorithms for optimizing tensor networks can be adapted to supervised learning tasks by using matrix product states (tensor trains) to parameterize non-linear kernel learning models. For the MNIST data set we obtain less than 1% test set classification error. We discuss an interpretation of the additional structure imparted by the tensor network to the learned model.
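As a rough illustration of how a matrix product state can parameterize a classifier, the sketch below contracts a chain of small tensors with a local feature vector for each pixel. The cosine/sine feature map, the placement of the class index on the last tensor, and all names are assumptions made for this example; the abstract does not specify them.

```python
import math


def feature_map(x):
    """Local feature vector for a pixel value x in [0, 1] (an assumed choice)."""
    return [math.cos(math.pi * x / 2), math.sin(math.pi * x / 2)]


def mps_scores(cores, pixels):
    """Contract a matrix product state with the product of local feature vectors.

    cores[j][s] is a (left bond) x (right bond) matrix for local index s in {0, 1}.
    The first core has left bond dimension 1; the right bond of the last core
    is used as the class index (a simplification assumed for this sketch).
    """
    vec = [1.0]  # left boundary vector
    for core, x in zip(cores, pixels):
        phi = feature_map(x)
        rows, cols = len(core[0]), len(core[0][0])
        # Site matrix: the core with its local index contracted against phi.
        site = [[sum(phi[s] * core[s][a][b] for s in range(2)) for b in range(cols)]
                for a in range(rows)]
        # Absorb the site matrix into the running boundary vector.
        vec = [sum(vec[a] * site[a][b] for a in range(rows)) for b in range(cols)]
    return vec


# Hypothetical two-pixel, two-class model with bond dimension 1.
cores = [
    [[[1.0]], [[1.0]]],            # site 0: passes cos + sin of pixel 0
    [[[1.0, 0.0]], [[0.0, 1.0]]],  # site 1: class 0 reads cos, class 1 reads sin
]
scores = mps_scores(cores, [0.0, 1.0])
predicted = max(range(len(scores)), key=scores.__getitem__)
print(scores, predicted)
```

The contraction visits each site once, so the cost grows linearly with the number of pixels for a fixed bond dimension, even though the underlying weight tensor has exponentially many entries. In a trained model the cores would be learned rather than written by hand.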
Replacing supervised classification learning by Slow Feature Analysis in spiking neural networks
Stefan Klampfl, Wolfgang Maass
It is open how neurons in the brain are able to learn without supervision to discriminate between spatio-temporal firing patterns of presynaptic neurons. We show that a known unsupervised learning algorithm, Slow Feature Analysis (SFA), is able to acquire the classification capability of Fisher's Linear Discriminant (FLD), a powerful algorithm for supervised learning, if temporally adjacent samples are likely to be from the same class. We also demonstrate that it enables linear readout neurons of cortical microcircuits to learn the detection of repeating firing patterns within a stream of spike trains with the same firing statistics, as well as discrimination of spoken digits, in an unsupervised manner.
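The link between slowness and class separation can be illustrated with a toy sequence. The sketch below does not solve the full SFA eigenproblem; it only evaluates the slowness objective (mean squared temporal difference divided by variance) for two candidate projections. The data and function name are hypothetical.

```python
def slowness(signal):
    """Mean squared temporal difference of the signal, normalized by its variance."""
    n = len(signal)
    mean = sum(signal) / n
    variance = sum((s - mean) ** 2 for s in signal) / n
    diffs = [(signal[t + 1] - signal[t]) ** 2 for t in range(n - 1)]
    return (sum(diffs) / len(diffs)) / variance


# Toy sequence: four samples of class A followed by four of class B, so that
# temporally adjacent samples almost always share a class.
# The first coordinate separates the classes; the second flips at every step.
sequence = [(0.0, 1.0), (0.0, -1.0), (0.0, 1.0), (0.0, -1.0),
            (4.0, 1.0), (4.0, -1.0), (4.0, 1.0), (4.0, -1.0)]

class_direction = [x for x, _ in sequence]  # projection onto the first axis
noise_direction = [y for _, y in sequence]  # projection onto the second axis

slow = slowness(class_direction)
fast = slowness(noise_direction)
print(slow, fast)
```

The class-separating projection varies far more slowly (4/7) than the within-class projection (4.0), so an algorithm that picks the slowest direction would pick the discriminative one here, without ever seeing a label.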