Mathematical & Statistical Methods
Feature Cross-Substitution in Adversarial Classification
The success of machine learning, particularly in supervised settings, has led to numerous attempts to apply it in adversarial settings such as spam and malware detection. The core challenge in this class of applications is that adversaries are not static data generators, but make a deliberate effort to evade the classifiers deployed to detect them. We investigate both the problem of modeling the objectives of such adversaries, as well as the algorithmic problem of accounting for rational, objective-driven adversaries. In particular, we demonstrate severe shortcomings of feature reduction in adversarial settings using several natural adversarial objective functions, an observation that is particularly pronounced when the adversary is able to substitute across similar features (for example, replace words with synonyms or replace letters in words). We offer a simple heuristic method for making learning more robust to feature cross-substitution attacks. We then present a more general approach based on mixed-integer linear programming with constraint generation, which implicitly trades off overfitting and feature selection in an adversarial setting using a sparse regularizer along with an evasion model. Our approach is the first method for combining an adversarial classification algorithm with a very general class of models of adversarial classifier evasion. We show that our algorithmic approach significantly outperforms state-of-the-art alternatives.
Beta-Negative Binomial Process and Exchangeable Random Partitions for Mixed-Membership Modeling
The beta-negative binomial process (BNBP), an integer-valued stochastic process, is employed to partition a count vector into a latent random count matrix. As the marginal probability distribution of the BNBP that governs the exchangeable random partitions of grouped data has not yet been developed, current inference for the BNBP has to truncate the number of atoms of the beta process. This paper introduces an exchangeable partition probability function to explicitly describe how the BNBP clusters the data points of each group into a random number of exchangeable partitions, which are shared across all the groups. A fully collapsed Gibbs sampler is developed for the BNBP, leading to a novel nonparametric Bayesian topic model that is distinct from existing ones, with simple implementation, fast convergence, good mixing, and state-of-the-art predictive performance.
Bayesian Sampling Using Stochastic Gradient Thermostats
Nan Ding, Youhan Fang, Ryan Babbush, Changyou Chen, Robert D. Skeel, Hartmut Neven
Dynamics-based sampling methods, such as Hybrid Monte Carlo (HMC) and Langevin dynamics (LD), are commonly used to sample target distributions. Recently, such approaches have been combined with stochastic gradient techniques to increase sampling efficiency when dealing with large datasets. An outstanding problem with this approach is that the stochastic gradient introduces an unknown amount of noise which can prevent proper sampling after discretization. To remedy this problem, we show that one can leverage a small number of additional variables to stabilize momentum fluctuations induced by the unknown noise. Our method is inspired by the idea of a thermostat in statistical physics and is justified by a general theory.
Review for NeurIPS paper: Convergence and Stability of Graph Convolutional Networks on Large Random Graphs
Summary and Contributions: This paper presents theoretical analysis of convergence and stability properties of GCNs on large random graphs. It introduces continuous GCNs (c-GCN) that act on a bounded, piecewise-Lipschitz function of unobserved latent node variables which are linked through a similarity kernel. It has two main contributions. Firstly, it studies notions of invariance and equivariance to isomorphism of random graph models, and give convergence results of discrete GCNs to c-GCNs for large graphs. Specifically, for the invariant case the authors claim that the output of both networks lie in the same output space.
Review for NeurIPS paper: Convergence and Stability of Graph Convolutional Networks on Large Random Graphs
This paper considers a continuous version of graph convolutional neural network and analyze the usual discrete GCN as a discrete approximation of the continuous one. Under some random graph generative models, the convergence rate of the discrete one to the continuous one is derived. Moreover, some stability results are given to show that the induced GCN is stable against perturbation of the underlying generative model. The analysis is interesting and the expositions are well written. This kind of continuous-to-discrete type analysis would facilitate further theoretical analysis to understand GCN in general. Therefore, this paper is worth publication in NeurIPS.
Review for NeurIPS paper: A General Large Neighborhood Search Framework for Solving Integer Linear Programs
Additional Feedback: I wonder whether the used LNS requires a local search algorithm for solving the subproblem (Line 3). The authors argue that they set \gamma to 1 because it is a finite-horizon task. I completely agree that this is a possible choice; however even for finite-horizon tasks, \gamma can be set to values smaller than 1.0. I wonder how sensitive their approach is to such hyperparameters. The authors sampled 5 trajectories for each problem (instance?) to estimate the policy gradient. I'm not sure whether I understood that point fully.
Review for NeurIPS paper: A General Large Neighborhood Search Framework for Solving Integer Linear Programs
This paper received positive reviews from all three reviewers but during the discussion there was widespread concern about whether the contribution is of sufficient significance for a NeurIPS publication. In particular, the question was raised whether a paper that merely applies ML techniques in a new application domain was of sufficient significance. I also read the paper and the author's rebuttal and I very much agree with the authors on this point: application papers have always been a part of the major ML conferences and can help drive the field forward. I am therefore happy to recommend acceptance and encourage the authors to spend more text in the final version towards motivating the problem to a general audience.
Review for NeurIPS paper: GCOMB: Learning Budget-constrained Combinatorial Algorithms over Billion-sized Graphs
Weaknesses: The main weaknesses of the paper are that the work only uses a naïve version of the greedy algorithm rather than the faster lazy greedy algorithm, and that it seems to claim more than the results suggest without further investigation in terms of the scope of applicability, and performance improvements over the greedy algorithm. The approach seems to be specialized to selecting a set of elements for coverage-like problems and specifically submodular maximization problems which admit greedy approximation algorithms, not necessarily general set combinatorial problems as claimed (it is important to clearly and fairly articulate the claimed scope of the proposed algorithms superior performance). Additionally, the greedy algorithm empirically gives near-optimal performance in the experiments, so it would be useful to know whether this approach performs well for more difficult problems, where greedy is not almost optimal. It would be good to see performance on other more combinatorial problems or nonsubmodular set graph problems, e.g. The score supervision used to train the GCN is highly related to the marginal return that greedy would use to score nodes. In addition, the locality metric seems to directly consider the percent of neighbors of a node which are not currently covered by a partial solution, which is directly related to the coverage problems considered in this work.
Review for NeurIPS paper: GCOMB: Learning Budget-constrained Combinatorial Algorithms over Billion-sized Graphs
Three reviewers rated this paper as weak accept, and one as reject. All reviewers felt the paper combined learning-based techniques effectively to achieve impressive performance on combinatorial optimization problems in massive graphs. Reviewers describe the work as a combination of heuristics and modules consisting of existing techniques, but largely view the overall system as being significant, and comment on its impressive performance and an ablation study to justify individual components. The main criticisms were about missing comparisons to baselines. It was observed that the proposed method essentially does well on submodular coverage style problems where the greedy algorithm is often nearly optimal in practice and its main advantage is being much faster.
Review for NeurIPS paper: Learning Linear Programs from Optimal Decisions
Weaknesses: Unfortunately, the authors stop at the rather obvious suggestion that one can apply SQP to the NLP in question. The implementation, while using PyTorch and many rather complicated tools, does not seem to scale beyond very small instances. Perhaps most importantly, the implementation is compared to a rather basic algorithm for generic NLP (COBYLA of Powell), rather than the state of the art in general-purpose DFO or specialised methods for inverse optimisation.