Mathematical & Statistical Methods
Review for NeurIPS paper: Convergence and Stability of Graph Convolutional Networks on Large Random Graphs
Summary and Contributions: This paper presents a theoretical analysis of the convergence and stability properties of GCNs on large random graphs. It introduces continuous GCNs (c-GCNs) that act on a bounded, piecewise-Lipschitz function of unobserved latent node variables, which are linked through a similarity kernel. The paper makes two main contributions. Firstly, it studies notions of invariance and equivariance to isomorphisms of random graph models, and gives convergence results of discrete GCNs to c-GCNs for large graphs. Specifically, for the invariant case the authors claim that the outputs of both networks lie in the same output space.
Review for NeurIPS paper: Convergence and Stability of Graph Convolutional Networks on Large Random Graphs
This paper considers a continuous version of the graph convolutional network and analyzes the usual discrete GCN as a discrete approximation of the continuous one. Under some random graph generative models, the convergence rate of the discrete network to the continuous one is derived. Moreover, stability results are given showing that the induced GCN is stable against perturbations of the underlying generative model. The analysis is interesting and the exposition is well written. This kind of continuous-to-discrete analysis should facilitate further theoretical work to understand GCNs in general. Therefore, this paper is worth publication at NeurIPS.
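To make the object of study concrete, here is a minimal sketch, not the authors' code, of the discrete-to-continuous setup both reviews describe: latent node variables linked through a similarity kernel, one mean-aggregation GCN layer on a sampled graph, and its continuous analogue as an integral against the kernel. The kernel W, the signal f, and all sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def W(x, y):
    # Hypothetical bounded similarity kernel linking latent variables.
    return 0.5 * (1 + np.cos(np.pi * (x - y)))

n = 2000
x = rng.uniform(0, 1, size=n)              # unobserved latent node variables
P = W(x[:, None], x[None, :])              # edge probabilities
A = (rng.uniform(size=(n, n)) < P).astype(float)
A = np.triu(A, 1)
A = A + A.T                                # symmetric adjacency, no self-loops

f = np.sin(2 * np.pi * x)                  # bounded, Lipschitz node signal
deg = A.sum(axis=1, keepdims=True)
h_disc = A @ f[:, None] / np.maximum(deg, 1)   # one mean-aggregation layer

# The continuous (c-GCN) analogue replaces the neighbor average by an
# integral against the kernel: h(x) = E[W(x, Y) f(Y)] / E[W(x, Y)],
# estimated here by plain Monte Carlo for the first few nodes.
y = rng.uniform(0, 1, size=100_000)
h_cont = np.array([(W(xi, y) * np.sin(2 * np.pi * y)).mean() / W(xi, y).mean()
                   for xi in x[:5]])
print(np.abs(h_disc[:5, 0] - h_cont).max())    # gap shrinks as n grows
```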
Higher Order Kernel Mean Embeddings to Capture Filtrations of Stochastic Processes
Stochastic processes are random variables with values in some space of paths. However, reducing a stochastic process to a path-valued random variable ignores its filtration, i.e. the flow of information carried by the process through time. By conditioning the process on its filtration, we introduce a family of higher order kernel mean embeddings (KMEs) that generalizes the notion of KME and captures additional information related to the filtration. We derive empirical estimators for the associated higher order maximum mean discrepancies (MMDs) and prove consistency. We then construct a filtration-sensitive kernel two-sample test able to pick up information that gets missed by the standard MMD test. In addition, leveraging our higher order MMDs we construct a family of universal kernels on stochastic processes that allows us to solve real-world calibration and optimal stopping problems in quantitative finance (such as the pricing of American options) via classical kernel-based regression methods. Finally, adapting existing tests for conditional independence to the case of stochastic processes, we design a causal-discovery algorithm to recover the causal graph of structural dependencies among interacting bodies solely from observations of their multidimensional trajectories.
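For orientation, the sketch below implements the standard (order-zero) unbiased MMD estimator that the higher order construction generalizes, applied to discretized paths. The RBF kernel on flattened paths and the median-heuristic bandwidth are illustrative choices, not necessarily the kernels used in the paper.

```python
import numpy as np

def rbf_gram(X, Y, bw):
    # X: (m, T), Y: (n, T); each row is a discretized path.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bw ** 2))

def mmd2_unbiased(X, Y, bw):
    # Unbiased estimate of MMD^2 between the samples X and Y.
    m, n = len(X), len(Y)
    Kxx = rbf_gram(X, X, bw); np.fill_diagonal(Kxx, 0.0)
    Kyy = rbf_gram(Y, Y, bw); np.fill_diagonal(Kyy, 0.0)
    Kxy = rbf_gram(X, Y, bw)
    return Kxx.sum() / (m * (m - 1)) + Kyy.sum() / (n * (n - 1)) - 2 * Kxy.mean()

def median_bw(X, Y):
    # Median heuristic for the kernel bandwidth.
    Z = np.vstack([X, Y])
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.sqrt(np.median(d2[d2 > 0]) / 2)

rng = np.random.default_rng(1)
T, m = 50, 200
bm1 = rng.normal(size=(m, T)).cumsum(axis=1) / np.sqrt(T)   # Brownian-like paths
bm2 = rng.normal(size=(m, T)).cumsum(axis=1) / np.sqrt(T)
drift = bm2 + np.linspace(0, 1, T)                          # added drift

bw = median_bw(bm1, bm2)
print(mmd2_unbiased(bm1, bm2, bw))    # near zero: same law
print(mmd2_unbiased(bm1, drift, bw))  # clearly positive: laws differ
```

A filtration-sensitive (higher order) version would additionally embed the conditional laws of the process given its past, which is exactly the information this order-zero test cannot see.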
Review for NeurIPS paper: A General Large Neighborhood Search Framework for Solving Integer Linear Programs
Additional Feedback: I wonder whether the LNS framework used requires a local search algorithm for solving the subproblem (Line 3). The authors argue that they set \gamma to 1 because the task has a finite horizon. I completely agree that this is a possible choice; however, even for finite-horizon tasks, \gamma can be set to values smaller than 1.0. I wonder how sensitive their approach is to such hyperparameters. The authors sampled 5 trajectories for each problem (instance?) to estimate the policy gradient; I'm not sure whether I understood that point fully.
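To make the \gamma question concrete, below is a minimal sketch, reflecting one reading of the setup rather than the authors' code, of the estimator in question: 5 sampled trajectories per instance combined into a REINFORCE gradient with discount factor \gamma, so that sensitivity to \gamma < 1.0 can be probed directly.

```python
import numpy as np

rng = np.random.default_rng(2)

def reinforce_estimate(trajectories, gamma=1.0):
    # trajectories: list of trajectories, each a list of (grad_log_pi_t, r_t)
    # pairs. Returns the trajectory-averaged REINFORCE gradient estimate.
    grads = []
    for traj in trajectories:
        rewards = [r for _, r in traj]
        # Discounted return-to-go: G_t = sum_{k >= t} gamma^(k - t) * r_k.
        G, running = [0.0] * len(rewards), 0.0
        for t in reversed(range(len(rewards))):
            running = rewards[t] + gamma * running
            G[t] = running
        grads.append(sum(g * G[t] for t, (g, _) in enumerate(traj)))
    return np.mean(grads, axis=0)

# Toy check with 5 trajectories of length 3 and scalar "gradients".
trajs = [[(rng.normal(), rng.normal()) for _ in range(3)] for _ in range(5)]
print(reinforce_estimate(trajs, gamma=1.0), reinforce_estimate(trajs, gamma=0.95))
```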
Review for NeurIPS paper: A General Large Neighborhood Search Framework for Solving Integer Linear Programs
This paper received positive reviews from all three reviewers, but during the discussion there was widespread concern about whether the contribution is of sufficient significance for a NeurIPS publication. In particular, the question was raised whether a paper that merely applies ML techniques in a new application domain is of sufficient significance. I also read the paper and the authors' rebuttal, and I very much agree with the authors on this point: application papers have always been part of the major ML conferences and can help drive the field forward. I am therefore happy to recommend acceptance and encourage the authors to devote more text in the final version to motivating the problem for a general audience.
Review for NeurIPS paper: GCOMB: Learning Budget-constrained Combinatorial Algorithms over Billion-sized Graphs
Weaknesses: The main weaknesses of the paper are that the work only uses a naïve version of the greedy algorithm rather than the faster lazy greedy algorithm, and that it seems to claim more than the results support without further investigation of the scope of applicability and of the performance improvements over the greedy algorithm. The approach seems to be specialized to selecting a set of elements for coverage-like problems, and specifically for submodular maximization problems that admit greedy approximation algorithms, rather than general combinatorial set problems as claimed (it is important to articulate clearly and fairly the claimed scope of the proposed algorithm's superior performance). Additionally, the greedy algorithm empirically gives near-optimal performance in the experiments, so it would be useful to know whether this approach performs well on more difficult problems where greedy is not almost optimal; it would be good to see performance on other, more combinatorial problems or non-submodular set graph problems, e.g. The score supervision used to train the GCN is highly related to the marginal return that greedy would use to score nodes. In addition, the locality metric seems to directly consider the percentage of a node's neighbors that are not currently covered by a partial solution, which is directly tied to the coverage problems considered in this work.
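For reference, the lazy greedy variant mentioned above exploits the fact that marginal gains of a monotone submodular function can only shrink as the solution grows, so stale gains kept in a max-heap remain valid upper bounds and most re-evaluations can be skipped. A minimal sketch on a toy max-coverage instance follows (the instance and names are illustrative):

```python
import heapq

def lazy_greedy(sets, budget):
    # Heap of (-upper_bound_on_marginal_gain, set_index); bounds start exact.
    heap = [(-len(s), i) for i, s in enumerate(sets)]
    heapq.heapify(heap)
    covered, chosen = set(), []
    while heap and len(chosen) < budget:
        neg_bound, i = heapq.heappop(heap)
        gain = len(sets[i] - covered)         # recompute the true gain
        if gain == -neg_bound:                # bound tight => still the best
            chosen.append(i)
            covered |= sets[i]
        elif gain > 0:                        # bound stale => push back updated
            heapq.heappush(heap, (-gain, i))
    return chosen, covered

sets = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1, 7}]
print(lazy_greedy(sets, budget=2))            # picks sets 2 then 0, covering 1..7
```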
Review for NeurIPS paper: GCOMB: Learning Budget-constrained Combinatorial Algorithms over Billion-sized Graphs
Three reviewers rated this paper as a weak accept, and one as a reject. All reviewers felt the paper combined learning-based techniques effectively to achieve impressive performance on combinatorial optimization problems over massive graphs. Reviewers describe the work as a combination of heuristics and modules built from existing techniques, but largely view the overall system as significant, and note its impressive performance and an ablation study justifying the individual components. The main criticisms concerned missing comparisons to baselines. It was observed that the proposed method essentially does well on submodular, coverage-style problems where the greedy algorithm is often nearly optimal in practice, and that its main advantage is being much faster.
Review for NeurIPS paper: Learning Linear Programs from Optimal Decisions
Weaknesses: Unfortunately, the authors stop at the rather obvious suggestion that one can apply SQP to the NLP in question. The implementation, while using PyTorch and a number of rather complicated tools, does not seem to scale beyond very small instances. Perhaps most importantly, the implementation is compared against a rather basic algorithm for generic NLP (Powell's COBYLA), rather than the state of the art in general-purpose DFO or specialised methods for inverse optimisation.
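For context, the baseline being criticised looks roughly like the sketch below: SciPy's derivative-free COBYLA minimising a toy inverse-LP loss. The problem data, loss, and starting point are hypothetical stand-ins, not the paper's formulation. Note that the LP optimum is piecewise constant in the cost vector, which is exactly why such a generic DFO method can stall here.

```python
import numpy as np
from scipy.optimize import linprog, minimize

A_ub = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
b_ub = np.array([1.0, 0.7, 0.7])
x_obs = np.array([0.7, 0.3])            # observed optimal decision

def loss(c):
    # Forward LP: min c^T x  s.t.  A_ub x <= b_ub, x >= 0.
    sol = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 2)
    return float(((sol.x - x_obs) ** 2).sum()) if sol.success else 1e6

res = minimize(loss, x0=np.array([-0.3, -1.0]), method='COBYLA',
               options={'maxiter': 200})
print(res.x, loss(res.x))  # the flat, piecewise-constant loss often leaves
                           # COBYLA stuck away from the true cost vector
```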
Review for NeurIPS paper: Learning Linear Programs from Optimal Decisions
All the reviewers agreed that this paper studies an interesting problem and presents a novel method. Although some reviewers initially raised a concern regarding the novelty, the authors provided a clear response and the concern was appropriately addressed. All reviewers and I agree to suggest acceptance of this submission. Note, however, that several reviewers pointed out some important concerns; please consider revising your paper to address them before submitting the camera-ready version.
COEVOLVE: A Joint Point Process Model for Information Diffusion and Network Co-evolution
Mehrdad Farajtabar, Yichen Wang, Manuel Gomez Rodriguez, Shuang Li, Hongyuan Zha, Le Song
Information diffusion in online social networks is affected by the underlying network topology, but it also has the power to change it. Online users are constantly creating new links when exposed to new information sources, and in turn these links are altering the way information spreads. However, these two highly intertwined stochastic processes, information diffusion and network evolution, have been predominantly studied separately, ignoring their co-evolutionary dynamics. We propose a temporal point process model, COEVOLVE, for such joint dynamics, allowing the intensity of one process to be modulated by that of the other. This model allows us to efficiently simulate interleaved diffusion and network events, and to generate traces obeying common diffusion and network patterns observed in real-world networks. Furthermore, we also develop a convex optimization framework to learn the parameters of the model from historical diffusion and network evolution traces. We experiment with both synthetic data and data gathered from Twitter, and show that our model provides a good fit to the data, as well as more accurate predictions than alternatives.
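A minimal sketch, under illustrative parameters rather than the paper's parameterization, of how such interleaved events can be simulated: Ogata-style thinning for two mutually exciting point processes, where each event of one process raises the intensity of the other.

```python
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([0.5, 0.3])                 # base rates: diffusion, link events
alpha = np.array([[0.0, 0.4],             # alpha[i, j]: jump in intensity of
                  [0.6, 0.0]])            # process i caused by an event of j
beta = 1.5                                # exponential decay of excitation

def intensities(t, events):
    lam = mu.copy()
    for s, j in events:                   # events: list of (time, process id)
        lam += alpha[:, j] * np.exp(-beta * (t - s))
    return lam

events, t, horizon = [], 0.0, 20.0
while True:
    # The current total intensity upper-bounds future intensity (it only
    # decays until the next event), so it is a valid thinning bound.
    lam_bar = intensities(t, events).sum()
    t += rng.exponential(1.0 / lam_bar)
    if t >= horizon:
        break
    lam = intensities(t, events)
    if rng.uniform() < lam.sum() / lam_bar:            # accept candidate
        j = int(rng.choice(2, p=lam / lam.sum()))      # attribute to a process
        events.append((t, j))

print(len(events), "events,", sum(j == 0 for _, j in events), "of them diffusion")
```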