Review for NeurIPS paper: Estimating decision tree learnability with polylogarithmic sample complexity

Neural Information Processing Systems

The submission got four reviews that were quite polarised in their recommendations, with two against accepting and two strongly in favour. The disagreement did not concern the technical quality of the paper. The reviewers agree that the theoretical work in this paper has been very competently performed and that, in the context of the problem the authors consider, the results are interesting and advance the state of the art. The disagreement is over whether the results are significant enough for NeurIPS or would be more appropriate for a specialised theory conference. The main objections against accepting are (i) the results are not surprising, (ii) the assumptions (monotonicity and uniform distribution) are strong, and (iii) the overall computational complexity is high.


Reviews: Efficient Identification in Linear Structural Causal Models with Instrumental Cutsets

Neural Information Processing Systems

UPDATE: Thank you for the thoughtful response; those changes should improve the things that were unclear to me. There is a rich recent literature on identification criteria for linear structural causal models, but most of the recently proposed criteria largely ignore the question of efficient computability. This paper answers important questions in this area by giving efficient algorithms for some criteria, while showing others to be NP-complete. The paper is original, generally clear, and of high quality. Minor comments: l100: double "a"; l104: the equation you refer to is in the supplement, which should be mentioned here.


Reviews: Efficient Identification in Linear Structural Causal Models with Instrumental Cutsets

Neural Information Processing Systems

The paper proposes a method to efficiently find instrumental subsets for identification in linear acyclic SCMs. The reviewers think that the method is interesting and relevant. An improvement to its evaluation would be the addition of an experimental section -- the authors indicated that they will add it in the revised version of the paper.


Review for NeurIPS paper: A Scalable MIP-based Method for Learning Optimal Multivariate Decision Trees

Neural Information Processing Systems

Clarity: The main paper is mostly written fairly well, the Appendix less so (lots of typos at least). The work nevertheless lacks clarity because several relevant details are moved to the Supplementary part, and some aspects are not mentioned at all (at least in the main paper). The Appendix even contains a section regarding categorical features that is not even hinted at in the main paper. Clarification is needed, e.g., at the following points:
- p.2, l.70-73 is too vague and the meaning is unclear; please clarify.
- p.2, l.85f: clarify what "[...] i enters leaf node l" means (i.e., that data pt. If \hat{y}_i denotes a predicted label, then why is it real-valued and not in [Y]? (Also regarding the description on p.3, l.96f: why should y_i - \hat{y}_i \geq 1 here? \hat{y}_i is in R, so couldn't it be, say, y_i - delta for some small delta?)
- p.3, l.92: perhaps clarify "tree sparsity"; here this actually means sparsity of the decision hyperplanes, not of the tree itself.
- The 1-norm is used in the MIP (1) and is later called "linear" several times in the text (e.g., p.4, l.136), but this is technically incorrect.
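
To illustrate the 1-norm point, here is a minimal sketch of the distinction, using illustrative symbols (w in R^d for hyperplane coefficients, t for auxiliary variables) that are assumptions and not the paper's notation. The 1-norm is piecewise linear and convex rather than linear, but when it is minimized or bounded from above it can be modelled with linear constraints, which may be what the authors mean by "linear":

```latex
% The 1-norm itself: a sum of absolute values, hence piecewise linear, not linear.
\[
  \|w\|_1 \;=\; \sum_{j=1}^{d} |w_j|
\]
% Standard linear reformulation when the norm is minimized
% (t_j are auxiliary variables introduced for illustration):
\[
  \min_{w,\,t} \;\sum_{j=1}^{d} t_j
  \quad \text{s.t.} \quad -t_j \le w_j \le t_j, \qquad j = 1,\dots,d .
\]
```

At an optimum each t_j equals |w_j|, so the reformulated objective coincides with the 1-norm even though the norm is not a linear function of w.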


Review for NeurIPS paper: A Scalable MIP-based Method for Learning Optimal Multivariate Decision Trees

Neural Information Processing Systems

This paper is about employing advances in the computational efficiency of mixed integer programming methods for decision tree construction problems. While locally optimal methods can achieve an upper bound on the minimization problem efficiently, closing the optimality gap requires tight lower bounds. The authors use an interval relaxation and a support-vector machine procedure to tighten the lower bound. To scale the algorithm, the authors use an LP-based data selection procedure, and perform all experiments using this procedure. It is not clear whether the global optimality properties of the MIP formulation carry through with the data-selection procedure.


Review for NeurIPS paper: Deep Structural Causal Models for Tractable Counterfactual Inference

Neural Information Processing Systems

POST REBUTTAL -- I have read the authors' responses and other reviewers' comments. Unfortunately, some of my primary concerns have not been addressed, which I will elaborate on below. This paper studies the implementation of Pearl's ladder of causation in an SCM, where each of its functions is represented as a neural network. The authors claim that the proposed approaches "are capable of all three levels of Pearl's ladder of causation: association, intervention, and counterfactuals giving rise to a powerful new approach for answering causal questions in imaging applications and beyond." However, I believe the significance of its contributions to the causal inference literature is a bit overstated. In particular, the authors assume that the detailed parameterization of the target SCM is *precisely known*.


Review for NeurIPS paper: Deep Structural Causal Models for Tractable Counterfactual Inference

Neural Information Processing Systems

The reviewers agree on the whole that this work addresses an important problem and that the paper makes sound, well-supported claims. The rebuttal did a good job of clarifying the scope of the work, largely improving the scores of the reviewers. I urge the authors to carefully update the paper to address the reviewers' concerns in the final version. Examples of what to improve include:
- Description of the "intervention vs counterfactual" distinction. One reviewer recommends: "since it is key for the paper's novelty claim I think this distinction needs a little more explanation, perhaps through a simple example"
- Engage with the existing literature on causal inference.


Reviews: Yggdrasil: An Optimized System for Training Deep Decision Trees at Scale

Neural Information Processing Systems

The paper is well written, and its structure is adapted to the content. Upon reading the paper, one might think that the contribution resides in the vertical splitting of the data over the workers, but the state-of-the-art study presented later on shows that this idea by itself is not new. The novelty comes from associating it with data also distributed vertically, sparse bit vectors for inter-node communications, feature compression with custom data structures and training on compressed data. The paper shows formally and experimentally how the proposed heuristics significantly improve the communication between the nodes and speed up training. The remark that using run-length encoding for the features allows them to fit in the L3 cache, thus decreasing the number of DRAM accesses, doesn't seem to always be true. The paper should explain under which conditions this is true (size of the cache, size of the data, number and type of features, etc.).
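
The cache remark can be made concrete with a minimal run-length encoding sketch. This is a generic illustration, not the data structure used in the paper; the function names and the toy feature columns are assumptions chosen for the example. It shows that the encoded size depends on the number of runs, so a column with few repeated consecutive values gains nothing from the encoding:

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


def run_length_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, length in pairs for _ in range(length)]


# Hypothetical sorted, low-cardinality feature column: long runs, compresses well.
sorted_feature = [0] * 5 + [1] * 3 + [2] * 4

# Hypothetical column whose value changes at every row: one pair per row,
# so the encoding is no smaller than the raw column.
alternating_feature = [0, 1] * 6

encoded_sorted = run_length_encode(sorted_feature)
encoded_alternating = run_length_encode(alternating_feature)

print(len(sorted_feature), "values ->", len(encoded_sorted), "runs")
print(len(alternating_feature), "values ->", len(encoded_alternating), "runs")
```

Whether an encoded column fits in a given cache therefore depends on the cache size, the number of rows, and how many runs each feature produces, which is the kind of condition the review asks the authors to state.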


Reviews: A Communication-Efficient Parallel Algorithm for Decision Tree

Neural Information Processing Systems

Given the popularity of decision trees, proposing an efficient parallel implementation of this method is of course very relevant. The proposed parallelization is original with respect to existing methods and it should indeed lead to less communication than other methods. The theoretical analysis is sound and I like the discussion of the impact of the main problem and method parameters that follows from the lower bound provided in Theorem 4.1. Experiments are conducted on two very large problems, where, in the limit of the tested settings (see below), PV-tree is clearly shown to outperform other parallel implementations, in terms of both computing time to reach a given accuracy level and communication cost. I nevertheless have two major concerns with the proposed parallelization.


Learning Causal Models under Independent Changes

Neural Information Processing Systems

In many scientific applications, we observe a system in different conditions in which its components may change, rather than in isolation. In our work, we are interested in explaining the generating process of such a multi-context system using a finite mixture of causal mechanisms. Recent work shows that this causal model is identifiable from data, but is limited to settings where the sparse mechanism shift hypothesis holds and only a subset of the causal conditionals change. As this assumption is not easily verifiable in practice, we study the more general principle that mechanism shifts are independent, which we formalize using the algorithmic notion of independence. We introduce an approach for causal discovery beyond partially directed graphs using Gaussian Process models, and give conditions under which we provably identify the correct causal model.