Neural Information Processing Systems

We thank the reviewers for their efforts. Below we respond to reviewer comments.

R3: Questionable that NOTEARS, FGS outperform earlier methods, [2, Table 1] shows MMHC, PC perform
Thank you for pointing this out. To address R3's concern, we first compared with MMHC and PC in the
The significance level α was chosen from the range considered in [2] to minimize SHD.

R3: "Paper is fairly incremental, developing a single heuristic local search method (namely NOTEARS
that Prop. 3 provides a negative guarantee for NOTEARS (which is not our method), whereas Thms 9
To get from Prop. 3 to KKTS requires several more contributions: reformulating
Sec. 2 makes additional contributions in generalizing acyclicity constraints from [32,30].

Abstract: We will add a sentence on the one-parameter-per-edge assumption.
Title: We find it difficult to capture this assumption in a few readily understood words, but perhaps R4 has a suggestion.

R1: "What leads to better or worse SHD...F (always squared error...danger of overfitting?), thresholding,
We think a proper exploration would best be left to a journal extension of this paper.