We thank the reviewers for acknowledging our contributions and for providing valuable feedback

Neural Information Processing Systems

We thank the reviewers for acknowledging our contributions and for providing valuable feedback. The NVIDIA Titan X (Pascal) is rated at 11.0 TFLOPS, so the latency of … The critic architecture also follows the WGAN-GP paper. We clarified the concatenation process in our paper and have added the missing hidden-layer citation. Thank you for pointing this out. The GWIN continues to have a positive impact.
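The latency estimate in this excerpt is cut off, but the arithmetic it gestures at (peak throughput → per-sample latency) is easy to reconstruct. A minimal sketch, assuming a hypothetical per-forward FLOP count and utilization fraction (neither figure comes from the rebuttal):

```python
# Back-of-the-envelope latency from peak throughput; the FLOP count and
# achieved-utilization figures used below are illustrative assumptions.
PEAK_TFLOPS = 11.0  # NVIDIA Titan X (Pascal), single precision

def latency_ms(flops_per_forward: float, utilization: float = 0.3) -> float:
    """Estimated per-sample latency (ms) at a given fraction of peak throughput."""
    achieved_flops = PEAK_TFLOPS * 1e12 * utilization  # FLOP/s actually sustained
    return flops_per_forward / achieved_flops * 1e3

# e.g. a generator costing ~2 GFLOPs per forward pass:
print(f"{latency_ms(2e9):.3f} ms")
```

In practice sustained utilization varies widely with kernel mix and batch size, which is why such estimates are only order-of-magnitude sanity checks.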



We thank all of the reviewers for their time, careful reading, and valuable feedback

Neural Information Processing Systems

We thank all of the reviewers for their time, careful reading, and valuable feedback. Indeed, we verify that the assumption holds for the datasets used in the experiments (see Section 1.3 of the … The distinction between norms is another good point: "…" means computing ‖E‖ … In Figure 2, the quantities from Eq. (11) are hypothesized … We understand that spectral methods are not used in some engineering settings.


Reviewer #1: We thank you for appreciating our contributions and providing valuable feedback, which will be taken …

Neural Information Processing Systems

The empirical results comparing parameter tying vs. naive design are in fact reported in Table 3 of Appendix C.2; … (… Zhou, 2018) are related to IPVI, as you have suggested. We would like to address your comments and questions below. Regarding the necessity of parameter tying, we think overfitting is still an issue to be addressed. We provide some experimental evidence below, as you have suggested: train/test mean log-likelihood (MLL) achieved by IPVI with and without parameter tying over 10 runs.


We would like to thank the reviewers for their valuable feedback, which we will duly consider and integrate in our

Neural Information Processing Systems

In this paper, we demonstrate that "the decision boundaries of a DNN can only exist as long …" We clarify the main points raised by the reviewers below. We further shed more light on the relationship between adv. … Nevertheless, we never claim that, within the discr. … In fact, we agree that the margin associated with different discr. … Overall, however, we firmly believe that the invariant dirs. …


We would like to thank the reviewers for their valuable feedback

Neural Information Processing Systems

We would like to thank the reviewers for their valuable feedback. While this is the case for best-arm identification (see, e.g., "Explicit Best Arm Identification in Linear Bandits Using …"), … OAM have several practical limitations and they are rarely preferable over LinUCB or LinTS. We significantly improved the regret guarantees w.r.t. … K.3, SOLID's performance is not … We would like to bring to the reviewers' attention that while the paper is framed in the … Lipschitz property of KL divergences between sub-Gaussian distributions (see, e.g., [15]) and the results would be the … We cite [4], which refines the original results of Chu et al. [2011]. We have already updated the paper accordingly.
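Since LinUCB is held up here as the practical baseline, a minimal sketch of the standard algorithm (in the spirit of Chu et al. [2011]) may help readers place the comparison. This is not the authors' SOLID method; parameter names and defaults are illustrative:

```python
import numpy as np

class LinUCB:
    """Minimal linear-UCB contextual bandit sketch (shared parameter model)."""

    def __init__(self, dim: int, alpha: float = 1.0, lam: float = 1.0):
        self.alpha = alpha
        self.A = lam * np.eye(dim)   # regularized Gram matrix
        self.b = np.zeros(dim)       # reward-weighted feature sum

    def select(self, contexts: np.ndarray) -> int:
        """contexts: (n_arms, dim); returns the arm maximizing the UCB score."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b       # ridge-regression estimate
        # Per-arm exploration bonus: sqrt(x^T A^-1 x) for each context row.
        bonus = np.sqrt(np.einsum("ad,dk,ak->a", contexts, A_inv, contexts))
        return int(np.argmax(contexts @ theta + self.alpha * bonus))

    def update(self, x: np.ndarray, reward: float) -> None:
        self.A += np.outer(x, x)
        self.b += reward * x
```

With no observations yet (theta = 0), the score reduces to the exploration bonus, so the arm with the largest feature norm is tried first.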


… backpropagate through an equilibrium state of the network (which, to the best of our knowledge, no deep approaches …)

Neural Information Processing Systems

We thank the reviewers for their valuable feedback. The way DEQ "ignores" depth and solves for the equilibrium suggests a different view of output modeling and further … We also agree with the reviewers that the runtime discussion should be moved into the main text. We thank reviewer #1 for the valuable feedback. The DEQ approach is very different from techniques like gradient checkpointing (GC), an implementation-based methodology that is practical on almost any layer-based network. Quantitatively, we have followed the reviewer's suggestion and compared GC and DEQ using a 70-layer TrellisNet (w/ … We find that GC works best when we checkpoint after every 9 layers, and record a 5.2 GB … The training speed of GC is approximately 1.6 … We thank reviewer #3 for the comments, and for taking the time to check our proof and read our code.
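The segment size reported for the 70-layer TrellisNet (a checkpoint every 9 layers) is close to the classic √L rule of thumb for gradient checkpointing. A toy cost model, under the simplifying assumption that every layer's activation costs one unit of memory (this model is ours, not the authors'):

```python
# Toy activation-memory model for gradient checkpointing: keeping a
# checkpoint every k layers stores ~L/k checkpoints, and recomputing one
# segment during the backward pass keeps up to k activations live, so
# total memory scales like cost(k) = L/k + k, minimized near k = sqrt(L).
def checkpoint_cost(num_layers: int, segment: int) -> float:
    return num_layers / segment + segment

def best_segment(num_layers: int) -> int:
    """Segment size minimizing the toy cost; roughly sqrt(num_layers)."""
    return min(range(1, num_layers + 1),
               key=lambda k: checkpoint_cost(num_layers, k))

print(best_segment(70))  # prints 8, near sqrt(70) ≈ 8.4
```

The toy optimum for L = 70 is 8, close to the 9 the rebuttal reports; the true optimum depends on per-layer memory and recompute cost, which this model ignores.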


… during learning, numerical precision reduction, and for finding the Pareto-optimal set of configurations apply directly …

Neural Information Processing Systems

We would like to thank the reviewers for their thoughtful comments and valuable suggestions. We will clarify this point in the paper. Our algorithms are agnostic to the leaf distributions used. Thanks for this valuable feedback; we will improve the pseudocode as you suggest. As such, there is memory overhead but no computational overhead.