some specific questions, but will incorporate all feedback in the final version

Neural Information Processing Systems 

We thank the reviewers for their careful reading and insightful comments. We will add this in the final version. Transformer-based) models to further shrink the search space. Number of nodes in the graphs seems to be quite low ( 200 for GNMT). Is there some manual grouping operation performed on the computational graph?