Review for NeurIPS paper: Scattering GCN: Overcoming Oversmoothness in Graph Convolutional Networks
Additional Feedback: I'd be interested to hear whether the proposed approach could also benefit from an attention mechanism similar to GAT. I wasn't entirely sure about the setup for the experiment where the training size is reduced. Does this take a fixed graph and simply hide an increasing portion of the node labels, or does the graph structure differ between the settings with reduced training size? Is the number of nodes for which labels are predicted the same in each setting? Is each unlabelled node always connected to at least one labelled node, or does reducing the training size also mean that the nearest labelled node may be further away in the low-training-size regime?