Review for NeurIPS paper: From Trees to Continuous Embeddings and Back: Hyperbolic Hierarchical Clustering

Neural Information Processing Systems 

Additional Feedback: Q1. In the end-to-end training section, do the authors learn the embeddings by clustering all points together? That is, are the train, dev, and test points clustered jointly, or is each split clustered separately? If all points are clustered together, this is not a reasonable setup, because in practice we do not have access to test data during training, and test data should not be used for any form of training. If the authors instead perform a separate clustering on the test points, it may still be unreasonable to assume access to *all* test data at test time. Evaluation on test data should preferably be possible even when test data arrives in an online fashion.