 Education



Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. This paper reduces a broad class of machine learning problems involving latent variables to the problem of finding the anchors that define the conical hull of the data (via the method of moments). In addition, it proposes a new divide-and-conquer algorithm based on random projections to speed up the search for the anchors. Overall, I found this an interesting paper presenting significant contributions. However, the presentation could be greatly improved, as it lacks clarity in places. It looks like this paper was squeezed in a hurry to fit the 8-page limit.


Inference for Batched Bandits

Neural Information Processing Systems

However, for many real-world problems it is not enough to just minimize regret on a particular problem instance. For example, suppose we have run an online education experiment using a bandit algorithm where we test different types of teaching strategies.


DynaBERT: Dynamic BERT with Adaptive Width and Depth Lu Hou

Neural Information Processing Systems

Pre-trained language models like BERT, though powerful in many natural language processing tasks, are expensive in both computation and memory. To alleviate this problem, one approach is to compress them for specific tasks before deployment. However, recent works on BERT compression usually compress the large BERT model to a fixed smaller size, which cannot fully satisfy the requirements of different edge devices with varying hardware performance. In this paper, we propose a novel dynamic BERT model (abbreviated as DynaBERT), which can flexibly adjust its size and latency by selecting adaptive width and depth. The training process of DynaBERT includes first training a width-adaptive BERT and then allowing both adaptive width and depth, by distilling knowledge from the full-sized model to small sub-networks.



Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

Perhaps a log-log plot would be better. Q2 (Please summarize your review in 1-2 sentences): This is a well-written and clear paper, but I think the proposed method is already well understood by the graphical models community and is not that original. I also feel that the experiments section was not objective enough: the authors need to discuss both the strengths and the weaknesses of a method.





Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. This paper proposes an algorithm for online combinatorial optimization. In this online learning problem, the action space is combinatorially large and can be represented in a d-dimensional Euclidean space such that the loss in each time step is a linear function of the action. The paper would be greatly improved by a thorough comparison between the new algorithm and Online Stochastic Mirror Descent (OSMD by Audibert et al., [3] in the current paper), both in terms of how the algorithms work and in terms of regret bounds. In its current form, I am not sure whether the new algorithm is significantly different from OSMD or whether it improves on its bounds.