Reviews: Low-Complexity Nonparametric Bayesian Online Prediction with Universal Guarantees

Neural Information Processing Systems 

A reader unfamiliar with the Context Tree Weighting technique might have a difficult first read, but as it is a well-known technique in information theory and its applications, I don't think this should be held against it. Context Tree Weighting based variants have been used successfully in many different problems (data compression, bioinformatics), but typically deal with relatively low-dimensional binary side information, so this paper provides a method that fills this gap, and in my mind could be built on further by the machine learning community in the future. Some suggestions for future work: - There is a body of literature which improves on the KT estimator on sequences where the entire alphabet isn't observed, which is a common case when the data is recursively partitioned. I am quite certain the method advocated in this approach could benefit from applying the recently introduced SAD estimator (https://arxiv.org/abs/1305.3671) in place of the multi-KT, and the theoretical results extended to this case.