Markov Models
Export Reviews, Discussions, Author Feedback and Meta-Reviews
First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. This paper develops a new method for blind source separation that formulates the problem as an additive factorial HMM (AFHMM) and then applies signal aggregate constraints (SACs). The motivation is that additional domain knowledge can be incorporated to improve the separation of the time series into its components. The example used throughout the paper is energy disaggregation: the components of domestic energy use (corresponding to individual appliances) can be separated more accurately when information about the total (expected) usage of each appliance over a time period is incorporated. The objective function that is maximized to perform the separation (the log of the posterior distribution of the hidden chains given the observed data) is then transformed into a convex optimization problem.
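To make the role of the signal aggregate constraint concrete, here is a toy sketch. It is not the paper's formulation: it works with per-appliance signals directly rather than through their hidden chains, and the Gaussian noise model, the quadratic penalty form, the weight `lam`, and the function name `sac_objective` are all illustrative assumptions. It shows how two decompositions that explain the aggregate equally well can be told apart once expected appliance totals are added.

```python
import math
from typing import Sequence


def sac_objective(
    aggregate: Sequence[float],
    components: Sequence[Sequence[float]],
    expected_totals: Sequence[float],
    noise_var: float = 1.0,
    lam: float = 1.0,
) -> float:
    """Toy SAC-penalized objective (higher is better); hypothetical helper.

    aggregate:       observed total signal y_t, t = 1..T
    components:      candidate per-appliance signals x_{i,t}
    expected_totals: assumed prior total mu_i for each appliance over the window
    """
    T = len(aggregate)
    # Gaussian log-likelihood of the aggregate given the summed components
    loglik = 0.0
    for t in range(T):
        resid = aggregate[t] - sum(x[t] for x in components)
        loglik += -0.5 * resid ** 2 / noise_var - 0.5 * math.log(2 * math.pi * noise_var)
    # Assumed quadratic SAC: penalize each component whose total over the
    # window deviates from its expected total
    penalty = sum(
        lam * (sum(x) - mu) ** 2 for x, mu in zip(components, expected_totals)
    )
    return loglik - penalty


aggregate = [3.0, 3.0, 1.0, 1.0]
plausible = [[2.0, 2.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]]  # totals 4 and 4
implausible = [[3.0, 3.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]]  # totals 8 and 0
totals = [4.0, 4.0]
```

Both decompositions reproduce the aggregate exactly, so the likelihood alone cannot distinguish them; only the aggregate penalty (here 0 versus 32) favors `plausible`.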
Summary: The authors consider the problem of learning a mixture of Hidden Markov Models. They first suggest using a spectral learning algorithm to learn a set of parameters for a single hidden Markov model, and then provide a method for resolving the permutation ambiguity in the learned transition matrix to recover its underlying block-diagonal structure. I found this paper to be very well written for the most part. The experimental results section could be fleshed out a bit. In particular, the …
2. The authors rely on the fact that a mixture of Hidden Markov Models can be expressed as a single HMM.
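The fact in point 2 can be illustrated with a small sketch: if a mixture of HMMs is written as one HMM, its transition matrix is block-diagonal (no transitions between components), but a learner that returns states in arbitrary order hides that structure behind a permutation. The sketch below is not the authors' method; it simply groups states by connected components of the symmetrized support of the matrix, which works in the noiseless case. The threshold `tol` and the helper names `recover_blocks` and `permute` are hypothetical, and noisy spectral estimates would need a larger threshold or a different technique.

```python
from typing import List


def recover_blocks(A: List[List[float]], tol: float = 1e-8) -> List[List[int]]:
    """Group the states of a permuted block-diagonal transition matrix.

    Two states are linked if probability mass flows between them in either
    direction; each connected component is one mixture component's block.
    """
    n = len(A)
    seen = [False] * n
    blocks = []
    for start in range(n):
        if seen[start]:
            continue
        # Depth-first search over the symmetrized support of A
        stack, block = [start], []
        seen[start] = True
        while stack:
            i = stack.pop()
            block.append(i)
            for j in range(n):
                if not seen[j] and (A[i][j] > tol or A[j][i] > tol):
                    seen[j] = True
                    stack.append(j)
        blocks.append(sorted(block))
    return blocks


def permute(A: List[List[float]], order: List[int]) -> List[List[float]]:
    # Reorder rows and columns by the given state order
    return [[A[i][j] for j in order] for i in order]


# Two 2-state component HMMs stacked into one block-diagonal matrix
B = [
    [0.9, 0.1, 0.0, 0.0],
    [0.2, 0.8, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.3, 0.7],
]
scrambled = permute(B, [2, 0, 3, 1])  # states returned in arbitrary order
blocks = recover_blocks(scrambled)
restored = permute(scrambled, [s for b in blocks for s in b])
```

After reordering the states block by block, `restored` is again block-diagonal, so each block can be read off as one component of the mixture.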
Summary: This paper provides a Bayesian expected regret bound for the Posterior Sampling for Reinforcement Learning (PSRL) algorithm. PSRL was introduced by [Strens2000] and can be seen as the application of Thompson sampling to RL problems: a model is sampled from the (posterior) distribution over models, the optimal policy for the sampled model is computed, that policy is followed until the end of the horizon, and the distribution over models is updated. PSRL for finite MDPs was analyzed by [OVRR2013]; the main contribution of this paper is to analyze PSRL for MDPs with general state and action spaces. In the analysis, the authors use the concept of eluder dimension introduced by [RVR2013]. Eluder dimension was previously used in the analysis of bandit problems (for both Thompson Sampling and the Optimism in the Face of Uncertainty (OFU) approaches).
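The four-step PSRL loop described above can be sketched for the simplest case. This sketch covers only a small tabular episodic MDP, not the general state and action spaces analyzed in the paper, and it assumes known rewards, a Dirichlet(1, ..., 1) prior on each transition row, episodes that start in state 0, and finite-horizon value iteration as the planner. The function names are hypothetical.

```python
import random
from typing import List


def sample_dirichlet(alpha: List[float], rng: random.Random) -> List[float]:
    # Dirichlet draw via normalized Gamma variates
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [x / s for x in g]


def plan(P, R, H):
    """Finite-horizon value iteration; returns policy[h][s]."""
    S, A = len(R), len(R[0])
    V = [0.0] * S
    policy = [[0] * S for _ in range(H)]
    for h in reversed(range(H)):
        newV = [0.0] * S
        for s in range(S):
            q = [R[s][a] + sum(P[s][a][t] * V[t] for t in range(S)) for a in range(A)]
            best = max(range(A), key=lambda a: q[a])
            policy[h][s] = best
            newV[s] = q[best]
        V = newV
    return policy


def psrl(true_P, R, H, episodes, seed=0):
    """Tabular PSRL with a Dirichlet posterior over transitions (rewards known)."""
    rng = random.Random(seed)
    S, A = len(R), len(R[0])
    counts = [[[1.0] * S for _ in range(A)] for _ in range(S)]  # Dirichlet(1,...,1) prior
    total_reward = 0.0
    for _ in range(episodes):
        # 1. Sample a model from the posterior
        P_hat = [[sample_dirichlet(counts[s][a], rng) for a in range(A)] for s in range(S)]
        # 2. Compute the optimal policy for the sampled model
        policy = plan(P_hat, R, H)
        # 3. Follow that policy until the end of the horizon
        s = 0
        for h in range(H):
            a = policy[h][s]
            total_reward += R[s][a]
            s_next = rng.choices(range(S), weights=true_P[s][a])[0]
            # 4. Update the posterior with the observed transition
            counts[s][a][s_next] += 1.0
            s = s_next
    return total_reward, counts
```

Sampling a single model per episode (rather than computing optimistic bounds, as in OFU methods) is what makes this Thompson sampling; exploration comes from the spread of the posterior.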