
Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. Summary: The paper presents a sample-efficient policy search algorithm for large, continuous reinforcement learning problems. In contrast to existing model-based policy search algorithms, the approach presented in this paper learns local models in the form of linear Gaussian controllers. Given the rollouts from these local linear models, a global, nonlinear policy can then be learned using an arbitrary parametrization. The so-called Guided Policy Search approach alternates between (local) trajectory optimization and (global) policy search in an iterative fashion. In their experiments, the authors show that the approach outperforms several state-of-the-art policy search methods, including REPS and PILCO. Experiments were conducted in (mostly 2D) dynamics simulations involving the continuous control of multi-linked agents.
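To make the local/global alternation more concrete, here is a minimal sketch of only the supervised half of one iteration, under simplifying assumptions that are not taken from the paper: the local linear Gaussian controllers u ~ N(Kx + k, σ²I) are given rather than optimized, they are time-invariant, and the "global policy" is a linear map fitted by least squares instead of a neural network. The helper names `sample_local_controller` and `fit_global_policy` and all numeric values are hypothetical.

```python
import numpy as np


def sample_local_controller(K, k, noise_std, x_mean, x_std, n, rng):
    """Draw n states around one local trajectory and actions from u ~ N(K x + k, noise_std^2 I)."""
    x = x_mean + x_std * rng.standard_normal((n, K.shape[1]))
    u = x @ K.T + k + noise_std * rng.standard_normal((n, K.shape[0]))
    return x, u


def fit_global_policy(xs, us):
    """Least-squares fit of a global policy u = W x + b to samples pooled from all local controllers.

    Hypothetical stand-in for training an arbitrarily parametrized (e.g. neural network) policy.
    """
    X = np.hstack([xs, np.ones((xs.shape[0], 1))])  # append a bias column
    coef, *_ = np.linalg.lstsq(X, us, rcond=None)
    return coef[:-1].T, coef[-1]  # W: (action_dim, state_dim), b: (action_dim,)


rng = np.random.default_rng(0)
K = np.array([[-1.0, -0.5]])
k = np.array([0.2])
# Two local controllers around different trajectories (sharing gains here for simplicity).
samples = [
    sample_local_controller(K, k, 0.01, np.array(m), 0.5, 500, rng)
    for m in ([1.0, 0.0], [-1.0, 2.0])
]
W, b = fit_global_policy(np.vstack([s[0] for s in samples]), np.vstack([s[1] for s in samples]))
```

Because both local controllers share the same gains here, the fitted W and b should closely match K and k. In the full method the local controllers are also re-optimized between policy fits, which this sketch does not attempt.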





GENO -- GENeric Optimization for Classical Machine Learning

Neural Information Processing Systems

Although optimization is the longstanding algorithmic backbone of machine learning, new models still require the time-consuming implementation of new solvers. As a result, there are thousands of implementations of optimization algorithms for machine learning problems.
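The abstract's premise is that many classical models differ mainly in their objective, so one solver could in principle serve all of them. A minimal sketch of that general idea, assuming nothing about GENO's actual design: a plain fixed-step gradient-descent routine (the hypothetical `gradient_descent`, far simpler than a production solver) is reused unchanged for ridge regression and L2-regularized logistic regression, each supplied only as a function returning the objective value and its gradient. Step size, iteration count, and regularization weight are illustrative choices.

```python
import numpy as np


def gradient_descent(f_and_grad, w0, step=0.1, iters=2000):
    """Generic solver: needs only a callable returning (objective value, gradient)."""
    w = np.array(w0, dtype=float)
    for _ in range(iters):
        _, g = f_and_grad(w)
        w = w - step * g
    return w


def ridge(X, y, lam):
    """Mean squared error (halved) plus an L2 penalty, with its gradient."""
    n = len(y)

    def f_and_grad(w):
        r = X @ w - y
        return 0.5 * (r @ r) / n + 0.5 * lam * (w @ w), X.T @ r / n + lam * w

    return f_and_grad


def logistic(X, y, lam):
    """L2-regularized logistic loss for labels in {-1, +1}, with its gradient."""
    n = len(y)

    def f_and_grad(w):
        m = y * (X @ w)
        loss = np.mean(np.logaddexp(0.0, -m)) + 0.5 * lam * (w @ w)
        grad = -X.T @ (y / (1.0 + np.exp(m))) / n + lam * w
        return loss, grad

    return f_and_grad


rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
y_reg = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(100)
y_cls = np.sign(X @ np.array([1.0, -1.0, 0.0]) + 0.3 * rng.standard_normal(100))

# The same solver handles both models; only the objective callable changes.
w_ridge = gradient_descent(ridge(X, y_reg, lam=0.1), np.zeros(3))
w_logreg = gradient_descent(logistic(X, y_cls, lam=0.1), np.zeros(3))
```

Both objectives are strongly convex because of the L2 term, which is what lets such a naive fixed-step method converge here; a general-purpose solver would need line searches, constraints, and better conditioning handling.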


Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. This paper develops a new method for blind source separation by formulating the problem as an additive factorial HMM (AFHMM) and then applying signal aggregate constraints (SACs). The motivation is that additional domain knowledge can be incorporated to improve the separation of the time series into components. The example used throughout the paper is energy disaggregation, where the components of domestic energy use (relating to individual appliances) can be better separated when information about the total (expected) usage of each appliance in a time period is incorporated. The objective function that is maximized to perform the separation (the log of the posterior distribution of the hidden chains given the observed data) is then transformed into a convex optimization problem.
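A toy illustration of why an aggregate constraint helps, under heavy simplifications that are assumptions rather than the paper's formulation: each appliance's contribution is summed to explain the observed signal through a Gaussian term, HMM transition terms are omitted, and a quadratic penalty pulls each appliance's total toward its expected total. Decoding is done by brute force, not by the convex formulation described in the review. The names `objective` and `decode` and the values of `sigma` and `lam` are hypothetical.

```python
import itertools

import numpy as np


def objective(states, y, levels, totals=None, sigma=10.0, lam=1e-3):
    """Gaussian fit of the summed appliance signals to y, minus an optional SAC-style penalty.

    states[i][t] is the state index of appliance i at time t; levels[i][s] is its power in state s.
    HMM transition terms are deliberately omitted in this toy version.
    """
    contrib = np.array(
        [[levels[i][s] for s in seq] for i, seq in enumerate(states)], dtype=float
    )  # shape (n_appliances, T)
    fit = -np.sum((y - contrib.sum(axis=0)) ** 2) / (2 * sigma**2)
    if totals is None:
        return fit
    # Quadratic penalty on each appliance's total usage deviating from its expected total.
    penalty = lam * np.sum((contrib.sum(axis=1) - np.asarray(totals, dtype=float)) ** 2)
    return fit - penalty


def decode(y, levels, totals=None, **kwargs):
    """Exhaustive MAP search over all joint state sequences (feasible only for tiny problems)."""
    T = len(y)
    per_appliance = [list(itertools.product(range(len(lv)), repeat=T)) for lv in levels]
    best, best_val = None, -np.inf
    for combo in itertools.product(*per_appliance):
        val = objective(combo, y, levels, totals, **kwargs)
        if val > best_val:
            best, best_val = combo, val
    return best


# Two appliances with identical power levels: the aggregate signal alone cannot tell them apart.
y = np.array([100.0, 100.0, 0.0])
levels = [[0.0, 100.0], [0.0, 100.0]]
print(decode(y, levels))                       # -> ((0, 0, 0), (1, 1, 0)), an arbitrary tie-break
print(decode(y, levels, totals=[200.0, 0.0]))  # -> ((1, 1, 0), (0, 0, 0))
```

Without the penalty, every split of the 100 W readings between the two identical appliances fits the data equally well, so the result is decided by search order; the expected totals remove that ambiguity.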