Optimization
Export Reviews, Discussions, Author Feedback and Meta-Reviews
First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. Summary: The paper describes an efficient optimization approach for finding structured low-rank matrices. The structure is encoded by a linear map, and low rank is encouraged by adding the nuclear norm of the structured matrix to the cost function. The cost function is optimized with a generalized conditional gradient algorithm. Using a factorization of the large structured matrix accelerates the optimization further.
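To make the summary concrete, here is a minimal sketch of the two ingredients it names: a linear map that builds a structured matrix, and the nuclear norm used as a low-rank penalty. It is not the paper's algorithm. The Hankel structure, the function names (`hankel_map`, `nuclear_norm`, `lmo_nuclear_ball`) and the radius `tau` are all assumptions chosen for illustration. The last function shows the linear minimization step that conditional gradient methods typically solve over a nuclear-norm ball, which needs only the top singular pair.

```python
import numpy as np

def hankel_map(x, rows):
    """Hypothetical linear map: length-n vector -> rows x (n - rows + 1) Hankel matrix."""
    x = np.asarray(x, dtype=float)
    cols = x.size - rows + 1
    idx = np.arange(rows)[:, None] + np.arange(cols)[None, :]
    return x[idx]  # entry (i, j) is x[i + j]

def nuclear_norm(M):
    """Sum of singular values, the convex surrogate for rank."""
    return np.linalg.svd(M, compute_uv=False).sum()

def lmo_nuclear_ball(G, tau):
    """Minimizer of <G, Z> over {Z : ||Z||_* <= tau}: -tau * u1 v1^T,
    where (u1, v1) is the top singular pair of G."""
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    return -tau * np.outer(U[:, 0], Vt[0])

# A geometric sequence yields a rank-1 Hankel matrix (illustrative data only).
x = 0.5 ** np.arange(7)
H = hankel_map(x, rows=3)
print(H.shape, np.linalg.matrix_rank(H), nuclear_norm(H))
```

For a rank-1 matrix the nuclear norm coincides with the Frobenius norm, which gives a quick sanity check on the penalty.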
First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. The paper presents a new technique for solving MDPs. The new technique, presented as an alternative to approximate policy/value iteration, consists of directly minimizing the Optimal Bellman Residual (OBR). The authors first motivate their method by showing that the loss bound of OBR is often tighter than the loss bound of policy/value iteration, which is a known result [9,15]. The authors then show that an empirical estimate of OBR is consistent in the Vapnik sense, i.e. minimizing the empirical OBR amounts to minimizing an upper bound on the true OBR, which cannot be computed when the MDP model is unknown. Finally, the authors show that OBR can be decomposed into a difference of two convex functions, so that a standard Difference of Convex Functions (DC) optimization method can be used to find a local optimum.
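The quantity being minimized can be illustrated on a small tabular MDP. The sketch below computes the residual T*Q - Q for a state-action value function Q, where T* is the optimal Bellman operator. It assumes a known model, whereas the paper works with an empirical estimate from samples, and it does not reproduce the DC decomposition. The toy transition tensor `P`, rewards `R`, discount `gamma` and the function names are hypothetical.

```python
import numpy as np

def optimal_bellman_operator(Q, P, R, gamma):
    """T*Q(s, a) = R(s, a) + gamma * sum_s' P(s' | s, a) * max_a' Q(s', a')."""
    V = Q.max(axis=1)          # greedy value of each next state
    return R + gamma * P @ V   # P has shape (S, A, S), result has shape (S, A)

def obr(Q, P, R, gamma, p=1):
    """p-norm of the optimal Bellman residual under a uniform weighting (an assumption)."""
    residual = optimal_bellman_operator(Q, P, R, gamma) - Q
    return np.mean(np.abs(residual) ** p) ** (1.0 / p)

# Toy 2-state, 2-action MDP (illustrative numbers only).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.0, 1.0]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9

# Value iteration gives a reference fixed point where the residual vanishes.
Q_star = np.zeros_like(R)
for _ in range(500):
    Q_star = optimal_bellman_operator(Q_star, P, R, gamma)

print(obr(np.zeros_like(R), P, R, gamma), obr(Q_star, P, R, gamma))
```

The residual is zero exactly at the fixed point of T*, which is why driving it down is a sensible objective; the max over actions makes it non-convex in Q, which is where a DC decomposition becomes useful.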