Goto

Collaborating Authors

 structured conditional probability matrix


Near-Optimal Smoothing of Structured Conditional Probability Matrices

Neural Information Processing Systems

Utilizing the structure of a probabilistic model can significantly increase its learning speed. Motivated by several recent applications, in particular bigram models in language processing, we consider learning low-rank conditional probability matrices under expected KL-risk. This choice makes smoothing, that is the careful handling of low-probability elements, paramount. We derive an iterative algorithm that extends classical non-negative matrix factorization to naturally incorporate additive smoothing and prove that it converges to the stationary points of a penalized empirical risk. We then derive sample-complexity bounds for the global minimizer of the penalized risk and show that it is within a small factor of the optimal sample complexity.


Near-Optimal Smoothing of Structured Conditional Probability Matrices

Neural Information Processing Systems

Utilizing the structure of a probabilistic model can significantly increase its learning speed. Motivated by several recent applications, in particular bigram models in language processing, we consider learning low-rank conditional probability matrices under expected KL-risk. This choice makes smoothing, that is the careful handling of low-probability elements, paramount. We derive an iterative algorithm that extends classical non-negative matrix factorization to naturally incorporate additive smoothing and prove that it converges to the stationary points of a penalized empirical risk. We then derive sample-complexity bounds for the global minimizer of the penalized risk and show that it is within a small factor of the optimal sample complexity.


Reviews: Near-Optimal Smoothing of Structured Conditional Probability Matrices

Neural Information Processing Systems

If my understanding is correct, Theorem 1 of the authors does not quite apply to their algorithm ADD-1/2-Smoothed Low-Rank. Instead, it applies to the non-computable algorithm where they assume that they have a minimizer of the objective function in Theorem 3. It is not clear if the alternating optimization algorithm proposed in the paper is guaranteed to converge to a minimizer of the objective in Theorem 3. If this is true, the authors should mention this before stating Theorem 1 to avoid misleading the reader. The "discounting" seems important from the Experiments section but this is not described in the main paper. If this is so important, the authors should make room for this in the main paper. The main results (Theorem 1 and 2) are not so surprising given that this is almost a parametric estimation problem with mk parameters (so the rates should be km/n).


Near-Optimal Smoothing of Structured Conditional Probability Matrices

Neural Information Processing Systems

Utilizing the structure of a probabilistic model can significantly increase its learning speed. Motivated by several recent applications, in particular bigram models in language processing, we consider learning low-rank conditional probability matrices under expected KL-risk. This choice makes smoothing, that is the careful handling of low-probability elements, paramount. We derive an iterative algorithm that extends classical non-negative matrix factorization to naturally incorporate additive smoothing and prove that it converges to the stationary points of a penalized empirical risk. We then derive sample-complexity bounds for the global minimizer of the penalized risk and show that it is within a small factor of the optimal sample complexity.