Unifying Non-Maximum Likelihood Learning Objectives with Minimum KL Contraction
Neural Information Processing Systems
When used to learn high-dimensional parametric probabilistic models, classical maximum likelihood (ML) learning often suffers from computational intractability, which has motivated the active development of non-ML learning methods. Yet, because of their divergent motivations and forms, the objective functions of many non-ML learning methods appear unrelated, and a unified framework for understanding them has been lacking. In this work, based on an information-geometric view of parametric learning, we introduce a general non-ML learning principle termed minimum KL contraction, where we seek the optimal parameters that minimize the contraction of the KL divergence between the data and model distributions after both are transformed with a KL contraction operator. We then show that the objective functions of several important or recently developed non-ML learning methods, including contrastive divergence [12], noise-contrastive estimation [11], partial likelihood [7], non-local contrastive objectives [31], score matching [14], pseudo-likelihood [3], maximum conditional likelihood [17], maximum mutual information [2], maximum marginal likelihood [9], and conditional and marginal composite likelihood [24], can be unified under the minimum KL contraction framework with different choices of the KL contraction operators.
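As a reading aid, the principle stated in the abstract can be sketched roughly as follows; the notation (data distribution p, parametric model q_theta, operator Phi) is introduced here for illustration and is not taken verbatim from the paper. An operator Phi acting on distributions is a KL contraction operator if it never increases KL divergence, and the minimum KL contraction objective penalizes how much of the divergence between data and model is lost under Phi:

\[
  D_{\mathrm{KL}}\big(\Phi\{p\} \,\|\, \Phi\{q_\theta\}\big)
  \;\le\;
  D_{\mathrm{KL}}\big(p \,\|\, q_\theta\big)
  \qquad \text{(defining property of a KL contraction operator } \Phi\text{)},
\]
\[
  \theta^{*}
  \;=\;
  \arg\min_{\theta}\;
  \Big[\, D_{\mathrm{KL}}\big(p \,\|\, q_\theta\big)
        \;-\; D_{\mathrm{KL}}\big(\Phi\{p\} \,\|\, \Phi\{q_\theta\}\big) \,\Big].
\]

The bracketed contraction is non-negative by the defining property of Phi and vanishes when q_theta equals p, so driving it toward zero is, under suitable conditions on Phi, a reasonable surrogate for ML fitting; per the abstract, different choices of Phi yield the different non-ML objectives listed above.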