Unifying Non-Maximum Likelihood Learning Objectives with Minimum KL Contraction
Neural Information Processing Systems
When used to learn high-dimensional parametric probabilistic models, classical maximum likelihood (ML) learning often suffers from computational intractability, which has motivated the active development of non-ML learning methods. Yet, because of their divergent motivations and forms, the objective functions of many non-ML learning methods appear unrelated, and a unified framework for understanding them has been lacking. In this work, based on an information-geometric view of parametric learning, we introduce a general non-ML learning principle termed minimum KL contraction, where we seek the optimal parameters that minimize the contraction of the KL divergence between the data and model distributions after both are transformed with a KL contraction operator. We then show that the objective functions of several important or recently developed non-ML learning methods, including contrastive divergence [12], noise-contrastive estimation [11], partial likelihood [7], non-local contrastive objectives [31], score matching [14], pseudo-likelihood [3], maximum conditional likelihood [17], maximum mutual information [2], maximum marginal likelihood [9], and conditional and marginal composite likelihood [24], can be unified under the minimum KL contraction framework with different choices of the KL contraction operators.
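As a reading aid, the principle stated in the abstract can be sketched roughly as follows; the notation (data distribution p, parametric model q_theta, operator Phi) is introduced here for illustration and is not taken verbatim from the paper. An operator Phi acting on distributions is a KL contraction operator if it never increases KL divergence, and the minimum KL contraction objective penalizes how much of the divergence between data and model is lost under Phi:

\[
  D_{\mathrm{KL}}\big(\Phi\{p\} \,\|\, \Phi\{q_\theta\}\big)
  \;\le\;
  D_{\mathrm{KL}}\big(p \,\|\, q_\theta\big)
  \qquad \text{(defining property of a KL contraction operator } \Phi\text{)},
\]
\[
  \theta^{*}
  \;=\;
  \arg\min_{\theta}\;
  \Big[\, D_{\mathrm{KL}}\big(p \,\|\, q_\theta\big)
        \;-\; D_{\mathrm{KL}}\big(\Phi\{p\} \,\|\, \Phi\{q_\theta\}\big) \,\Big].
\]

The bracketed contraction is non-negative by the defining property of Phi and vanishes when q_theta equals p, so driving it toward zero is, under suitable conditions on Phi, a reasonable surrogate for ML fitting; per the abstract, different choices of Phi yield the different non-ML objectives listed above.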