Beyond MLE: Convex Learning for Text Generation

Oct-10-2024, 05:55:29 GMT–Neural Information Processing Systems

Maximum likelihood estimation (MLE) is a statistical method for estimating the parameters of a probability distribution that best explain the observed data. In text generation, MLE is often used to train generative language models, which can then be used to generate new text. However, we argue that MLE is not always necessary or optimal, especially for closed-ended text generation tasks such as machine translation. In these tasks, the goal of the model is to generate the most appropriate response, which does not necessarily require it to estimate the entire data distribution with MLE. To this end, we propose a novel class of training objectives based on convex functions, which enables text generation models to focus on highly probable outputs without having to estimate the entire data distribution.
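The contrast between the two kinds of objective can be illustrated on a toy problem. The sketch below is not the paper's method: it assumes a three-outcome data distribution `p_data`, a model distribution `q` searched over a coarse grid, and an objective of the form "expected value of f(q(x)) under the data", where `f = log` gives the usual MLE objective and `f(q) = q**2` is one arbitrarily chosen convex function. All names here are hypothetical and chosen only for illustration.

```python
import math

def expected_objective(p_data, q, f):
    """Expected value of f(q(x)) when x is drawn from p_data."""
    return sum(p * f(qi) for p, qi in zip(p_data, q))

def simplex_grid(n_steps):
    """Yield every 3-outcome distribution whose entries are multiples of 1/n_steps."""
    for i in range(n_steps + 1):
        for j in range(n_steps + 1 - i):
            k = n_steps - i - j
            yield (i / n_steps, j / n_steps, k / n_steps)

def best_q(p_data, f, n_steps=100):
    """Brute-force the grid for the model distribution that maximizes the objective."""
    return max(simplex_grid(n_steps),
               key=lambda q: expected_objective(p_data, q, f))

def log_f(q):
    # Concave choice of f: corresponds to maximum likelihood (log-likelihood).
    return math.log(q) if q > 0 else float("-inf")

def square_f(q):
    # Convex choice of f: an assumption made for this sketch only.
    return q * q

# Toy data distribution over three candidate outputs (assumed values).
p_data = (0.5, 0.3, 0.2)

q_mle = best_q(p_data, log_f)        # recovers the whole data distribution
q_convex = best_q(p_data, square_f)  # puts all mass on the most probable output

print("MLE optimum:   ", q_mle)
print("Convex optimum:", q_convex)
```

Under these assumptions, the log objective is maximized when `q` matches `p_data`, while the convex objective is maximized by a distribution concentrated on the single most probable output, which is the behavior the abstract describes as focusing on highly probable outputs.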

convex function, convex learning, text generation

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (1.00)
  - Representation & Reasoning > Uncertainty
    - Bayesian Inference (0.61)
  - Machine Learning > Learning Graphical Models
    - Directed Networks > Bayesian Learning (0.61)