Discrete Tilt Matching

Chen, Yuyuan, Wang, Shiyi, Potaptchik, Peter, Kim, Jaeyeon, Albergo, Michael S.

Apr-22-2026–arXiv.org Machine Learning

Masked diffusion large language models (dLLMs) are a promising alternative to autoregressive generation. While reinforcement learning (RL) methods have recently been adapted to dLLM fine-tuning, their objectives typically depend on sequence-level marginal likelihoods, which are intractable for masked diffusion models. To address this, we derive Discrete Tilt Matching (DTM), a likelihood-free method that recasts dLLM fine-tuning as state-level matching of local unmasking posteriors under reward tilting. DTM takes the form of a weighted cross-entropy objective with explicit minimizer, and admits control variates that improve training stability. On a synthetic maze-planning task, we analyze how DTM's annealing schedule and control variates affect training stability and prevent mode collapse. At scale, fine-tuning LLaDA-8B-Instruct with DTM yields strong gains on Sudoku and Countdown while remaining competitive on MATH500 and GSM8K.

large language model, machine learning, urlhttp, (18 more...)

arXiv.org Machine Learning

Apr-22-2026

arXiv.org PDF

Add feedback

Country:
- North America > Canada
  - Ontario > Toronto (0.04)
  - British Columbia > Metro Vancouver Regional District
    - Vancouver (0.04)
- Europe > Austria
  - Vienna (0.14)

Genre:
- Research Report (0.42)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (1.00)
  - Representation & Reasoning (0.93)
  - Natural Language > Large Language Model (0.48)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found