Tree Reward-Aligned Search for TReASURe in Masked Diffusion Language Models
Yu, Zichao, Li, Ming, Zhang, Wenyi, Gao, Weiguo
arXiv.org Artificial Intelligence
Tree search has recently emerged as a powerful framework for aligning generative models with task-specific rewards at test time. Applying tree search to Masked Diffusion Language Models, however, introduces two key challenges: (i) parallel unmasking yields highly correlated branches, limiting exploration, and (ii) reward evaluation via sampled completions produces high-variance estimates, making pruning unstable. Theoretically, we quantify branching efficiency gains in NFEs (number of function evaluations), show that the scoring rule approximates the true reward with error bounded by predictive uncertainty, and prove improvements with larger tree widths.

Masked Diffusion Language Models (MDLMs) (Nie et al., 2025; Sahoo et al., 2024; Shi et al., 2024; Yang et al., 2025b) have emerged as a compelling alternative to autoregressive models (Brown et al., 2020; Radford et al., 2019; Touvron et al., 2023). They start with all-mask tokens and gradually reveal tokens through a sequence of discrete denoising steps. At each step, the model predicts token distributions for masked positions, conditioned on the current partially masked sequence and the diffusion timestep. This formulation enables flexible sampling schedules and broad conditioning patterns, making MDLMs well-suited for controllable generation tasks.

Figure 1: Conceptual illustration of TR…

However, this flexibility is not fully realized without mechanisms to align the model's outputs with user-defined objectives. Test-Time Alignment (TTA) enables guiding language model outputs toward task-specific goals without retraining. In applications such as toxicity avoidance (Logacheva et al., 2022), sentiment control (Barbieri et al., 2020), or enforcing linguistic acceptability (Warstadt et al., 2019), aligning generation with external reward functions at test time offers a flexible and training-free alternative to supervised fine-tuning.
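
The MDLM sampling procedure described above (start from an all-mask sequence, predict token distributions for the masked positions at each step, reveal some of them) can be illustrated with a minimal Python sketch. It is not the paper's method and involves no tree search or reward: `toy_predict` is a hypothetical stand-in for a trained denoiser that returns random distributions, and both the linear timestep and the choice to reveal randomly selected positions in roughly equal batches are assumptions made only for illustration.

```python
import random

MASK = "<mask>"
VOCAB = ["a", "b", "c", "d"]  # toy vocabulary, assumed for illustration


def toy_predict(seq, t, rng):
    """Hypothetical denoiser: one distribution over VOCAB per masked position.

    A real MDLM would condition on `seq` and the timestep `t`; this stand-in
    ignores both and returns random normalized weights.
    """
    dists = {}
    for i, tok in enumerate(seq):
        if tok == MASK:
            weights = [rng.random() for _ in VOCAB]
            total = sum(weights)
            dists[i] = [w / total for w in weights]
    return dists


def sample_mdlm(length, num_steps, seed=0):
    """Reveal an all-mask sequence over at most `num_steps` denoising steps."""
    rng = random.Random(seed)
    seq = [MASK] * length
    for step in range(num_steps):
        masked = [i for i, tok in enumerate(seq) if tok == MASK]
        if not masked:
            break
        t = 1.0 - step / num_steps  # assumed linear timestep, from 1 toward 0
        dists = toy_predict(seq, t, rng)
        # Spread the remaining masked positions over the remaining steps
        # (ceiling division), so the final step reveals everything left.
        steps_left = num_steps - step
        k = -(-len(masked) // steps_left)
        # Several positions are revealed in parallel within a single step.
        for i in rng.sample(masked, k):
            seq[i] = rng.choices(VOCAB, weights=dists[i], k=1)[0]
    return seq


if __name__ == "__main__":
    print(sample_mdlm(length=8, num_steps=4))
```

Because each step reveals several positions from a single forward pass, completions that branch from the same partially masked state share most of their context, which gives some intuition for the branch-correlation challenge mentioned in the abstract.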
Sep-30-2025