AITopics | fk-steering

fk-steering

Information about AI from the News, Publications, and Conferences

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Inference-Time Alignment of Diffusion Models via Trust-Region Iterative Twisted Sequential Monte Carlo

Wang, Weixin; Yang, Yu; Deng, Wei; Xu, Pan

arXiv.org Machine Learning, May-26-2026

We study inference-time alignment for diffusion-based generative models, aiming to steer a base model toward high-reward outputs without updating its weights. Recent Sequential Monte Carlo (SMC)-based steering methods approximate reward-tilted target distributions in a principled way, but their proposals remain largely tied to the base sampler. Since reward information is mainly used after propagation through particle reweighting and resampling, these methods can require large particle budgets and suffer from weight degeneracy and high-variance estimates. One way to reduce variance and improve particle efficiency is to iteratively learn twisting functions that provide look-ahead guidance, as in twisted SMC. However, existing learnable twisting methods are developed mainly for classical sequential inference and can be unstable when applied to diffusion-based alignment with high-dimensional state spaces and terminal, noisy, or black-box rewards. We propose Trust-Region Iterative Twisted Sequential Monte Carlo (TRI-TSMC), a trust-region framework for learning twisting functions in SMC-based inference-time alignment. Each iteration computes an exact KL-constrained update in path space, which admits a closed-form solution by tempered importance reweighting, and projects this target back to the parameterized twisted family by weighted maximum likelihood. Theoretically, we formalize the value-function interpretation of the optimal twisting function and show that it yields a zero-variance sampler. We prove that the trust-region update follows an escort path toward the target distribution, that the weighted maximum-likelihood update is a forward-KL projection, and that the path reduces residual importance-weight variance. Empirically, TRI-TSMC improves primary alignment objectives on discrete diffusion text generation and text-to-image generation under matched inference-time budgets.
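
The trust-region idea in this abstract (tempered importance reweighting followed by a weighted maximum-likelihood projection) can be tried on a toy problem. What follows is a minimal one-dimensional sketch under loudly stated assumptions, not the authors' path-space TRI-TSMC: the base density is N(0, 1), the hypothetical reward is r(x) = -(x - 3)^2 / 2, so the reward-tilted target π is N(1.5, 0.5), and a Gaussian proposal family stands in for the parameterized twisted family. Each step samples from the current proposal q_k, tempers the importance weights as (π/q_k)^β, which targets the geometric interpolation q_k^(1-β) π^β (the closed-form solution of a KL-regularized step, with β acting as the trust-region size), and refits the Gaussian by weighted maximum likelihood. All function names are hypothetical. The recorded effective sample size of the untempered weights is a proxy for the residual importance-weight variance.

```python
import math
import random

def log_target(x):
    # Unnormalized toy target: N(0, 1) base density times exp(r(x)),
    # with a hypothetical reward r(x) = -(x - 3)^2 / 2. This equals N(1.5, 0.5).
    return -0.5 * x * x - 0.5 * (x - 3.0) ** 2

def log_gauss(x, mean, var):
    # Log density of the Gaussian proposal N(mean, var).
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x - mean) ** 2 / var

def normalize(log_w):
    # Numerically stable normalization: subtract the max log-weight first.
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    z = sum(w)
    return [wi / z for wi in w]

def ess(w):
    # Effective sample size of normalized weights.
    return 1.0 / sum(wi * wi for wi in w)

def trust_region_step(mean, var, beta, n, rng):
    xs = [rng.gauss(mean, math.sqrt(var)) for _ in range(n)]
    log_ratio = [log_target(x) - log_gauss(x, mean, var) for x in xs]
    # Diagnostic: ESS fraction of the full (untempered) weights under q_k.
    ess_frac = ess(normalize(log_ratio)) / n
    # Tempered reweighting: (pi / q_k)^beta targets q_k^(1 - beta) * pi^beta.
    w = normalize([beta * lr for lr in log_ratio])
    # Projection: weighted maximum likelihood within the Gaussian family.
    new_mean = sum(wi * x for wi, x in zip(w, xs))
    new_var = sum(wi * (x - new_mean) ** 2 for wi, x in zip(w, xs))
    return new_mean, new_var, ess_frac

def run(beta=0.5, iters=30, n=4000, seed=0):
    rng = random.Random(seed)
    mean, var = 0.0, 1.0  # start from the base distribution N(0, 1)
    ess_history = []
    for _ in range(iters):
        mean, var, ess_frac = trust_region_step(mean, var, beta, n, rng)
        ess_history.append(ess_frac)
    return mean, var, ess_history

if __name__ == "__main__":
    mean, var, ess_history = run()
    print(f"mean={mean:.3f} var={var:.3f} "
          f"ESS first={ess_history[0]:.2f} last={ess_history[-1]:.2f}")
```

In this toy, a smaller β takes more conservative steps whose tempered weights are better conditioned, while β = 1 would project onto the fully reweighted target in a single step. As the proposal approaches the target, the untempered weights become nearly constant, which is the one-dimensional analogue of the zero-variance property the abstract attributes to the optimal twisting function.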

fk-steering, machine learning, natural language, (20 more...)

arXiv.org Machine Learning

2605.25123

Country: Asia > Middle East (0.46)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Simple Approximation and Derivative Free Inference-Time Scaling for Diffusion Models via Sequential Monte Carlo on Path Measures

Wang, Chenyang; Wang, Weizhong; Ren, Yinuo; Blanchet, Jose; Lu, Yiping

arXiv.org Machine Learning, May-19-2026

Diffusion-based generative models increasingly rely on inference-time guidance, adding a drift term or reweighting a mixture of experts, to improve sample quality on task-specific objectives. However, most existing techniques require repeated score or gradient evaluations, introducing bias, high computational overhead, or both. We introduce URGE, approximation-free Resampling via Girsanov Estimation, a derivative-free inference-time scaling algorithm that performs pathwise importance reweighting via a Girsanov change of measure.

Modern generative models have emerged as a powerful paradigm for learning complex, high-dimensional data distributions. In particular, diffusion models (Ho et al., 2020; Sohl-Dickstein et al., 2015; Song and Ermon, 2019; Song et al., 2020) and flow-based methods (Zhang et al., 2018a; Lipman et al., 2022; Albergo and Vanden-Eijnden, 2022; Liu et al., 2022) provide a principled and scalable framework for generative modeling, achieving state-of-the-art performance across diverse applications, including video generation (Ho et al., 2022), protein design (Gruver et al., 2023), and large-scale text generation (Li et al., 2022; Nie et al., 2025). A unifying perspective underlying these approaches is their formulation in terms of stochastic differential equations.
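
Pathwise importance reweighting via a Girsanov change of measure, as named in the URGE abstract, can be illustrated with the textbook estimator. This is a minimal sketch under stated assumptions, not URGE itself (the abstract does not say how URGE chooses its target dynamics): particles follow an Euler-Maruyama discretization of a proposal SDE dX = b_q dt + σ dW, each path accumulates the log Radon-Nikodym derivative toward a target SDE with drift b_p, namely log w = Σ δ ΔW - ½ Σ δ² Δt with δ = (b_p - b_q)/σ evaluated at the left endpoint, and particles are then resampled multinomially. The drifts, function names, and the constant-drift example are hypothetical. Note that the weight needs only drift evaluations along each simulated path, not reward gradients.

```python
import math
import random

def simulate_weighted_paths(drift_q, drift_p, sigma, x0, T, n_steps, n_paths, rng):
    """Simulate under the proposal drift; return (terminal states, log weights).

    Each log weight approximates log dP/dQ on path space (Girsanov):
    sum(delta * dW) - 0.5 * sum(delta**2 * dt), delta = (b_p - b_q) / sigma.
    """
    dt = T / n_steps
    sqrt_dt = math.sqrt(dt)
    finals, log_ws = [], []
    for _ in range(n_paths):
        x, t, log_w = x0, 0.0, 0.0
        for _ in range(n_steps):
            dw = rng.gauss(0.0, sqrt_dt)           # Brownian increment under Q
            b_q = drift_q(x, t)
            delta = (drift_p(x, t) - b_q) / sigma  # left endpoint (Ito convention)
            log_w += delta * dw - 0.5 * delta * delta * dt
            x += b_q * dt + sigma * dw             # Euler-Maruyama step under Q
            t += dt
        finals.append(x)
        log_ws.append(log_w)
    return finals, log_ws

def normalize(log_ws):
    # Numerically stable self-normalization of log weights.
    m = max(log_ws)
    w = [math.exp(lw - m) for lw in log_ws]
    z = sum(w)
    return [wi / z for wi in w]

def multinomial_resample(particles, weights, rng):
    # Draw len(particles) particles with replacement, proportional to weights.
    return rng.choices(particles, weights=weights, k=len(particles))

if __name__ == "__main__":
    rng = random.Random(1)
    # Proposal: standard Brownian motion; target: constant drift 1.
    finals, log_ws = simulate_weighted_paths(
        lambda x, t: 0.0, lambda x, t: 1.0, 1.0, 0.0, 1.0, 50, 5000, rng)
    w = normalize(log_ws)
    print("self-normalized E_P[X_T]:", sum(wi * x for wi, x in zip(w, finals)))
    resampled = multinomial_resample(finals, w, rng)
    print("resampled mean:", sum(resampled) / len(resampled))
```

With b_q = 0, b_p = 1, σ = 1 and T = 1, the target terminal law is N(1, 1), so both printed estimates should land near 1. In this constant-drift case the discretized log weight equals W_T - 1/2, which matches the exact continuous-time Girsanov weight.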

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Machine Learning

2605.1785

Country:

North America > United States (0.93)
Asia (0.67)

Genre: Research Report (0.64)

Industry: Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.64)

Tree Reward-Aligned Search for TReASURe in Masked Diffusion Language Models

Yu, Zichao; Li, Ming; Zhang, Wenyi; Gao, Weiguo

arXiv.org Artificial Intelligence, Sep-30-2025

Tree search has recently emerged as a powerful framework for aligning generative models with task-specific rewards at test time. Applying tree search to Masked Diffusion Language Models, however, introduces two key challenges: (i) parallel unmasking yields highly correlated branches, limiting exploration, and (ii) reward evaluation via sampled completions produces high-variance estimates, making pruning unstable. Theoretically, we quantify branching efficiency gains in NFEs (number of function evaluations), show that the scoring rule approximates the true reward with error bounded by predictive uncertainty, and prove improvements with larger tree widths.

Masked Diffusion Language Models (MDLMs) (Nie et al., 2025; Sahoo et al., 2024; Shi et al., 2024; Yang et al., 2025b) have emerged as a compelling alternative to autoregressive models (Brown et al., 2020; Radford et al., 2019; Touvron et al., 2023). They start from a fully masked sequence and gradually reveal tokens through a sequence of discrete denoising steps. At each step, the model predicts token distributions for the masked positions, conditioned on the current partially masked sequence and the diffusion timestep. This formulation enables flexible sampling schedules and broad conditioning patterns, making MDLMs well suited to controllable generation tasks.

[Figure 1: Conceptual illustration of TR…]

However, this flexibility is not fully realized without mechanisms to align the model's outputs with user-defined objectives. Test-Time Alignment (TTA) guides language model outputs toward task-specific goals without retraining. In applications such as toxicity avoidance (Logacheva et al., 2022), sentiment control (Barbieri et al., 2020), or enforcing linguistic acceptability (Warstadt et al., 2019), aligning generation with external reward functions at test time offers a flexible, training-free alternative to supervised fine-tuning.
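
The unmasking process described for MDLMs, together with a reward-guided tree search over it, can be sketched with a toy model. This is a generic beam-style tree search, not TReASURe, whose branching and scoring rules the abstract excerpt does not spell out. Assumptions, all hypothetical: `toy_denoiser` stands in for an MDLM and returns uniform distributions over a two-token vocabulary, the reward counts occurrences of "a", and each child is scored by averaging the reward over `n_rollouts` sampled completions. At every step each frontier node is expanded into `branch` children and only the top `width` are kept.

```python
import random

MASK = "_"
VOCAB = ("a", "b")  # hypothetical two-token vocabulary

def toy_denoiser(seq):
    # Stand-in for an MDLM: one distribution over VOCAB per masked position.
    # A real model would condition on the partial sequence and the timestep.
    return {i: [0.5, 0.5] for i, tok in enumerate(seq) if tok == MASK}

def unmask_step(seq, k, rng):
    # Reveal up to k masked positions, sampling each token from the denoiser.
    dists = toy_denoiser(seq)
    positions = rng.sample(sorted(dists), min(k, len(dists)))
    out = list(seq)
    for i in positions:
        out[i] = rng.choices(VOCAB, weights=dists[i])[0]
    return tuple(out)

def complete(seq, rng):
    # Sampled completion: keep unmasking one position at a time.
    while MASK in seq:
        seq = unmask_step(seq, 1, rng)
    return seq

def reward(seq):
    return seq.count("a")  # hypothetical reward

def estimate_reward(seq, n_rollouts, rng):
    # Monte Carlo reward estimate of a partial sequence from sampled completions.
    return sum(reward(complete(seq, rng)) for _ in range(n_rollouts)) / n_rollouts

def tree_search(length, width, branch, k, n_rollouts, rng):
    frontier = [tuple([MASK] * length)]
    while MASK in frontier[0]:  # all frontier nodes share the same mask count
        children = [unmask_step(node, k, rng)
                    for node in frontier for _ in range(branch)]
        children.sort(key=lambda c: estimate_reward(c, n_rollouts, rng),
                      reverse=True)
        frontier = children[:width]  # prune to the top `width` nodes
    return max(frontier, key=reward)

if __name__ == "__main__":
    best = tree_search(length=6, width=4, branch=4, k=1, n_rollouts=8,
                       rng=random.Random(0))
    print("".join(best), reward(best))
```

Both challenges named in the abstract appear directly in this toy: children of the same node often coincide (correlated branches, challenge (i)), and with few rollouts the scores of partial sequences are noisy, so pruning can discard good candidates (challenge (ii)).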

arxiv preprint arxiv, large language model, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2509.23146

Genre: Research Report (0.50)

Industry: Leisure & Entertainment (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
