AITopics

Country: Europe (0.28)

Genre:

Research Report > Experimental Study (1.00)
Instructional Material (0.66)

Industry: Banking & Finance (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing SystemsJun-14-2026, 20:08:55 GMT

10b7e27c8eb9571fbbd2ae6a9f8c3855-Paper-Conference.pdf

While class of methods generati e v xist e models for aligning - with flo human w matching preferences, models existing - a popular approaches and eff f ecti ail v to e achieve both adaptation efficiency and probabilistically sound prior preservation. In this work, we leverage the theory of optimal control and propose VGG-Flow, a gradient-matching-based method for finetuning pretrained flow matching models. The finetuned key idea velocity behind field this and algorithm the pretrained is that one the should optimal be matched difference with between the gradient the field of a value function. This method not only incorporates first-order information from the reward model but also benefits from heuristic initialization of the value function to enable fast adaptation. Empirically, we show on a popular text-toimage matching flow models matching under model, limited Stable computational Diffusion 3, b that udgets our while method achie can ving finetune effecti flo v w e and prior-preserving alignment.

artificial intelligence, deep learning, machine learning, (19 more...)

Country: Asia (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry: Energy (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Bhattacharyya, Riddhiman, Chakrabarty, Sayak, Banerjee, Imon

Adaptive Estimation and Optimal Control in Offline Contextual MDPs without Stationarity

arXiv.org Machine LearningMay-6-2026

Contextual MDPs are powerful tools with wide applicability in areas from biostatistics to machine learning. However, specializing them to offline datasets has been challenging due to a lack of robust, theoretically backed methods. Our work tackles this problem by introducing a new approach towards adaptive estimation and cost optimization of contextual MDPs. This estimator, to the best of our knowledge, is the first of its kind, and is endowed with strong optimality guarantees. We achieve this by overcoming the key technical challenges evolving from the endogenous properties of contextual MDPs; such as non-stationarity, or model irregularity. Our guarantees are established under complete generality by utilizing the relatively recent and powerful statistical technique of $T$-estimation (Baraud, 2011). We first provide a procedure for selecting an estimator given a sample from a contextual MDP and use it to derive oracle risk bounds under two distinct, but nevertheless meaningful, loss functions. We then consider the problem of determining the optimal control with the aid of the aforementioned density estimate and provide finite sample guarantees for the cost function.

artificial intelligence, assumption, machine learning, (18 more...)

2605.03393

Country: North America > United States > California (0.28)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.88)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.66)

arXiv.org Machine LearningMay-4-2026

Mean-Field Path-Integral Diffusion: From Samples to Interacting Agents

Chertkov, Michael

Independent sample generation is the prevailing paradigm in modern diffusion-based generative models of AI. We ask a different question: can samples coordinate through shared population statistics to transport probability mass more efficiently? We introduce Mean-Field Path-Integral Diffusion (MF-PID), a framework in which samples are promoted to interacting agents whose drift depends self-consistently on the evolving population density. We identify two analytically tractable regimes: a Linear-Quadratic-Gaussian (LQG) benchmark in which the infinite-dimensional mean-field system reduces to a finite set of Riccati and linear ODEs, and a Gaussian-mixture regime governed by a piecewise-constant protocol that preserves closed-form solvability. For a quadratic interaction potential with schedule βt and zero base drift we prove that the self-consistent MF guidance is the exact linear interpolant between initial and target global means -- a result that holds for arbitrary initial and target densities and any βt. Applied to demand-response control of energy systems, where agents aggregated into an ensemble are energy consumers (e.g. The energy saving is independent of the number of zones per building (d = 1-32 tested), confirming that the linear guidance formula broadcasts a single d-vector with O(d) communication and grows mildly in compute (sub-cubically for d 32, asymptotically O(d3) for d 1). Introduction Generative AI has been transformed by diffusion models, which frame sample generation as a stochastic process steered from noise to data [1-3]. A key structural feature of these models -- shared with other generative models, e.g. Similarly, stochastic optimal transport (SOT) and Schrödinger bridge formulations [6-8] cast distribution matching as an independent-particle path optimization, yielding tractable convolutions of Green functions but discarding inter-particle information; stochastic interpolants [9] construct flexible transport bridges between arbitrary densities via tunable continuous-time stochastic processes, recovering the Schrödinger bridge as a special limit -- again in an independent-particle framework.

artificial intelligence, machine learning, natural language, (18 more...)

2605.00007

Country: North America > United States > Arizona (0.28)

Genre: Research Report (0.83)

Industry: Energy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Generation (0.74)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)

Neural Information Processing SystemsApr-24-2026, 20:15:47 GMT

Reinforcement Learning with Non-Exponential Discounting

Commonly in reinforcement learning (RL), rewards are discounted over time using an exponential function to model time preference, thereby bounding the expected long-term reward. In contrast, in economics and psychology, it has been shown that humans often adopt a hyperbolic discounting scheme, which is optimal when a specific task termination time distribution is assumed. In this work, we propose a theory for continuous-time model-based reinforcement learning generalized to arbitrary discount functions. This formulation covers the case in which there is a non-exponential random termination time. We derive a Hamilton-Jacobi-Bellman (HJB) equation characterizing the optimal policy and describe how it can be solved using a collocation method, which uses deep learning for function approximation. Further, we show how the inverse RL problem can be approached, in which one tries to recover properties of the discount function given decision data. We validate the applicability of our proposed approach on two simulated problems. Our approach opens the way for the analysis of human discounting in sequential decision-making tasks.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Country: Europe (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Du, Yuanqi, He, Jiajun, Zhang, Dinghuai, Vanden-Eijnden, Eric, Domingo-Enrich, Carles

Rare Event Analysis via Stochastic Optimal Control

arXiv.org Machine LearningApr-16-2026

Rare events such as conformational changes in biomolecules, phase transitions, and chemical reactions are central to the behavior of many physical systems, yet they are extremely difficult to study computationally because unbiased simulations seldom produce them. Transition Path Theory (TPT) provides a rigorous statistical framework for analyzing such events: it characterizes the ensemble of reactive trajectories between two designated metastable states (reactant and product), and its central object--the committor function, which gives the probability that the system will next reach the product rather than the reactant--encodes all essential kinetic and thermodynamic information. We introduce a framework that casts committor estimation as a stochastic optimal control (SOC) problem. In this formulation the committor defines a feedback control--proportional to the gradient of its logarithm--that actively steers trajectories toward the reactive region, thereby enabling efficient sampling of reactive paths. To solve the resulting hitting-time control problem we develop two complementary objectives: a direct backpropagation loss and a principled off-policy Value Matching loss, for which we establish first-order optimality guarantees. We further address metastability, which can trap controlled trajectories in intermediate basins, by introducing an alternative sampling process that preserves the reactive current while lowering effective energy barriers. On benchmark systems, the framework yields markedly more accurate committor estimates, reaction rates, and equilibrium constants than existing methods.

artificial intelligence, equation, machine learning, (18 more...)

2604.13213

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Control Systems (0.90)

Neural Information Processing SystemsMar-22-2026, 12:44:36 GMT

Stochastic Optimal Control Matching

Stochastic optimal control, which has the goal of driving the behavior of noisy systems, is broadly applicable in science, engineering and artificial intelligence. Our work introduces Stochastic Optimal Control Matching (SOCM), a novel Iterative Diffusion Optimization (IDO) technique for stochastic optimal control that stems from the same philosophy as the conditional score matching loss for diffusion models. That is, the control is learned via a least squares problem by trying to fit a matching vector field. The training loss, which is closely connected to the cross-entropy loss, is optimized with respect to both the control function and a family of reparameterization matrices which appear in the matching vector field. The optimization with respect to the reparameterization matrices aims at minimizing the variance of the matching vector field. Experimentally, our algorithm achieves lower error than all the existing IDO techniques for stochastic optimal control for three out of four control problems, in some cases by an order of magnitude. The key idea underlying SOCM is the path-wise reparameterization trick, a novel technique that may be of independent interest.

artificial intelligence, machine learning, proceedings, (6 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.40)

arXiv.org Machine LearningMar-18-2026

Population Annealing as a Discrete-Time Schrödinger Bridge

Ohzeki, Masayuki

We present a theoretical framework that reinterprets Population Annealing (PA) through the lens of the discrete-time Schrödinger Bridge (SB) problem. We demonstrate that the heuristic reweighting step in PA is derived by analytically solving the Schrödinger system without iterative computation via instantaneous projection. In addition, we identify the thermodynamic work as the optimal control potential that solves the global variational problem on path space. This perspective unifies non-equilibrium thermodynamics with the geometric framework of optimal transport, interpreting the Jarzynski equality as a consistency condition within the Donsker-Varadhan variational principle, and elucidates the thermodynamic optimality of PA.

algorithm, artificial intelligence, machine learning, (19 more...)

2603.16056

Country: Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.05)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.70)