Goto

Collaborating Authors

 variational principle



An Analytic Theory of Quantum Imaginary Time Evolution

Chen, Min, Zhang, Bingzhi, Zhuang, Quntao, Liu, Junyu

arXiv.org Machine Learning

Quantum imaginary time evolution (QITE) algorithm is one of the most promising variational quantum algorithms (VQAs), bridging the current era of Noisy Intermediate-Scale Quantum devices and the future of fully fault-tolerant quantum computing. Although practical demonstrations of QITE and its potential advantages over the general VQA trained with vanilla gradient descent (GD) in certain tasks have been reported, a first-principle, theoretical understanding of QITE remains limited. Here, we aim to develop an analytic theory for the dynamics of QITE. First, we show that QITE can be interpreted as a form of a general VQA trained with Quantum Natural Gradient Descent (QNGD), where the inverse quantum Fisher information matrix serves as the learning-rate tensor. This equivalence is established not only at the level of gradient update rules, but also through the action principle: the variational principle can be directly connected to the geometric geodesic distance in the quantum Fisher information metric, up to an integration constant. Second, for wide quantum neural networks, we employ the quantum neural tangent kernel framework to construct an analytic model for QITE. We prove that QITE always converges faster than GD-based VQA, though this advantage is suppressed by the exponential growth of Hilbert space dimension. This helps explain certain experimental results in quantum computational chemistry. Our theory encompasses linear, quadratic, and more general loss functions. We validate the analytic results through numerical simulations. Our findings establish a theoretical foundation for QITE dynamics and provide analytic insights for the first-principle design of variational quantum algorithms.


How deep is your network? Deep vs. shallow learning of transfer operators

Tabish, Mohammad, Leimkuhler, Benedict, Klus, Stefan

arXiv.org Machine Learning

We propose a randomized neural network approach called RaNNDy for learning transfer operators and their spectral decompositions from data. The weights of the hidden layers of the neural network are randomly selected and only the output layer is trained. The main advantage is that without a noticeable reduction in accuracy, this approach significantly reduces the training time and resources while avoiding common problems associated with deep learning such as sensitivity to hyperparameters and slow convergence. Additionally, the proposed framework allows us to compute a closed-form solution for the output layer which directly represents the eigenfunctions of the operator. Moreover, it is possible to estimate uncertainties associated with the computed spectral properties via ensemble learning. We present results for different dynamical operators, including Koopman and Perron-Frobenius operators, which have important applications in analyzing the behavior of complex dynamical systems, and the Schrödinger operator. The numerical examples, which highlight the strengths but also weaknesses of the proposed framework, include several stochastic dynamical systems, protein folding processes, and the quantum harmonic oscillator.


HalluField: Detecting LLM Hallucinations via Field-Theoretic Modeling

Vu, Minh, Tran, Brian K., Shah, Syed A., Zollicoffer, Geigh, Hoang-Xuan, Nhat, Bhattarai, Manish

arXiv.org Artificial Intelligence

Large Language Models (LLMs) exhibit impressive reasoning and question-answering capabilities. However, they often produce inaccurate or unreliable content known as hallucinations. This unreliability significantly limits their deployment in high-stakes applications. Thus, there is a growing need for a general-purpose method to detect hallucinations in LLMs. In this work, we introduce HalluField, a novel field-theoretic approach for hallucination detection based on a parametrized variational principle and thermodynamics. Inspired by thermodynamics, HalluField models an LLM's response to a given query and temperature setting as a collection of discrete likelihood token paths, each associated with a corresponding energy and entropy. By analyzing how energy and entropy distributions vary across token paths under changes in temperature and likelihood, HalluField quantifies the semantic stability of a response. Hallucinations are then detected by identifying unstable or erratic behavior in this energy landscape. HalluField is computationally efficient and highly practical: it operates directly on the model's output logits without requiring fine-tuning or auxiliary neural networks. Notably, the method is grounded in a principled physical interpretation, drawing analogies to the first law of thermodynamics. Remarkably, by modeling LLM behavior through this physical lens, HalluField achieves state-of-the-art hallucination detection performance across models and datasets.


Manifold Trajectories in Next-Token Prediction: From Replicator Dynamics to Softmax Equilibrium

Lee-Jenkins, Christopher R.

arXiv.org Artificial Intelligence

Decoding in large language models is often described as scoring tokens and normalizing with softmax. We give a minimal, self-contained account of this step as a constrained variational principle on the probability simplex. The discrete, normalization-respecting ascent is the classical multiplicative-weights (entropic mirror) update; its continuous-time limit is the replicator flow. From these ingredients we prove that, for a fixed context and temperature, the next-token distribution follows a smooth trajectory inside the simplex and converges to the softmax equilibrium. This formalizes the common ``manifold traversal'' intuition at the output-distribution level. The analysis yields precise, practice-facing consequences: temperature acts as an exact rescaling of time along the same trajectory, while top-k and nucleus sampling restrict the flow to a face with identical guarantees. We also outline a controlled account of path-dependent score adjustments and their connection to loop-like, hallucination-style behavior. We make no claims about training dynamics or internal representations; those are deferred to future work.



Dynamical symmetries in the fluctuation-driven regime: an application of Noether's theorem to noisy dynamical systems

Vastola, John J.

arXiv.org Artificial Intelligence

Department of Neurobiology, Harvard Medical School, Boston, MA, USA Editors: Simone Azeglio, Christian Shewmake, Bahareh Tolooshams, Sophia Sanborn, Chase van de Geijin, Nina Miolane Abstract Noether's theorem provides a powerful link between continuous symmetries and conserved quantities for systems governed by some variational principle. Perhaps unfortunately, most dynamical systems of interest in neuroscience and artificial intelligence cannot be described by any such principle. On the other hand, nonequilibrium physics provides a variational principle that describes how fairly generic noisy dynamical systems are most likely to transition between two states; in this work, we exploit this principle to apply Noether's theorem, and hence learn about how the continuous symmetries of dynamical systems constrain their most likely trajectories. We identify analogues of the conservation of energy, momentum, and angular momentum, and briefly discuss examples of each in the context of models of decision-making, recurrent neural networks, and diffusion generative models. Keywords: symmetry, invariance, Noether's theorem, stochastic processes, diffusion 1. Introduction In physics, Noether's theorem provides a fundamental link between the symmetries of physical systems on the one hand, and conserved quantities like energy and momentum on the other hand (Noether, 1918; Kosmann-Schwarzbach et al., 2011; Neuenschwander, 2017). In its modern form, it uniquely associates (equivalence classes of) independent continuous symmetries, which can be formalized in terms of Lie groups and algebras, with (equivalence classes of) independent conserved quantities (Martinez Alonso, 1979; Olver, 1986, 1993; Brown, 2020).


Variational formulation based on duality to solve partial differential equations: Use of B-splines and machine learning approximants

Sukumar, N., Acharya, Amit

arXiv.org Artificial Intelligence

Many partial differential equations (PDEs) such as Navier--Stokes equations in fluid mechanics, inelastic deformation in solids, and transient parabolic and hyperbolic equations do not have an exact, primal variational structure. Recently, a variational principle based on the dual (Lagrange multiplier) field was proposed. The essential idea in this approach is to treat the given PDE as constraints, and to invoke an arbitrarily chosen auxiliary potential with strong convexity properties to be optimized. This leads to requiring a convex dual functional to be minimized subject to Dirichlet boundary conditions on dual variables, with the guarantee that even PDEs that do not possess a variational structure in primal form can be solved via a variational principle. The vanishing of the first variation of the dual functional is, up to Dirichlet boundary conditions on dual fields, the weak form of the primal PDE problem with the dual-to-primal change of variables incorporated. We derive the dual weak form for the linear, one-dimensional, transient convection-diffusion equation. A Galerkin discretization is used to obtain the discrete equations, with the trial and test functions chosen as linear combination of either RePU activation functions (shallow neural network) or B-spline basis functions; the corresponding stiffness matrix is symmetric. For transient problems, a space-time Galerkin implementation is used with tensor-product B-splines as approximating functions. Numerical results are presented for the steady-state and transient convection-diffusion equation, and transient heat conduction. The proposed method delivers sound accuracy for ODEs and PDEs and rates of convergence are established in the $L^2$ norm and $H^1$ seminorm for the steady-state convection-diffusion problem.


Variational Principles for Mirror Descent and Mirror Langevin Dynamics

Tzen, Belinda, Raj, Anant, Raginsky, Maxim, Bach, Francis

arXiv.org Artificial Intelligence

Mirror descent, introduced by Nemirovski and Yudin in the 1970s, is a primal-dual convex optimization method that can be tailored to the geometry of the optimization problem at hand through the choice of a strongly convex potential function. It arises as a basic primitive in a variety of applications, including large-scale optimization, machine learning, and control. This paper proposes a variational formulation of mirror descent and of its stochastic variant, mirror Langevin dynamics. The main idea, inspired by the classic work of Brezis and Ekeland on variational principles for gradient flows, is to show that mirror descent emerges as a closed-loop solution for a certain optimal control problem, and the Bellman value function is given by the Bregman divergence between the initial condition and the global minimizer of the objective function.


How data, synapses and neurons interact with each other: a variational principle marrying gradient ascent and message passing

Huang, Haiping

arXiv.org Machine Learning

Unsupervised learning requiring only raw data is not only a fundamental function of the cerebral cortex, but also a foundation for a next generation of artificial neural networks. However, a unified theoretical framework to treat sensory inputs, synapses and neural activity together is still lacking. The computational obstacle originates from the discrete nature of synapses, and complex interactions among these three essential elements of learning. Here, we propose a variational mean-field theory in which only the distribution of synaptic weight is considered. The unsupervised learning can then be decomposed into two interwoven steps: a maximization step is carried out as a gradient ascent of the lower-bound on the data log-likelihood, and an expectation step is carried out as a message passing procedure on an equivalent or dual neural network whose parameter is specified by the variational parameter of the weight distribution. Therefore, our framework explains how data (or sensory inputs), synapses and neural activities interact with each other to achieve the goal of extracting statistical regularities in sensory inputs. This variational framework is verified in restricted Boltzmann machines with planted synaptic weights and learning handwritten digits.