energy function
ForceFM: Enhancing Protein-Ligand Predictions through Force-Guided Flow Matching
Molecular docking is a fundamental technique in structure-based drug discovery, playing a critical role in predicting the binding poses of protein-ligand complexes. While traditional docking methods are generally reliable, they are often computationally expensive. Recent deep learning (DL) approaches have substantially accelerated docking and improved prediction accuracy; however, they frequently generate conformations that lack physical plausibility due to insufficient integration of physical priors. To address these challenges, we propose ForceFM, a novel force-guided model that integrates a force-guided network into the generation process, steering ligand poses toward low-energy, physically realistic conformations. Force guidance also halves inference cost compared with unguided approaches. Importantly, replacing the guiding potential with diverse energy functions (including Vina, Glide, Gnina, and Confscore) preserves or improves performance, underscoring the method's generality and robustness. These results highlight ForceFM's ability to set new standards in docking accuracy and physical consistency, surpassing the limitations of previous methods.
Dense Associative Memory with Energy
We propose a novel energy function for Dense Associative Memory (DenseAM) networks, the log-sum-ReLU (LSR), inspired by optimal kernel density estimation. Unlike the common log-sum-exponential (LSE) function, LSR is based on the Epanechnikov kernel and enables exact memory retrieval with exponential capacity without requiring exponential separation functions. Moreover, it introduces abundant additional emergent local minima while preserving perfect pattern recovery -- a characteristic previously unseen in DenseAM literature. Empirical results show that LSR energy has significantly more local minima (memories) that have comparable log-likelihood to LSE-based models. Analysis of LSR's emergent memories on image datasets reveals a degree of creativity and novelty, hinting at this method's potential for both large-scale memory storage and generative tasks.
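The contrast between the two energies can be sketched numerically. The block below is a minimal illustration, not the authors' implementation: the exact functional forms, the function names, and the parameters `beta` and `eps` are assumptions chosen to match the description (LSE over squared distances, and LSR built from the compactly supported Epanechnikov kernel `max(0, 1 - beta/2 * d^2)`). It checks that, under these assumed forms, the LSR gradient vanishes exactly at a stored pattern whose kernel support contains no other pattern, while the LSE gradient there is small but nonzero.

```python
import numpy as np

def lse_energy(x, memories, beta=1.0):
    """Assumed LSE form: -(1/beta) * log sum_mu exp(-beta/2 * ||x - xi_mu||^2)."""
    a = -0.5 * beta * np.sum((memories - x) ** 2, axis=1)
    m = a.max()  # shift for numerical stability
    return -(m + np.log(np.exp(a - m).sum())) / beta

def lse_grad(x, memories, beta=1.0):
    """Gradient of lse_energy: softmax-weighted sum of (x - xi_mu)."""
    a = -0.5 * beta * np.sum((memories - x) ** 2, axis=1)
    w = np.exp(a - a.max())
    w = w / w.sum()
    return (w[:, None] * (x - memories)).sum(axis=0)

def lsr_energy(x, memories, beta=1.0, eps=1e-12):
    """Assumed LSR form: -(1/beta) * log(eps + sum_mu ReLU(1 - beta/2 * ||x - xi_mu||^2))."""
    k = np.maximum(0.0, 1.0 - 0.5 * beta * np.sum((memories - x) ** 2, axis=1))
    return -np.log(eps + k.sum()) / beta

def lsr_grad(x, memories, beta=1.0, eps=1e-12):
    """Gradient of lsr_energy: only patterns with a positive kernel value contribute."""
    k = np.maximum(0.0, 1.0 - 0.5 * beta * np.sum((memories - x) ** 2, axis=1))
    active = (k > 0.0).astype(float)
    return (active[:, None] * (x - memories)).sum(axis=0) / (eps + k.sum())

# Two well-separated stored patterns (toy data, not from the paper).
memories = np.array([[0.0, 0.0], [3.0, 0.0]])

g_lsr = lsr_grad(memories[0], memories)  # gradient at the first stored pattern
g_lse = lse_grad(memories[0], memories)
e_lsr_at_memory = lsr_energy(memories[0], memories)
e_lsr_far = lsr_energy(np.array([10.0, 10.0]), memories)
```

In this toy setting the second pattern lies outside the kernel support of the first, so it contributes nothing to the LSR gradient at that point, whereas the LSE softmax always assigns it some weight.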
Generalizable Reasoning through Compositional Energy Minimization
Generalization is a key challenge in machine learning, specifically in reasoning tasks, where models are expected to solve problems more complex than those encountered during training. Existing approaches typically train reasoning models in an end-to-end fashion, directly mapping input instances to solutions. While this allows models to learn useful heuristics from data, it often results in limited generalization beyond the training distribution. In this work, we propose a novel approach to reasoning generalization by learning energy landscapes over the solution spaces of smaller, more tractable subproblems. At test time, we construct a global energy landscape for a given problem by combining the energy functions of multiple subproblems. This compositional approach enables the incorporation of additional constraints during inference, allowing the construction of energy landscapes for problems of increasing difficulty. To improve the sample quality from this newly constructed energy landscape, we introduce Parallel Energy Minimization (PEM). We evaluate our approach on a wide set of reasoning problems. Our method outperforms existing state-of-the-art methods, demonstrating its ability to generalize to larger and more complex problems.
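The idea of composing subproblem energies into a global landscape can be illustrated with a toy example. This is a sketch under stated assumptions: the energies here are hand-written graph-coloring penalties rather than learned models, the helper names (`edge_energy`, `compose`, `brute_force_min`) are hypothetical, and exhaustive search stands in for the minimizer (it is not PEM, whose details the abstract does not give).

```python
from itertools import product

def edge_energy(i, j):
    """Hypothetical subproblem energy: 1 if nodes i and j share a color, else 0."""
    return lambda colors: 1.0 if colors[i] == colors[j] else 0.0

def compose(energies):
    """Global energy defined as the sum of the subproblem energies."""
    return lambda colors: sum(e(colors) for e in energies)

def brute_force_min(energy, n_nodes, n_colors):
    """Exhaustive minimizer, used only because the toy problem is tiny."""
    return min(product(range(n_colors), repeat=n_nodes), key=energy)

# Small problem: a triangle, one energy term per edge.
triangle = [edge_energy(0, 1), edge_energy(1, 2), edge_energy(0, 2)]
best_small = brute_force_min(compose(triangle), n_nodes=3, n_colors=3)

# Larger problem: reuse the same terms and add constraints for a fourth node.
larger = triangle + [edge_energy(2, 3), edge_energy(1, 3)]
global_energy = compose(larger)
best_large = brute_force_min(global_energy, n_nodes=4, n_colors=3)
```

The larger problem is built at "test time" purely by adding terms to the sum; no subproblem energy is changed, which is the property the compositional approach relies on.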
Exponential Dynamic Energy Network for High Capacity Sequence Memory
The energy paradigm, exemplified by Hopfield networks, offers a principled framework for memory in neural systems by interpreting dynamics as descent on an energy surface. While powerful for static associative memories, it falls short in modeling sequential memory, where transitions between memories are essential. We introduce the Exponential Dynamic Energy Network (EDEN), a novel architecture that extends the energy paradigm to temporal domains by evolving the energy function over multiple timescales. EDEN combines a static high-capacity energy network with a slow, asymmetrically interacting modulatory population, enabling robust and controlled memory transitions. We formally derive short-timescale energy functions that govern local dynamics and use them to analytically compute memory escape times, revealing a phase transition between static and dynamic regimes. The analysis of capacity, defined as the number of memories that can be stored with minimal error rate as a function of the dimensions of the state space (number of feature neurons), for EDEN shows that it achieves exponential sequence memory capacity $\mathcal{O}(\gamma^N)$, outperforming the linear capacity $\mathcal{O}(N)$ of conventional models. Furthermore, EDEN's dynamics resemble the activity of time and ramping cells observed in the human brain during episodic memory tasks, grounding its biological relevance. By unifying static and sequential memory within a dynamic energy framework, EDEN offers a scalable and interpretable model for high-capacity temporal memory in both artificial and biological systems.
A Difference-of-Convex Functions Approach to Energy-Based Iterative Reasoning
While energy-based models have recently proven to be a powerful framework for learning to reason with neural networks, their practical application is still limited by computational cost. That is, existing methods for energy-based iterative reasoning suffer from computational bottlenecks by relying on expensive optimization routines during training and especially during inference.
Adjoint Schrödinger Bridge Sampler
Computational methods for learning to sample from the Boltzmann distribution, where the target distribution is known only up to an unnormalized energy function, have advanced significantly in recent years. Due to the lack of explicit target samples, however, prior diffusion-based methods, known as diffusion samplers, often require importance-weighted estimation or complicated learning processes.
Detecting Metastable Basins in High Dimensions via Marginal Trajectory Distribution Discrimination
We study the problem of identifying dynamically distinct basins of attraction in high-dimensional time-homogeneous Markov processes using only trajectory sampling. This problem is fundamental in the analysis of metastable dynamical systems, where the process rapidly mixes within basins while transitions between basins occur rarely on the timescale of interest, or even when the state space is reducible. Existing approaches typically rely on spatial discretization or spectral analysis of estimated transition operators, which can become unreliable in high-dimensional settings or when the underlying basin geometry is highly nonlinear. We propose a discriminative approach to basin identification based on marginal trajectory distribution comparison. We prove a simple risk separation result: if two initial states belong to the same basin, the Bayes-optimal classifier distinguishing their marginal trajectory distributions achieves risk close to 1/2, whereas if they lie in distinct basins, the optimal risk is close to zero. This observation reduces basin detection to a two-sample discrimination problem between marginal trajectory distributions. Motivated by this principle, we develop a neural algorithm that receives a set of candidate basin representatives and iteratively merges them by estimating classification risk with a neural network that approximates the Bayes classifier. We evaluate the method on various metastable systems, including synthetic systems constructed by embedding low-dimensional dynamics into high-dimensional noisy ambient spaces. In these settings, standard spectral and clustering-based methods often fail, while our approach accurately recovers the underlying basin structure. These results expose a shortcoming of existing methods and highlight trajectory discrimination as an effective tool for identifying dynamical basins in high-dimensional stochastic systems.
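The risk separation principle can be checked exactly on a small finite-state chain. The sketch below rests on assumptions not taken from the paper: the 4-state transition matrix, the leak probability `eps`, the horizon `t`, and the reading of "marginal trajectory distribution" as the state distribution at a fixed time `t`. It uses the standard identity that, for two equally likely classes with distributions p and q, the Bayes risk is 0.5 * sum(min(p, q)), computed in closed form here rather than estimated by a neural classifier.

```python
import numpy as np

eps = 1e-3  # assumed small leak probability between the two basins

# Hypothetical metastable chain: basin A = {0, 1}, basin B = {2, 3}.
P = np.array([
    [0.5, 0.5 - eps, eps,       0.0],
    [0.5, 0.5,       0.0,       0.0],
    [0.0, 0.0,       0.5,       0.5],
    [0.0, eps,       0.5 - eps, 0.5],
])

def marginal(P, start, t):
    """State distribution after t steps when starting from a single state."""
    dist = np.zeros(P.shape[0])
    dist[start] = 1.0
    return dist @ np.linalg.matrix_power(P, t)

def bayes_risk(p, q):
    """Bayes-optimal error for telling p from q under equal class priors."""
    return 0.5 * np.minimum(p, q).sum()

t = 10
risk_same_basin = bayes_risk(marginal(P, 0, t), marginal(P, 1, t))  # states 0 and 1
risk_diff_basin = bayes_risk(marginal(P, 0, t), marginal(P, 2, t))  # states 0 and 2
```

With these values the same-basin risk sits just below 1/2 and the cross-basin risk is near zero, so a threshold on the estimated risk would merge states 0 and 1 and keep states 0 and 2 apart.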
Revisiting Transformer Layer Parameterization Through Causal Energy Minimization
Xu, Jin, Couturier, Camille, Rühle, Victor, Rajmohan, Saravan, Hensman, James
Transformer blocks typically combine multi-head attention (MHA) for token mixing with gated MLPs for token-wise feature transformation, yet many choices in their parameterization remain largely empirical. We introduce Causal Energy Minimization (CEM), a framework that recasts Transformer layers as optimization steps on conditional energy functions while explicitly accounting for layer parameterization. Extending prior energy-based interpretations of attention, CEM shows that weight-tied MHA can be derived as a gradient update on an interaction energy, and that a gated MLP with shared up/down projections can be viewed through an element-wise energy. This perspective identifies a design space for Transformer layers that includes within-layer weight sharing, diagonal-plus-low-rank interactions, lightweight preconditioners, and recursive updates. We evaluate CEM-derived layers in language-modeling experiments at the moderate hundred-million-parameter scale. Despite their constrained parameterizations, these layers train stably and can match corresponding Transformer baselines. Overall, our results suggest that CEM provides a useful lens for understanding Transformer layer parameterization, connecting Transformer architectures to energy-based models and motivating further exploration of energy-guided layer designs.
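The claim that weight-tied attention arises as a gradient update on an energy can be illustrated in a simplified setting. The sketch below is not the CEM formulation: it assumes a single query, no causal masking, values tied to keys, and the well-known energy E(q) = 0.5 * ||q||^2 - (1/beta) * logsumexp(beta * K q) from the modern Hopfield network literature. The names `K`, `q`, and `beta` are illustrative. Under these assumptions, one gradient step with unit step size reproduces the softmax attention output.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def energy(q, K, beta):
    """Assumed interaction energy: 0.5*||q||^2 - (1/beta) * logsumexp(beta * K q)."""
    s = beta * (K @ q)
    m = s.max()
    return 0.5 * (q @ q) - (m + np.log(np.exp(s - m).sum())) / beta

def energy_grad(q, K, beta):
    """Analytic gradient of the energy with respect to the query."""
    return q - K.T @ softmax(beta * (K @ q))

def tied_attention(q, K, beta):
    """Single-query softmax attention with the values tied to the keys."""
    return K.T @ softmax(beta * (K @ q))

rng = np.random.default_rng(0)
K = rng.normal(size=(5, 4))  # 5 context tokens, dimension 4 (toy sizes)
q = rng.normal(size=4)
beta = 0.5

# One gradient-descent step on the energy with step size 1.
q_step = q - 1.0 * energy_grad(q, K, beta)
attn_out = tied_attention(q, K, beta)

# Central finite differences as an independent check of the analytic gradient.
h = 1e-5
num_grad = np.array([
    (energy(q + h * e, K, beta) - energy(q - h * e, K, beta)) / (2 * h)
    for e in np.eye(4)
])
```

The quadratic term is what makes the unit step land exactly on the attention output; with a different step size the update would instead interpolate between the current query and that output.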