mep
Entropic Confinement and Mode Connectivity in Overparameterized Neural Networks
Di Carlo, Luca, Goddard, Chase, Schwab, David J.
Modern neural networks exhibit a striking property: basins of attraction in the loss landscape are often connected by low-loss paths, yet optimization dynamics generally remain confined to a single convex basin (Baity-Jesi et al., 2019; Juneja et al., 2023) and rarely explore intermediate points. We resolve this paradox by identifying entropic barriers arising from the interplay between curvature variations along these paths and noise in optimization dynamics. Empirically, we find that curvature systematically rises away from minima, producing effective forces that bias noisy dynamics back toward the endpoints -- even when the loss remains nearly flat. These barriers persist longer than energetic barriers, shaping the late-time localization of solutions in parameter space. Our results highlight the role of curvature-induced entropic forces in governing both connectivity and confinement in deep learning landscapes. Deep neural networks trained in the overparameterized regime exhibit a number of surprising and counterintuitive properties. One of the most striking is the observation that distinct solutions, found with standard optimization algorithms, are often connected by low-loss paths in parameter space (Garipov et al., 2018; Draxler et al., 2018; Frankle et al., 2020). Such mode connectivity results imply that the landscape is far less rugged than once assumed: minima that appear isolated are, in fact, linked by paths of low, nearly constant loss. At the same time, however, optimization dynamics display a seemingly contradictory behavior.
High-Resolution Probabilistic Data-Driven Weather Modeling with a Stretched-Grid
Nordhagen, Even Marius, Haugen, Håvard Homleid, Salihi, Aram Farhad Shafiq, Ingstad, Magnus Sikora, Nipen, Thomas Nils, Seierstad, Ivar Ambjørn, Frogner, Inger-Lise, Clare, Mariana, Lang, Simon, Chantry, Matthew, Dueben, Peter, Kristiansen, Jørn
We present a probabilistic data-driven weather model capable of providing an ensemble of high spatial resolution realizations of 87 variables at arbitrary forecast length and ensemble size. The model uses a stretched grid, dedicating 2.5 km resolution to a region of interest, and 31 km resolution elsewhere. Based on a stochastic encoder-decoder architecture, the model is trained using a loss function based on the Continuous Ranked Probability Score (CRPS) evaluated point-wise in real and spectral space. The spectral loss component is shown to be necessary to create fields that are spatially coherent. The model is compared to high-resolution operational numerical weather prediction forecasts from the MetCoOp Ensemble Prediction System (MEPS), showing competitive forecasts when evaluated against observations from surface weather stations. The model produces fields that are more spatially coherent than those of mean squared error based models and CRPS based models without the spectral component in the loss.
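As a rough illustration of the point-wise CRPS mentioned in the abstract above, here is a minimal sketch of the standard ensemble estimator for a single scalar forecast, CRPS = mean|x_i - y| - 0.5 * mean|x_i - x_j|. The function name `ensemble_crps` is hypothetical, and the abstract does not specify the exact estimator or the spectral-space term the authors use, so this should not be read as their loss.

```python
def ensemble_crps(members, observation):
    """Ensemble estimate of the CRPS for one scalar observation.

    Assumption: the plain (biased) ensemble estimator is used here;
    the paper's exact variant is not stated in the abstract.
    """
    m = len(members)
    if m == 0:
        raise ValueError("need at least one ensemble member")
    # Mean absolute error of the members against the observation.
    skill = sum(abs(x - observation) for x in members) / m
    # Mean absolute difference over all ordered member pairs.
    spread = sum(abs(xi - xj) for xi in members for xj in members) / (m * m)
    return skill - 0.5 * spread


if __name__ == "__main__":
    # Two members straddling the observation.
    print(ensemble_crps([0.0, 2.0], 1.0))
```

For a single-member ensemble the spread term vanishes and the score reduces to the absolute error, which is why the CRPS is often described as a probabilistic generalization of the mean absolute error.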
Exploring Image Generation via Mutually Exclusive Probability Spaces and Local Correlation Hypothesis
A common assumption in probabilistic generative models for image generation is that learning the global data distribution suffices to generate novel images via sampling. We investigate the limitation of this core assumption, namely that learning global distributions leads to memorization rather than generative behavior. We propose two theoretical frameworks, the Mutually Exclusive Probability Space (MEPS) and the Local Dependence Hypothesis (LDH), for investigation. MEPS arises from the observation that deterministic mappings (e.g. We further propose a lower bound in terms of the overlap coefficient, and introduce a Binary Latent Autoencoder (BL-AE) that encodes images into signed binary latent representations. LDH formalizes dependence within a finite observation radius, which motivates our γ-Autoregressive Random Variable Model (γ-ARVM). Using γ-ARVM, we observe that as the observation range increases, autoregressive models progressively shift toward memorization. In the limit of global dependence, the model behaves as a pure memorizer when operating on the binary latents produced by our BL-AE. Comprehensive experiments and discussions support our investigation. Figure 1: Selecting images for values in the overlap range is ambiguous. Probabilistic generative models such as Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), diffusion models, and autoregressive models have achieved remarkable progress in image generation. A core assumption is that these models learn an image distribution from which new images can be generated via sampling (Bond-Taylor et al., 2022). Specifically, we focus on autoregressive models. For this investigation, we introduce two theoretical frameworks.
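The abstract above refers to a bound stated in terms of the overlap coefficient. As a hedged illustration only, the sketch below computes the usual overlap coefficient of two discrete distributions, the sum of the element-wise minimum of their probabilities. The function name `overlap_coefficient` is hypothetical, and the abstract does not say whether the paper uses exactly this definition.

```python
def overlap_coefficient(p, q):
    """Overlap of two discrete distributions on the same support.

    Assumption: p and q are probability vectors of equal length;
    the result is 1 for identical distributions and 0 for disjoint ones.
    """
    if len(p) != len(q):
        raise ValueError("distributions must share the same support")
    return sum(min(a, b) for a, b in zip(p, q))


if __name__ == "__main__":
    # Two distributions that share half of their mass.
    print(overlap_coefficient([0.5, 0.5, 0.0], [0.0, 0.5, 0.5]))
```

An overlap of zero means the two distributions occupy mutually exclusive regions of the support, which is the situation the "mutually exclusive" terminology in the abstract suggests.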
Follow the MEP: Scalable Neural Representations for Minimum-Energy Path Discovery in Molecular Systems
Petersen, Magnus, Roig, Gemma, Covino, Roberto
Characterizing conformational transitions in physical systems remains a fundamental challenge, as traditional sampling methods struggle with the high-dimensional nature of molecular systems and high-energy barriers between stable states. These rare events often represent the most biologically significant processes, yet may require months of continuous simulation to observe. One way to understand the function and mechanics of such systems is through the minimum energy path (MEP), which represents the most probable transition pathway between stable states in the high-friction, low-temperature limit. We present a method that reformulates MEP discovery as a fast and scalable neural optimization problem. By representing paths as implicit neural representations and training with differentiable molecular force fields, our method discovers transition pathways without expensive sampling. Our approach scales to large biomolecular systems through a simple loss function derived from the path's likelihood via the Onsager-Machlup action and a scalable new architecture, AdaPath. We demonstrate this approach on two proteins, including an explicitly hydrated BPTI system with more than 3,500 atoms. Our method identifies a MEP that captures the same conformational change observed in a millisecond-scale molecular dynamics (MD) simulation in just minutes on a standard GPU, rather than weeks on a specialized cluster.
Benchmarking Gender and Political Bias in Large Language Models
Yang, Jinrui, Han, Xudong, Baldwin, Timothy
We introduce EuroParlVote, a novel benchmark for evaluating large language models (LLMs) in politically sensitive contexts. It links European Parliament debate speeches to roll-call vote outcomes and includes rich demographic metadata for each Member of the European Parliament (MEP), such as gender, age, country, and political group. Using EuroParlVote, we evaluate state-of-the-art LLMs on two tasks -- gender classification and vote prediction -- revealing consistent patterns of bias. We find that LLMs frequently misclassify female MEPs as male and demonstrate reduced accuracy when simulating votes for female speakers. Politically, LLMs tend to favor centrist groups while underperforming on both far-left and far-right ones. Proprietary models like GPT-4o outperform open-weight alternatives in terms of both robustness and fairness. We release the EuroParlVote dataset, code, and demo to support future research on fairness and accountability in NLP within political contexts.
Transferable Learning of Reaction Pathways from Geometric Priors
Nam, Juno, Steiner, Miguel, Misterka, Max, Yang, Soojung, Singhal, Avni, Gómez-Bombarelli, Rafael
Identifying minimum-energy paths (MEPs) is crucial for understanding chemical reaction mechanisms but remains computationally demanding. We introduce MEPIN, a scalable machine-learning method for efficiently predicting MEPs from reactant and product configurations, without relying on transition-state geometries or pre-optimized reaction paths during training. The task is defined as predicting deviations from geometric interpolations along reaction coordinates. We address this task with a continuous reaction path model based on a symmetry-broken equivariant neural network that generates a flexible number of intermediate structures. The model is trained using an energy-based objective, with efficiency enhanced by incorporating geometric priors from geodesic interpolation as initial interpolations or pre-training objectives. Our approach generalizes across diverse chemical reactions and achieves accurate alignment with reference intrinsic reaction coordinates, as demonstrated on various small molecule reactions and [3+2] cycloadditions. Our method enables the exploration of large chemical reaction spaces with efficient, data-driven predictions of reaction pathways.
EvoAgent: Agent Autonomous Evolution with Continual World Model for Long-Horizon Tasks
Feng, Tongtong, Wang, Xin, Zhou, Zekai, Wang, Ren, Zhan, Yuwei, Li, Guangyao, Li, Qing, Zhu, Wenwu
Completing Long-Horizon (LH) tasks in open-ended worlds is an important yet difficult problem for embodied agents. Existing approaches suffer from two key challenges: (1) they heavily rely on experiences obtained from human-created data or curricula, lacking the ability to continuously update multimodal experiences, and (2) they may encounter catastrophic forgetting issues when faced with new tasks, lacking the ability to continuously update world knowledge. To solve these challenges, this paper presents EvoAgent, an autonomous-evolving agent with a continual World Model (WM), which can autonomously complete various LH tasks across environments through self-planning, self-control, and self-reflection, without human intervention. Our proposed EvoAgent contains three modules, i.e., i) the memory-driven planner which uses an LLM along with the WM and interaction memory, to convert LH tasks into executable sub-tasks; ii) the WM-guided action controller which leverages WM to generate low-level actions and incorporates a self-verification mechanism to update multimodal experiences; iii) the experience-inspired reflector which implements a two-stage curriculum learning algorithm to select experiences for task-adaptive WM updates. Moreover, we develop a continual World Model for EvoAgent, which can continuously update the multimodal experience pool and world knowledge through closed-loop dynamics. We conducted extensive experiments on Minecraft; compared with existing methods, EvoAgent achieves an average success rate improvement of 105% and reduces ineffective actions by more than 6x.
Increasing transformer token length with a Maximum Entropy Principle Method
Transformers suffer from the computational overhead of their quadratic dependence on the length of sequences processed. We present three methods, all adding an intermediate step between training and inference/generation, which extend the autoregressive length of transformers. All rely on a Maximum Entropy Principle (MEP) whereby entropy is maximized in the presence of suitable constraints, accounted for by use of Lagrange Multipliers. These constraint methods extend the autoregressive character from T to 2T tokens in a linear-with-T fashion. There is overhead associated with this added step, but they should still be faster than the standard methods.
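To make the Maximum Entropy Principle with a Lagrange multiplier concrete, here is a minimal sketch on a textbook problem, not the transformer method described above: find the maximum-entropy distribution over a finite set of values subject to a fixed mean. The solution has the form p_i proportional to exp(-lambda * v_i), and the multiplier lambda is found by bisection. The function name `maxent_distribution` and its parameters are hypothetical.

```python
import math


def maxent_distribution(values, target_mean, lo=-50.0, hi=50.0, iters=200):
    """Maximum-entropy probabilities over `values` with a given mean.

    Assumption: target_mean lies strictly between min(values) and
    max(values), and the multiplier lies inside [lo, hi].
    """
    if not min(values) < target_mean < max(values):
        raise ValueError("target_mean must lie strictly inside the value range")

    def solve(lam):
        # Gibbs form p_i ~ exp(-lam * v_i), shifted for numerical stability.
        exponents = [-lam * v for v in values]
        shift = max(exponents)
        weights = [math.exp(e - shift) for e in exponents]
        z = sum(weights)
        probs = [w / z for w in weights]
        mean = sum(v * p for v, p in zip(values, probs))
        return mean, probs

    # The constrained mean decreases monotonically in lam, so bisect on it.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        mean, _ = solve(mid)
        if mean > target_mean:
            lo = mid  # mean too large: increase the multiplier
        else:
            hi = mid
    return solve(0.5 * (lo + hi))[1]


if __name__ == "__main__":
    # A six-sided die constrained to have mean 4.5.
    print(maxent_distribution([1, 2, 3, 4, 5, 6], 4.5))
```

When the target mean equals the unconstrained mean (3.5 for a fair die), the multiplier is zero and the result is the uniform distribution, which is the unconstrained entropy maximum.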