 Energy


Differentiable Optimization Layers for Guaranteed Fairness in Deep Learning

arXiv.org Machine Learning

Differentiable optimization layers are traditionally integrated in predict-then-optimize frameworks where a neural model estimates parameters that subsequently serve as fixed inputs to downstream decision-making optimization problems. In this work, we introduce the concept of a "fairness layer": a differentiable optimization layer appended to a model's output layer that guarantees a chosen notion of output parity is satisfied when integrated into a neural network. Additionally, we introduce an online primal-dual inference algorithm that provides provable aggregate fairness guarantees for streaming predictions with arbitrarily small batch sizes, where traditional per-batch constraints become overly restrictive. Numerical experiments demonstrate the effectiveness of the fairness layer and associated algorithm, and theoretical analysis characterizes the layer's differentiability and stability properties during model training and backpropagation. Our code for these experiments is publicly available on GitHub (https://github.com/dtroxell19/FairDL-ICML-2026.git) and our public Python package documentation can be found online: https://dtroxell19.github.io/fairness_training/.
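
As a rough illustration of what a layer of this kind can look like, the sketch below handles one special case that has a closed form: projecting a batch of scores onto the set where two groups have equal mean score. This is an assumption made for illustration, not the authors' layer or the API of their package; the function names `parity_projection` and `parity_projection_jacobian` are hypothetical. Because the projection is linear in the scores, its Jacobian is a constant matrix, which is what lets gradients pass through it during backpropagation.

```python
import numpy as np

def parity_projection(scores, group):
    """Euclidean projection of scores onto {y : mean(y[group]) == mean(y[~group])}.

    Hypothetical helper: assumes a binary group attribute and that both
    groups are non-empty in the batch.
    """
    scores = np.asarray(scores, dtype=float)
    group = np.asarray(group, dtype=bool)
    n1, n0 = group.sum(), (~group).sum()
    # The parity constraint is linear: c @ y == 0.
    c = np.where(group, 1.0 / n1, -1.0 / n0)
    gap = c @ scores  # current difference in group means
    # Closed-form solution of: min ||y - scores||^2  s.t.  c @ y == 0.
    return scores - c * gap / (c @ c)

def parity_projection_jacobian(group):
    """Jacobian of the projection with respect to the input scores."""
    group = np.asarray(group, dtype=bool)
    n1, n0 = group.sum(), (~group).sum()
    c = np.where(group, 1.0 / n1, -1.0 / n0)
    return np.eye(len(c)) - np.outer(c, c) / (c @ c)

if __name__ == "__main__":
    scores = np.array([0.9, 0.8, 0.2, 0.1])
    group = np.array([1, 1, 0, 0])
    fair = parity_projection(scores, group)
    print(fair, fair[:2].mean(), fair[2:].mean())
```

Note that this simple projection does not keep the outputs inside a bounded range such as [0, 1]; adding such bounds would generally require a numerical solver rather than a closed form.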


Training Infinitely Deep and Wide Transformers

arXiv.org Machine Learning

Transformers have become the dominant architecture in modern machine learning, yet the theoretical understanding of their training dynamics remains limited. This paper develops a rigorous mathematical framework for analyzing gradient-based training of transformers in the mean-field regime, where both the depth (number of layers) and width (number of attention heads) tend to infinity. While ResNet training can be understood as controlling a neural ODE, transformer training corresponds to controlling a neural PDE, due to the coupling of multiple token distributions through the attention mechanism. Our mean-field model features two types of measure representations: token distributions evolving through layers and attention parameters at each layer. We establish well-posedness of the forward pass through infinitely deep transformers, characterizing token evolution via flow maps that satisfy ODEs in function spaces. Using adjoint sensitivity analysis, we derive an explicit formula for the conditional Wasserstein gradient of the training risk, involving adjoint variables governed by backward ODEs. We prove the existence and uniqueness of gradient flow curves in the conditional Wasserstein metric space, establishing a rigorous foundation for gradient-based transformer training. A key technical contribution is providing necessary and sufficient conditions for injectivity of the Neural Tangent Kernel (NTK) for attention mechanisms: we show that NTK injectivity is equivalent to linear independence of log-sum-exp functions modulo affine functions, a condition satisfied by diverse token distributions, including discrete distributions, uniform distributions, and Gaussian mixtures. Under this NTK injectivity assumption, we prove that gradient flow converges to global minima when the initial loss is sufficiently small, eliminating spurious local minima from the optimization landscape.


Online Conformal Prediction for Non-Exchangeable Panel Data

arXiv.org Machine Learning

Panel data, in which multiple units are repeatedly observed over time, arise throughout science and engineering. Quantifying predictive uncertainty in such settings is challenging because conformal prediction, while distribution-free and model-agnostic, classically relies on exchangeability assumptions that fail under temporal dependence and unit heterogeneity. We propose a simple online conformal framework for non-exchangeable panel data. The method exploits a key feature of online panel prediction: when a forecast is required for one unit, contemporaneous outcomes from related units may already be observed and can serve as a calibration panel. At each round, prediction sets are formed using currently observed calibration units together with two adaptive quantities: history-based similarity weights that emphasize calibration units resembling the target, and an adaptive miscoverage level that is updated whenever target feedback is revealed. This two-state design yields a stepwise coverage bound and a long-run coverage guarantee. Empirically, across synthetic and real panel data sets, the method improves coverage on the worst-covered target units through adaptive interval-width allocation rather than uniform inflation. The two states are complementary: similarity weights protect coverage when target feedback is sparse, while the adaptive level further improves coverage as feedback accumulates.
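
The two adaptive quantities described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the interval is a weighted quantile of absolute calibration residuals, the miscoverage update is the familiar adaptive-conformal-style rule `alpha + step * (target - error)`, and the finite-sample correction usually applied to conformal quantiles is omitted. All function names, the `step` parameter, and the way the similarity weights are supplied are hypothetical.

```python
import numpy as np

def weighted_quantile(values, weights, q):
    """Smallest value whose normalized cumulative weight reaches q."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cum = np.cumsum(w) / w.sum()
    idx = np.searchsorted(cum, q, side="left")
    return v[min(idx, len(v) - 1)]

def conformal_step(pred, calib_residuals, weights, alpha_t):
    """Symmetric interval around pred from similarity-weighted residuals."""
    level = float(np.clip(1.0 - alpha_t, 0.0, 1.0))
    radius = weighted_quantile(np.abs(calib_residuals), weights, level)
    return pred - radius, pred + radius

def update_alpha(alpha_t, target_alpha, covered, step=0.05):
    """Raise alpha after a covered round, lower it after a miss."""
    err = 0.0 if covered else 1.0
    return alpha_t + step * (target_alpha - err)

if __name__ == "__main__":
    residuals = np.array([-1.0, 2.0, -3.0, 4.0])  # calibration units this round
    weights = np.ones(4)                          # similarity weights (assumed given)
    lo, hi = conformal_step(10.0, residuals, weights, alpha_t=0.1)
    print(lo, hi)
    print(update_alpha(0.1, 0.1, covered=False))
```

In use, `update_alpha` would only be called on rounds where the target outcome is revealed, which matches the abstract's description of the level being updated whenever feedback arrives.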


Simple Approximation and Derivative Free Inference-Time Scaling for Diffusion Models via Sequential Monte Carlo on Path Measures

arXiv.org Machine Learning

Diffusion-based generative models increasingly rely on inference-time guidance, adding a drift term or reweighting mixture of experts, to improve sample quality on task-specific objectives. However, most existing techniques require repeated score or gradient evaluations, introducing bias, high computational overhead, or both. We introduce URGE, approximation-free Resampling via Girsanov Estimation, a derivative-free inference-time scaling algorithm that performs pathwise importance reweighting via a Girsanov change of measure.

Modern generative models have emerged as a powerful paradigm for learning complex, high-dimensional data distributions. In particular, diffusion models (Ho et al., 2020; Sohl-Dickstein et al., 2015; Song and Ermon, 2019; Song et al., 2020) and flow-based methods (Zhang et al., 2018a; Lipman et al., 2022; Albergo and Vanden-Eijnden, 2022; Liu et al., 2022) provide a principled and scalable framework for generative modeling, achieving state-of-the-art performance across diverse applications, including video generation (Ho et al., 2022), protein design (Gruver et al., 2023), and large-scale text generation (Li et al., 2022; Nie et al., 2025). A unifying perspective underlying these approaches is their formulation in terms of stochastic differential equations
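
The reweight-and-resample step that sequential Monte Carlo methods of this kind rely on can be sketched generically. The code below is not URGE itself: it does not compute any Girsanov weight, and instead assumes the per-step log-weight increments are supplied by the caller. The function names and the `ess_threshold` parameter are hypothetical.

```python
import numpy as np

def normalize_log_weights(log_w):
    """Turn log-weights into normalized weights, shifting by the max for stability."""
    log_w = np.asarray(log_w, dtype=float)
    w = np.exp(log_w - log_w.max())
    return w / w.sum()

def effective_sample_size(w):
    """ESS = 1 / sum(w_i^2); equals n for uniform weights, 1 when one particle dominates."""
    return 1.0 / np.sum(w ** 2)

def systematic_resample(w, rng):
    """Systematic resampling: one uniform draw, n evenly spaced positions."""
    n = len(w)
    positions = (rng.random() + np.arange(n)) / n
    cum = np.cumsum(w)
    cum[-1] = 1.0  # guard against round-off in the last entry
    return np.searchsorted(cum, positions, side="right")

def smc_step(particles, log_w, log_w_increment, rng, ess_threshold=0.5):
    """Accumulate log-weights, then resample if the ESS falls below the threshold."""
    log_w = log_w + log_w_increment
    w = normalize_log_weights(log_w)
    if effective_sample_size(w) < ess_threshold * len(w):
        idx = systematic_resample(w, rng)
        particles = particles[idx]
        log_w = np.zeros(len(w))  # weights are uniform again after resampling
    return particles, log_w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    particles = np.arange(4.0)
    increment = np.array([0.0, -1000.0, -1000.0, -1000.0])
    print(smc_step(particles, np.zeros(4), increment, rng))
```

Resampling only when the effective sample size drops is a common design choice: it avoids adding resampling noise while the weights are still well balanced.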


Geometric Dictionary Learning of Dynamical Systems with Optimal Transport

arXiv.org Machine Learning

Learning dynamical systems through operator-theoretic representations provides a powerful framework for analyzing complex dynamics, as spectral quantities such as eigenvalues and invariant structures encode characteristic time scales and long-term behavior. However, dynamical operators are typically estimated independently for each system, preventing the discovery of shared structure across related dynamics. To address this limitation, we posit that related dynamical systems lie near a low-dimensional manifold in spectral operator space. Based on this hypothesis, we introduce DOODL (Dynamical OperatOr Dictionary Learning), a framework that learns a dictionary of characteristic spectral dynamics whose combinations approximate this manifold and yield compact, interpretable embeddings of individual systems. Beyond representation learning, DOODL enables fast and interpretable operator estimation from short and partially observed trajectories by constraining the estimation to the learned operator manifold. Experiments on metastable Langevin dynamics and turbulent plasma simulations demonstrate that DOODL scales to highly complex multiscale regimes while capturing characteristic spectral structure governing the dynamics rather than merely fitting trajectories, achieving errors one to two orders of magnitude lower than independent operator estimation methods in challenging low-data regimes.


Generalized Functional ANOVA in Closed-Form: A Unified View of Additive Explanations

arXiv.org Machine Learning

The functional ANOVA, or Hoeffding decomposition, provides a principled framework for interpretability by decomposing a model prediction into main effects and higher-order interactions. For independent inputs, this classical decomposition is explicit. It is closely connected to SHAP values, generalized additive models, and orthogonal polynomial expansions, and therefore constitutes a fundamental tool for additive explainability. In the more general and realistic dependent setting, however, obtaining a tractable representation and estimating the decomposition from data remain challenging. In this work, we address this problem for continuous inputs. By combining Hilbert space methods with the generalized functional ANOVA, we build an explicit Riesz basis for the decomposition, which makes it easy to compute. Our formulation recovers the classical independent case and its associated orthogonal decomposition. Building on this representation, we propose a simple but mighty algorithm to estimate the decomposition from a data sample in a model-agnostic setting and we compare it empirically with several state-of-the-art explanation methods, demonstrating the power of the approach.
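
For the classical independent case mentioned above, the decomposition can be computed directly by averaging. The sketch below does this for a function of two independent uniform inputs on a midpoint grid; it illustrates only the classical setting, not the paper's generalized construction for dependent inputs. The function name `anova_2d` and the test function are hypothetical choices for illustration.

```python
import numpy as np

def anova_2d(f, n=200):
    """Classical functional ANOVA of f(x1, x2) for independent U(0, 1) inputs.

    Expectations are approximated by averages over an n x n midpoint grid.
    Returns the grid, the constant term, both main effects, and the interaction.
    """
    x = (np.arange(n) + 0.5) / n
    X1, X2 = np.meshgrid(x, x, indexing="ij")
    F = f(X1, X2)
    f0 = F.mean()                   # overall mean
    f1 = F.mean(axis=1) - f0        # main effect of x1: E[f | x1] - f0
    f2 = F.mean(axis=0) - f0        # main effect of x2: E[f | x2] - f0
    f12 = F - f0 - f1[:, None] - f2[None, :]  # remaining interaction
    return x, f0, f1, f2, f12

if __name__ == "__main__":
    f = lambda a, b: a + 2 * b + a * b
    x, f0, f1, f2, f12 = anova_2d(f)
    total = f(*np.meshgrid(x, x, indexing="ij")).var()
    # Orthogonality of the terms makes the variance split additively.
    print(f0, f1.var() / total, f2.var() / total, f12.var() / total)
```

For this test function the terms can be checked by hand: the constant is 1.75, the main effects are 1.5*x1 - 0.75 and 2.5*x2 - 1.25, and the interaction is (x1 - 0.5)*(x2 - 0.5).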


NextEra, Dominion to create huge power biz as AI drives US energy demand

Al Jazeera

NextEra Energy is seeking to acquire Dominion Energy in an all-stock deal valued at about $67bn, creating a massive power company as the energy needs of artificial intelligence (AI) drive demand higher in the United States. It is one of the biggest proposed mergers so far this year and would create the world's largest regulated electric utility business by market capitalisation, the companies said on Monday. The region has a fast-growing population and the world's biggest data centre hub, which is in Virginia. The deal will enable a swifter build-out of power infrastructure to deliver electricity to data centres proposing to connect to NextEra and Dominion, which total about 130 gigawatts of electricity demand, the companies' executives said. One gigawatt can power about 750,000 homes. The merger builds on NextEra's efforts to tap into surging demand for supplying electricity to data centres developed by Big Tech, largely for training and rolling out AI technologies.


What is the UAE's Barakah nuclear plant, nearly hit by a drone?

Al Jazeera

A drone attack that caused a fire close to the Barakah Nuclear Energy Plant in the United Arab Emirates has raised further concerns about nuclear security and military escalation in the Gulf as discussions of peace between Iran and the United States hang in the balance. Barakah was the first nuclear power station to be built on the Arabian Peninsula. What is the Barakah Nuclear Energy Plant? Barakah is a nuclear energy plant located in Al Dhafra, the largest municipal region of the emirate of Abu Dhabi.


The Download: Musk v. Altman week 3, and Trump's tech trading

MIT Technology Review

Musk v. Altman week 3: Musk and Altman traded blows over each other's credibility. Now the jury will pick a side. In the final week of the Musk v. Altman trial, lawyers attacked the credibility of the two tech leaders. Sam Altman was accused of lying and self-dealing, while Elon Musk was portrayed as a power-seeker trying to control artificial general intelligence. The case unearthed new details about the two arch-rivals and OpenAI's contested nonprofit status, as well as a golden trophy of a donkey's ass awarded to an employee who challenged Musk. Michelle Kim, who's also a lawyer, has been in court throughout the Musk v. Altman trial.


Iran war live: Trump threatens Tehran; Saudi, UAE report drone attacks

Al Jazeera

US President Donald Trump warns Iran that the "clock is ticking" for a peace deal to be reached with Washington. Saudi Arabia says it intercepted three drones, as the UAE reported a separate drone strike near its Barakah nuclear power plant that sparked a fire.