Chelsea flower show garden designers clash over use of AI

The Guardian

Matt Keightley in his 2015 Chelsea garden, designed for Prince Harry. This year he is launching an AI app that has 'designed' three full-size gardens for the show.

Wed 13 May 2026 01.00 EDT. Last modified on Wed 13 May 2026 01.01 EDT.

With glasses of champagne sipped among the peonies, Chelsea flower show is generally a friendly and genteel occasion.


Family sues OpenAI, alleging ChatGPT advice led to accidental overdose

Engadget

OpenAI is facing another wrongful death lawsuit. Leila Turner-Scott and Angus Scott filed a lawsuit against the company, alleging that it designed and distributed a defective product that led to the death of their son Sam Nelson from an accidental overdose. Specifically, they're alleging that Sam died following the exact medical advice GPT-4o had provided and approved. In the lawsuit, the plaintiffs described how Sam, a 19-year-old junior at the University of California, Merced, started using ChatGPT in 2023 when he was in high school to help with homework and to troubleshoot computer problems. Sam then started asking the chatbot about safe drug use, but ChatGPT initially refused to answer his question, telling him that it couldn't assist him and warning him that taking drugs can have serious consequences for his health and well-being.


Is Big Brother watching you shop? – podcast

The Guardian

From supermarkets to corner shops, live facial recognition could be coming to retailers near you. Live facial recognition is being hailed as a powerful new frontier in the fight against crime, not only by police but by private companies too. Retailers from supermarkets to corner shops hope it will help them fight back against shoplifting. But the technology doesn't always get it right. With more police forces wanting to take up the technology, what could the consequences be?


Elon Musk Had 'Hair-Raising' Idea of Passing OpenAI Onto His Kids, Sam Altman Says

WIRED

Musk's lawyers questioned Altman over allegations of deception and his network of financial investments, but the OpenAI CEO painted a picture of Musk as obsessed with controlling the company.

Sam Altman took to the witness stand to defend his reputation in the trial on Tuesday, as Elon Musk's lawyers peppered the OpenAI CEO with hours of questions regarding his alleged history of deceptive behavior. The cross-examination was a much-needed win for Musk, who has so far struggled to make a convincing case. Tuesday's testimony included several heated exchanges in which the OpenAI CEO had to respond to allegations from former colleagues suggesting he's untrustworthy. Highlighting this evidence is not only important for Musk winning over a jury, but also for beating OpenAI in the court of public opinion.


Testing General Relativity Through Gravitational Wave Classification: A Convolutional Neural Network Framework

arXiv.org Machine Learning

We present a machine learning framework for testing general relativity (GR) with gravitational wave signals from binary black hole (BBH) mergers. Using the source parameters of 173 BBH events from the GWTC catalog as a realistic astrophysical population, we generate simulated GR waveforms and construct beyond-GR (BGR) waveforms by applying controlled phase deformations. We introduce a response-function formalism that provides a systematic framework for quantifying how any observable responds to modifications of GR. We train convolutional neural networks (CNNs) on two input representations: whitened waveforms and a response-function-type observable derived from the waveform mismatch, which isolates the effect of phase deviations from the bulk signal. Using response functions as the CNN input improves the classification sensitivity by a factor of approximately 33 compared to whitened waveforms, demonstrating that the choice of observable representation is as important as the classifier architecture. We study the fundamental limits of this classification through Bayes-optimal error analysis, averaging methods that reveal coherent patterns hidden in noise, and a comparison between CNN accuracy and a single-feature classifier as a proxy for human performance. At all deformation scales, the CNN outperforms the best single-feature approach. We extend the framework to physically motivated theories using the parameterized post-Einsteinian (ppE) formalism and apply it to massive gravity, where the classifier detects deviations for graviton masses of order $m_g \sim 10^{-23}\;\mathrm{eV}/c^2$ with aLIGO design sensitivity.


Training-Time Batch Normalization Reshapes Local Partition Geometry in Piecewise-Affine Networks

arXiv.org Machine Learning

Batch normalization (BN) is central to modern deep networks, but its effect on the realized function during training remains less understood than its optimization benefits. We study training-time BN in continuous piecewise-affine (CPA) networks through the geometry of switching hyperplanes and the induced affine-region partition. Conditioned on a mini-batch, we show that BN defines for each neuron a reference hyperplane through the batch centroid, and that breakpoint-switching hyperplanes are parallel translates whose offsets are expressed in batch-standardized coordinates and are independent of the raw bias. This yields an exact criterion for when a switching hyperplane intersects a local $\ell_\infty$ window and motivates a local region-density functional based on exact affine-region counts. Under explicit sufficient conditions, we show that BN increases expected local partition refinement in ReLU and more general piecewise-affine networks, and that this mechanism transfers locally through depth inside parent affine regions where the upstream representation map is an affine embedding. These results provide a function-level geometric account of training-time BN as a batch-conditional recentering mechanism near the data.
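The claim that the switching hyperplane does not depend on the raw bias can be checked numerically for a single neuron. The sketch below is only an illustration under assumed conventions, not the paper's construction: it uses the standard training-time BN formula $z = \gamma (a - \mu)/\sigma + \beta$ with $a = w \cdot x + b$, a ReLU breakpoint at $z = 0$, and a small stabilizer `eps`. The names `bn_neuron`, `batch_stats`, and all numeric values are hypothetical.

```python
import math


def dot(w, x):
    """Plain inner product of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))


def batch_stats(w, b, batch, eps):
    """Mean and standard deviation of the pre-activations a = w.x + b over the mini-batch."""
    a = [dot(w, x) + b for x in batch]
    mean = sum(a) / len(a)
    var = sum((ai - mean) ** 2 for ai in a) / len(a)
    return mean, math.sqrt(var + eps)


def bn_neuron(x, w, b, batch, gamma, beta, eps=1e-5):
    """Batch-normalized pre-activation at x, conditioned on the given mini-batch.

    Assumed form: z = gamma * (w.x + b - mean) / std + beta.
    The raw bias b cancels in (w.x + b - mean), so z does not depend on it.
    """
    mean, std = batch_stats(w, b, batch, eps)
    return gamma * (dot(w, x) + b - mean) / std + beta


# Hypothetical toy mini-batch in the plane; its centroid is (1, 1).
batch = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
w, gamma, beta = [1.0, 0.0], 2.0, -1.0

# The ReLU switches where z = 0, i.e. w.(x - centroid) = -beta * std / gamma:
# a parallel translate of the reference hyperplane through the centroid.
_, std = batch_stats(w, 0.0, batch, 1e-5)
x_switch = [1.0 + (-beta * std / gamma), 0.3]
```

Evaluating `bn_neuron` at the same point with two different values of `b` gives the same result, and `x_switch` lies on the zero set regardless of `b`, which is the bias independence stated in the abstract for this toy case.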


One Operator for Many Densities: Amortized Approximation of Conditioning by Neural Operators

arXiv.org Machine Learning

Probabilistic conditioning is concerned with the identification of a distribution of a random variable $X$ given a random variable $Y$. It is a cornerstone of scientific and engineering applications where modeling uncertainty is key. This problem has traditionally been addressed in machine learning by directly learning the conditional distribution of a fixed joint distribution. This paper introduces a novel perspective: we propose to solve the conditioning problem by identifying a single operator that maps any joint density to its conditional, thus amortizing over joint-conditional pairs. We establish that the conditioning operator can be approximated to arbitrary accuracy by neural operators. Our proof relies on new results establishing continuity of the conditioning operator over suitable classes of densities. Finally, we learn the conditioning map for a class of Gaussian mixtures using neural operators, illustrating the promise of our framework. This work provides the theoretical underpinnings for general-purpose, amortized methods for probabilistic conditioning, such as foundation models for Bayesian inference.
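To make the idea of a single map from joint densities to conditionals concrete, here is a minimal sketch of the exact target map on a grid, $p(x \mid y) = p(x, y) / \int p(x, y)\,dx$, with the integral replaced by a Riemann sum. This is not the neural operator from the paper; it only shows the operator that such a model would be trained to approximate. The function name `condition_on_y`, the grid layout, and the spacing `dx` are assumptions for illustration.

```python
def condition_on_y(joint, dx):
    """Map a gridded joint density to its conditional density.

    joint[i][j] holds p(x_i, y_j) on a uniform grid with x-spacing dx.
    Returns cond[i][j] = p(x_i | y_j) = p(x_i, y_j) / p_Y(y_j), where the
    marginal p_Y(y_j) is approximated by a Riemann sum over x.
    Columns with zero marginal mass are left at zero.
    """
    n_x, n_y = len(joint), len(joint[0])
    cond = [[0.0] * n_y for _ in range(n_x)]
    for j in range(n_y):
        marginal = sum(joint[i][j] for i in range(n_x)) * dx
        if marginal > 0.0:
            for i in range(n_x):
                cond[i][j] = joint[i][j] / marginal
    return cond


# The same function applies unchanged to different joint densities,
# which is the sense in which one operator serves many densities.
joint_a = [[1.0, 2.0], [3.0, 2.0]]
joint_b = [[0.5, 0.0], [0.5, 4.0]]
cond_a = condition_on_y(joint_a, dx=0.5)
cond_b = condition_on_y(joint_b, dx=0.5)
```

Each column of the output integrates to one over x (up to the Riemann-sum discretization), for either input.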


HS-FNO: History-Space Fourier Neural Operator for Non-Markovian Partial Differential Equations

arXiv.org Machine Learning

Neural operators provide fast surrogate models for time-dependent partial differential equations, but their standard autoregressive use usually assumes that the instantaneous field $u(t,\cdot)$ is a complete state. This assumption fails for delay equations, distributed-memory systems, and other non-Markovian dynamics: two trajectories may agree at time $t$ and nevertheless have different futures because their histories differ. We introduce the History-Space Fourier Neural Operator (HS-FNO), a neural operator for delay and memory-driven PDEs formulated on the lifted state $u_t(\theta,x)=u(t+\theta,x)$, $\theta\in[-\tau,0]$. The key computational step is to decompose one history-state update into a learned predictor for the newly exposed future slice and an exact shift-append transport for the portion of the history window already known from the previous state. This avoids learning deterministic history coordinates, reduces the learned output dimension, and enforces the natural discrete history update. We test HS-FNO on five benchmark families covering delayed reaction--diffusion, spatial epidemiology, nonlocal neural-field dynamics, delayed waves, and distributed-memory closures. Across ten random seeds, HS-FNO attains the lowest aggregate one-step, history-space, and rollout errors among the principal baselines. The largest gain occurs in autoregressive prediction, where aggregate rollout error decreases from $0.241$, $0.188$, and $0.185$ for current-state, lag-stack, and unconstrained history-to-history operators, respectively, to $0.094$. The same model uses fewer parameters than unconstrained history prediction. These results indicate that enforcing the discrete shift structure of history-state evolution is an effective inductive bias for non-Markovian PDE surrogate modeling.
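The shift-append decomposition described above can be sketched on a discretized history window. This is a minimal illustration, not the HS-FNO implementation: the history is assumed to be a list of spatial snapshots ordered from oldest to newest, one update advances the window by exactly one slice, and `toy_predictor` is a hypothetical stand-in for the learned Fourier-operator predictor.

```python
from typing import Callable, List

Slice = List[float]      # one spatial snapshot u(t + theta, .) on a fixed grid
History = List[Slice]    # slices ordered from oldest (theta = -tau) to newest (theta = 0)


def shift_append_step(history: History,
                      predict_new_slice: Callable[[History], Slice]) -> History:
    """Advance the history window by one slice.

    Only the newly exposed slice is predicted. The remaining slices are
    transported exactly: the oldest is dropped and the rest are copied
    forward unchanged, so no model capacity is spent relearning them.
    """
    new_slice = predict_new_slice(history)
    return history[1:] + [new_slice]


def toy_predictor(history: History) -> Slice:
    """Hypothetical stand-in for the learned predictor.

    Relaxes the newest slice toward the oldest one, so the next state
    depends on the delayed field and not only on the current field.
    """
    newest, oldest = history[-1], history[0]
    return [n + 0.1 * (o - n) for n, o in zip(newest, oldest)]


def rollout(history: History,
            predict_new_slice: Callable[[History], Slice],
            steps: int) -> History:
    """Autoregressive rollout that repeatedly applies the shift-append update."""
    for _ in range(steps):
        history = shift_append_step(history, predict_new_slice)
    return history
```

In this form the window length stays fixed across a rollout, and any error in the history can only enter through the predicted slice, since the transported slices are copied verbatim.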


Sharp feature-learning transitions and Bayes-optimal neural scaling laws in extensive-width networks

arXiv.org Machine Learning

We study the information-theoretic limits of learning a one-hidden-layer teacher network with hierarchical features from noisy queries, in the context of knowledge transfer to a smaller student model. We work in the high-dimensional regime where the teacher width $k$ scales linearly with the input dimension $d$ -- a setting that captures large-but-finite-width networks and has only recently become analytically tractable. Using a heuristic leave-one-out decoupling argument, validated numerically throughout, we derive asymptotically sharp characterizations of the Bayes-optimal generalization error and individual feature overlaps via a system of closed fixed-point equations. These equations reveal that feature learnability is governed by a sequence of sharp phase transitions: as data grows, teacher features become recoverable sequentially, each through a discontinuous jump in overlap. This sequential acquisition underlies a precise notion of \textit{effective width} $k_c$ -- the number of learnable features at a given data budget $n$ -- which unifies two distinct scaling regimes: a feature-learning regime in which the Bayes-optimal generalization error $\varepsilon^{\rm BO}$ scales as $n^{1/(2\beta)-1}$, and a refinement regime in which it scales as $n^{-1}$, where $\beta>1/2$ is the exponent of the power-law feature hierarchy. Both laws collapse to the single relation $\varepsilon^{\rm BO}=\Theta(k_c d/n)$. We further show empirically that a student trained with \textsc{Adam} near the effective width $k_c$ achieves these optimal scaling laws (up to a small algorithmic gap), and provide an information-theoretic account of the associated scaling in model size.
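As a consistency check on the stated laws, one can read off how the product $k_c d$ must behave in each regime for the unified relation to hold. The lines below are only an algebraic rearrangement of the three scalings quoted in the abstract, treating $d$ as fixed while $n$ varies (an assumption made here for illustration); they are not the paper's derivation.

```latex
\begin{align*}
\text{feature learning:}\quad
  \frac{k_c d}{n} \sim n^{1/(2\beta)-1}
  &\;\Longrightarrow\;
  k_c d \sim n \cdot n^{1/(2\beta)-1} = n^{1/(2\beta)}, \\
\text{refinement:}\quad
  \frac{k_c d}{n} \sim n^{-1}
  &\;\Longrightarrow\;
  k_c d \sim n \cdot n^{-1} = n^{0}.
\end{align*}
```

Because $\beta > 1/2$ gives $1/(2\beta) < 1$, the first line says $k_c d$ grows sublinearly in $n$, so the error still decays; the second line says $k_c d$ no longer grows with $n$, which is consistent with an error decaying as $n^{-1}$.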


Uniform Scaling Limits in AdamW-Trained Transformers

arXiv.org Machine Learning

We study the large-depth limit of transformers trained with AdamW, by modelling the hidden-state dynamics as an interacting particle system (IPS) coupled through the attention mechanism. Under appropriate scaling of the attention heads, we prove that the joint dynamics of the hidden states and backpropagated variables converge in $L^2$, uniformly over the initial condition, to the solution of a forward--backward system of ODEs at rate $\mathcal O(L^{-1}+L^{-1/3}H^{-1/2})$. Here, $L$ and $H$ denote the depth and number of heads of the transformer, respectively. The limiting system of ODEs can be identified with a McKean--Vlasov ODE (MVODE) when the attention heads do not incorporate causal masking. By using the flow maps associated with this MVODE and applying concentration of measure techniques, we obtain bounds on the difference between the discrete and continuous models that are uniform over compact sets of initial conditions. As this is achieved without resorting to a covering argument, the constants in our bounds are independent of the number of tokens. Furthermore, under a suitable adaptation to AdamW, the bounds become independent of the token embedding dimension.