Goto

Collaborating Authors

 model estimate


Geometry Aware Operator Transformer As An Efficient And Accurate Neural Surrogate For PDEs On Arbitrary Domains

Neural Information Processing Systems

The very challenging task of learning solution operators of PDEs on arbitrary domains accurately and efficiently is of vital importance to engineering and industrial simulations. Despite the existence of many operator learning algorithms to approximate such PDEs, we find that accurate models are not necessarily computationally efficient and vice versa. We address this issue by proposing a geometry aware operator transformer (GAOT) for learning PDEs on arbitrary domains. GAOT combines novel multiscale attentional graph neural operator encoders and decoders, together with geometry embeddings and (vision) transformer processors to accurately map information about the domain and the inputs into a robust approximation of the PDE solution. Multiple innovations in the implementation of GAOT also ensure computational efficiency and scalability. We demonstrate this significant gain in both accuracy and efficiency of GAOT over several baselines on a large number of learning tasks from a diverse set of PDEs, including achieving state of the art performance on three large scale three-dimensional industrial CFD datasets. Our project page for accessing the source code is available at camlab-ethz.github.io/GAOT.




Global Convergence of Federated Learning for Mixed Regression

Neural Information Processing Systems

This paper studies the problem of model training under Federated Learning when clients exhibit cluster structure. We contextualize this problem in mixed regression, where each client has limited local data generated from one of $k$ unknown regression models. We design an algorithm that achieves global convergence from any initialization, and works even when local data volume is highly unbalanced -- there could exist clients that contain $O(1)$ data points only. Our algorithm first runs moment descent on a few anchor clients (each with $\tilde{\Omega}(k)$ data points) to obtain coarse model estimates. Then each client alternately estimates its cluster labels and refines the model estimates based on FedAvg or FedProx. A key innovation in our analysis is a uniform estimate on the clustering errors, which we prove by bounding the VC dimension of general polynomial concept classes based on the theory of algebraic geometry.






LLM world models are mental: Output layer evidence of brittle world model use in LLM mechanical reasoning

arXiv.org Artificial Intelligence

Do large language models (LLMs) construct and manipulate internal world models, or do they rely solely on statistical associations represented as output layer token probabilities? We adapt cognitive science methodologies from human mental models research to test LLMs on pulley system problems using TikZ-rendered stimuli. Study 1 examines whether LLMs can estimate mechanical advantage (MA). State-of-the-art models performed marginally but significantly above chance, and their estimates correlated significantly with ground-truth MA. Significant correlations between number of pulleys and model estimates suggest that models employed a pulley counting heuristic, without necessarily simulating pulley systems to derive precise values. Study 2 tested this by probing whether LLMs represent global features crucial to MA estimation. Models evaluated a functionally connected pulley system against a fake system with randomly placed components. Without explicit cues, models identified the functional system as having greater MA with F1=0.8, suggesting LLMs could represent systems well enough to differentiate jumbled from functional systems. Study 3 built on this by asking LLMs to compare functional systems with matched systems which were connected up but which transferred no force to the weight; LLMs identified the functional system with F1=0.46, suggesting random guessing. Insofar as they may generalize, these findings are compatible with the notion that LLMs manipulate internal world models, sufficient to exploit statistical associations between pulley count and MA (Study 1), and to approximately represent system components' spatial relations (Study 2). However, they may lack the facility to reason over nuanced structural connectivity (Study 3). We conclude by advocating the utility of cognitive scientific methods to evaluate the world-modeling capacities of artificial intelligence systems.


Exploring the usage of Probabilistic Neural Networks for Ionospheric electron density estimation

arXiv.org Artificial Intelligence

A fundamental limitation of traditional Neural Networks (NN) in predictive modelling is their inability to quantify uncertainty in their outputs. In critical applications like positioning systems, understanding the reliability of predictions is paramount for constructing confidence intervals, early warning systems, and effectively propagating results. For instance, Precise Point Positioning (PPP, see Zumberge et al (1997)) in satellite navigation heavily relies on accurate error models for ancillary data (orbits, clocks, ionosphere, and troposphere) to compute precise error estimates and establish robust protection levels. As an example, one of the main objectives of the Galileo High Accuracy Service (HAS) Service Level 2 will be to provide the necessary regional atmospheric delay corrections (and associated uncertainty) in order to improve user positioning based on PPP strategies, most notably the convergence time of the solution (see for instance Juan et al (2025)). To address this challenge, the main objectives of this paper aims at exploring a potential framework capable of providing both point estimates and associated uncertainty measures of ionospheric Vertical Total Electron Content (VTEC). Probabilistic Neural Networks (PNNs) offer a promising approach to achieve this goal. However, constructing an effective PNN requires meticulous design of hidden and output layers, as well as careful definition of prior and posterior probability distributions for network weights and biases. This introduction provides a review in terms of state-of-the-art in PNN as well as the application of NN in ionospheric estimation of VTEC.