Meta-Learning Stationary Stochastic Process Prediction with Convolutional Neural Processes

Neural Information Processing Systems

Prediction in such models can be viewed as a translation-equivariant map from observed data sets to predictive SPs, emphasizing the intimate relationship between stationarity and equivariance.
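
To make the stationarity-equivariance link concrete, here is a minimal sketch (not the paper's model): a simple Nadaraya-Watson kernel smoother is itself a translation-equivariant map from data sets to predictive functions, and the equivariance can be checked numerically. The lengthscale and data below are arbitrary illustrative choices.

```python
# A translation-equivariant prediction map: shifting the observed data set by
# tau shifts the predictive function by tau. The kernel smoother below is a
# stand-in for the paper's model, chosen only because equivariance is easy to
# verify numerically.
import numpy as np

def predict(xc, yc, xt, lengthscale=0.5):
    """Predictive mean at targets xt given context (xc, yc)."""
    w = np.exp(-0.5 * ((xt[:, None] - xc[None, :]) / lengthscale) ** 2)
    return (w @ yc) / w.sum(axis=1)

rng = np.random.default_rng(0)
xc, yc = rng.uniform(-2, 2, 10), rng.normal(size=10)
xt, tau = np.linspace(-1, 1, 5), 3.0

# Equivariance check: predicting on the shifted data set at shifted targets
# equals predicting on the original data set at the original targets.
assert np.allclose(predict(xc + tau, yc, xt + tau), predict(xc, yc, xt))
```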


Where to Measure: Epistemic Uncertainty-Based Sensor Placement with ConvCNPs

Eksen, Feyza, Oehmcke, Stefan, Lüdtke, Stefan

arXiv.org Artificial Intelligence

Accurate sensor placement is critical for modeling spatio-temporal systems such as environmental and climate processes. Neural Processes (NPs), particularly Convolutional Conditional Neural Processes (ConvCNPs), provide scalable probabilistic models with uncertainty estimates, making them well-suited for data-driven sensor placement. However, existing approaches rely on total predictive uncertainty, which conflates epistemic and aleatoric components and may lead to suboptimal sensor selection in ambiguous regions. To address this, we propose expected reduction in epistemic uncertainty as a new acquisition function for sensor placement. To enable this, we extend ConvCNPs with a Mixture Density Network (MDN) output head for epistemic uncertainty estimation. Preliminary results suggest that epistemic-uncertainty-driven sensor placement more effectively reduces model error than approaches based on overall uncertainty.
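
A minimal sketch of the kind of uncertainty decomposition an MDN output head enables, assuming K Gaussian components per target point and splitting the mixture variance by the law of total variance (within-component variance as aleatoric, between-component spread of the means as an epistemic proxy). The abstract does not specify the exact decomposition or acquisition rule, so both are illustrative assumptions.

```python
# Split a Gaussian-mixture predictive distribution into aleatoric and
# epistemic parts via the law of total variance:
#   Var = E[sigma_k^2]  (aleatoric)  +  Var[mu_k]  (epistemic proxy).
import numpy as np

def mixture_uncertainty(pi, mu, sigma):
    """pi, mu, sigma: arrays of shape (n_targets, K) from the MDN head."""
    mean = (pi * mu).sum(-1)                            # mixture mean
    aleatoric = (pi * sigma**2).sum(-1)                 # E[Var] within components
    epistemic = (pi * (mu - mean[:, None])**2).sum(-1)  # Var[E] across components
    return mean, aleatoric, epistemic

# Illustrative acquisition: place the next sensor where the epistemic
# component is largest (a simple proxy for expected epistemic reduction).
pi = np.full((100, 3), 1 / 3)
mu = np.random.randn(100, 3)
sigma = np.abs(np.random.randn(100, 3)) + 0.1
_, _, epistemic = mixture_uncertainty(pi, mu, sigma)
next_sensor = int(np.argmax(epistemic))
```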






Review for NeurIPS paper: Meta-Learning Stationary Stochastic Process Prediction with Convolutional Neural Processes

Neural Information Processing Systems

The authors say that they use a convCNP as an encoder. Looking at the pseudo-code in Algorithm 1 in the appendix, it is unclear to me whether the convCNP is actually run all the way with some discretized grid as targets, or whether the discretization at the level of t_i is used. I would assume the latter, but this is not stated in the text. If it is the former, I do not understand why lines 6 and 7 (in Algorithm 1) are needed in the encoder. The same goes for the pseudo-code in the appendix.
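
For context, a minimal sketch of the SetConv encoding step the question concerns: the context set is absorbed onto a uniform internal grid (a density channel plus a normalised data channel) before any CNN is applied; whether the model is then run all the way out to a discretized target grid or queried at the t_i is exactly the ambiguity raised above. Grid size and lengthscale here are illustrative.

```python
# SetConv-style encoding of an off-grid context set onto a uniform grid,
# producing a density channel and a normalised data channel for a CNN.
import numpy as np

def set_conv_encode(xc, yc, grid, lengthscale=0.1):
    w = np.exp(-0.5 * ((grid[:, None] - xc[None, :]) / lengthscale) ** 2)
    density = w.sum(axis=1)                       # channel 0: density
    signal = (w @ yc) / np.maximum(density, 1e-8) # channel 1: normalised data
    return np.stack([density, signal], axis=-1)   # shape (n_grid, 2)

grid = np.linspace(-3, 3, 128)
enc = set_conv_encode(np.array([-1.0, 0.5]), np.array([0.3, -0.7]), grid)
```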


Gridded Transformer Neural Processes for Large Unstructured Spatio-Temporal Data

Ashman, Matthew, Diaconu, Cristiana, Langezaal, Eric, Weller, Adrian, Turner, Richard E.

arXiv.org Machine Learning

Many important problems require modelling large-scale spatio-temporal datasets, with one prevalent example being weather forecasting. Recently, transformer-based approaches have shown great promise in a range of weather forecasting problems. However, these have mostly focused on gridded data sources, neglecting the wealth of unstructured, off-the-grid data from observational measurements such as those at weather stations. A promising family of models suitable for such tasks are neural processes (NPs), notably the family of transformer neural processes (TNPs). Although TNPs have shown promise on small spatio-temporal datasets, they are unable to scale to the quantities of data used by state-of-the-art weather and climate models. This limitation stems from their lack of efficient attention mechanisms. We address this shortcoming through the introduction of gridded pseudo-token TNPs which employ specialised encoders and decoders to handle unstructured observations and utilise a processor containing gridded pseudo-tokens that leverage efficient attention mechanisms. Our method consistently outperforms a range of strong baselines on various synthetic and real-world regression tasks involving large-scale data, while maintaining competitive computational efficiency. The real-life experiments are performed on weather data, demonstrating the potential of our approach to bring performance and computational benefits when applied at scale in a weather modelling pipeline.
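
A minimal sketch of the gridded pseudo-token idea as described in the abstract: unstructured observations are absorbed into a fixed grid of pseudo-tokens via cross-attention, so downstream processing costs scale with the grid size rather than quadratically in the number of observations. The single-head attention and the shapes are illustrative assumptions, not the paper's exact architecture.

```python
# Cross-attention from off-grid observation tokens into a fixed grid of
# pseudo-tokens: cost is O(n_grid * n_obs) instead of O(n_obs^2).
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def encode_to_grid(obs_tokens, grid_tokens):
    """Queries = gridded pseudo-tokens; keys/values = observations."""
    d = grid_tokens.shape[-1]
    att = softmax(grid_tokens @ obs_tokens.T / np.sqrt(d))  # (n_grid, n_obs)
    return grid_tokens + att @ obs_tokens                   # residual update

n_obs, n_grid, d = 10_000, 256, 32   # many off-grid obs, small fixed grid
obs = np.random.randn(n_obs, d)
grid = np.random.randn(n_grid, d)
processed = encode_to_grid(obs, grid)  # the processor only ever sees the grid
```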


Approximately Equivariant Neural Processes

Ashman, Matthew, Diaconu, Cristiana, Weller, Adrian, Bruinsma, Wessel, Turner, Richard E.

arXiv.org Machine Learning

Equivariant deep learning architectures exploit symmetries in learning problems to improve the sample efficiency of neural-network-based models and their ability to generalise. However, when modelling real-world data, learning problems are often not exactly equivariant, but only approximately. For example, when estimating the global temperature field from weather station observations, local topographical features like mountains break translation equivariance. In these scenarios, it is desirable to construct architectures that can flexibly depart from exact equivariance in a data-driven way. In this paper, we develop a general approach to achieving this using existing equivariant architectures. Our approach is agnostic to both the choice of symmetry group and model architecture, making it widely applicable. We consider the use of approximately equivariant architectures in neural processes (NPs), a popular family of meta-learning models. We demonstrate the effectiveness of our approach on a number of synthetic and real-world regression experiments, showing that approximately equivariant NP models can outperform both their non-equivariant and strictly equivariant counterparts.
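
One way to realise the data-driven departure from exact equivariance described above, sketched minimally: blend a translation-equivariant map with a learned, absolute-position-dependent correction whose weight controls how far the model departs from strict equivariance. The blending scheme and position features are illustrative assumptions, not the paper's construction.

```python
# Approximately equivariant predictor: an equivariant backbone plus a small
# non-equivariant correction; alpha = 0 recovers strict equivariance.
import numpy as np

def equivariant_part(x, y):
    """Translation-equivariant kernel smoother (depends only on differences)."""
    w = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)
    return w @ y / w.sum(axis=1)

def approx_equivariant(x, y, freqs, alpha):
    """Add absolute-position features, breaking equivariance when alpha > 0."""
    correction = np.cos(np.outer(x, freqs)).mean(axis=1)
    return equivariant_part(x, y) + alpha * correction

x = np.linspace(-2, 2, 50)
y = np.sin(x) + 0.1 * np.random.randn(50)
pred = approx_equivariant(x, y, freqs=np.arange(1.0, 5.0), alpha=0.1)
```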