 convcnp


Characterizing the Representational Capacity of Neural Processes

arXiv.org Machine Learning

What functions can Neural Processes represent? We analyze the representational capacity of popular NP architectures: Conditional Neural Processes (CNPs), Attentive Neural Processes (ANPs), Transformer Neural Processes (TNPs), and their latent variants. We prove these architectures form a strict hierarchy. CNP-representable functions are exactly those depending on finitely many expected features of the context distribution. ANPs strictly generalize CNPs via query-dependent reweighting, enabling kernel smoothers. ConvCNPs and ANPs are incomparable; each contains functions outside the other, separated by stationarity versus translation equivariance. TNPs with $L$ self-attention layers capture $L$-hop context interactions. For latent NPs, we show finite-dimensional latents provide coherent sampling but do not circumvent encoder limitations; matching GP posterior distributions requires latent dimension scaling with context size. These results provide a theoretical foundation for architecture selection based on task structure.
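As a rough illustration of the contrast drawn in this abstract, the sketch below compares a CNP-style encoder, which averages fixed features over the context set, with an ANP-style query-dependent reweighting, written here as a Nadaraya-Watson kernel smoother. The feature maps, the Gaussian kernel, the length scale, and all function names are assumptions chosen for illustration; none are taken from the paper.

```python
import math

def cnp_summary(context, features):
    """Mean-aggregate fixed features over the context set (query-independent)."""
    n = len(context)
    return [sum(f(x, y) for x, y in context) / n for f in features]

def kernel_smoother(context, x_query, length_scale=0.5):
    """Query-dependent reweighting: Nadaraya-Watson estimate with a Gaussian kernel."""
    weights = [math.exp(-0.5 * ((x_query - x) / length_scale) ** 2)
               for x, _ in context]
    total = sum(weights)
    return sum(w * y for w, (_, y) in zip(weights, context)) / total

# Hypothetical context set and feature maps (illustrative only).
context = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
features = [lambda x, y: y, lambda x, y: x * y]

summary = cnp_summary(context, features)    # one vector shared by all queries
near_zero = kernel_smoother(context, 0.0)   # dominated by the context point at x = 0
near_two = kernel_smoother(context, 2.0)    # dominated by the context point at x = 2
```

The mean-aggregated summary is the same for every query and is invariant to the ordering of the context, whereas the smoother's weights change with the query location.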



Supplementary Material: Meta-Learning Stationary Stochastic Process Prediction with Convolutional Neural Processes

Neural Information Processing Systems

In this setting, we implement $\phi$ by selecting the context points and prepending the context mask: $\phi = [M_c, Z_c]^\top$. C.2 Pseudo-Code for the ConvNP. The ConvNP can be implemented very simply by passing samples from the ConvCNP through an additional CNN decoder, which we denote $d_\theta$.
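A minimal sketch of the on-the-grid encoding described in this excerpt, under the assumption that $Z_c$ is the signal with non-context locations zeroed out by the mask $M_c$. The function name `on_grid_encoding` and the toy 2×2 image are hypothetical, and the excerpt does not confirm the exact construction.

```python
def on_grid_encoding(image, context_mask):
    """Return [M_c, Z_c]: the mask channel followed by the masked signal channel.

    Assumption: Z_c keeps the pixel value where the mask is 1 and is 0 elsewhere.
    """
    z_c = [[m * v for m, v in zip(mask_row, img_row)]
           for mask_row, img_row in zip(context_mask, image)]
    return [context_mask, z_c]

# Hypothetical 2x2 single-channel image and binary context mask.
image = [[0.2, 0.5],
         [0.7, 0.9]]
context_mask = [[1, 0],
                [0, 1]]

phi = on_grid_encoding(image, context_mask)
```

Prepending the mask lets a downstream CNN distinguish an observed value of zero from an unobserved location.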


Meta-Learning Stationary Stochastic Process Prediction with Convolutional Neural Processes

Neural Information Processing Systems

Prediction in such models can be viewed as a translation equivariant map from observed data sets to predictive SPs, emphasizing the intimate relationship between stationarity and equivariance.
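Translation equivariance can be checked numerically on a toy map. The sketch below uses a kernel-weighted sum of context outputs, which depends on inputs only through the differences $x - x_i$. It verifies that shifting the context inputs by $\tau$ shifts the output function by the same $\tau$. The kernel, length scale, and data are illustrative assumptions, not the paper's model.

```python
import math

def set_conv(context, x, length_scale=0.3):
    """Kernel-weighted sum of context outputs; depends only on x - x_i."""
    return sum(y * math.exp(-0.5 * ((x - xi) / length_scale) ** 2)
               for xi, y in context)

def shift(context, tau):
    """Translate every context input by tau, leaving outputs unchanged."""
    return [(xi + tau, y) for xi, y in context]

# Hypothetical context set, shift, and query locations.
context = [(-0.5, 1.0), (0.2, -2.0), (1.1, 0.5)]
tau = 0.75
queries = [-1.0, 0.0, 0.4, 2.0]

original = [set_conv(context, x) for x in queries]
shifted = [set_conv(shift(context, tau), x + tau) for x in queries]

# Equivariance: evaluating the shifted map at x + tau matches the original at x.
max_gap = max(abs(a - b) for a, b in zip(original, shifted))
```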


Where to Measure: Epistemic Uncertainty-Based Sensor Placement with ConvCNPs

arXiv.org Artificial Intelligence

Accurate sensor placement is critical for modeling spatio-temporal systems such as environmental and climate processes. Neural Processes (NPs), particularly Convolutional Conditional Neural Processes (ConvCNPs), provide scalable probabilistic models with uncertainty estimates, making them well-suited for data-driven sensor placement. However, existing approaches rely on total predictive uncertainty, which conflates epistemic and aleatoric components and may lead to suboptimal sensor selection in ambiguous regions. To address this, we propose expected reduction in epistemic uncertainty as a new acquisition function for sensor placement. To enable this, we extend ConvCNPs with a Mixture Density Network (MDN) output head for epistemic uncertainty estimation. Preliminary results suggest that sensor placement driven by epistemic uncertainty reduces model error more effectively than approaches based on overall uncertainty.
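One common way to separate the two uncertainty components for a Gaussian mixture output is the law of total variance. The weighted average of the component variances is read as aleatoric, and the weighted spread of the component means is read as epistemic. The abstract does not say whether the paper uses exactly this decomposition, so the sketch below is an assumption, and the function name and numbers are hypothetical.

```python
def decompose_mixture_variance(weights, means, variances):
    """Split the variance of a Gaussian mixture via the law of total variance.

    Returns (mixture_mean, aleatoric, epistemic), where
      aleatoric = sum_k w_k * var_k             (expected within-component variance)
      epistemic = sum_k w_k * (mu_k - mean)^2   (variance of the component means)
    """
    mean = sum(w * m for w, m in zip(weights, means))
    aleatoric = sum(w * v for w, v in zip(weights, variances))
    epistemic = sum(w * (m - mean) ** 2 for w, m in zip(weights, means))
    return mean, aleatoric, epistemic

# Hypothetical two-component mixture at a single candidate location.
mean, aleatoric, epistemic = decompose_mixture_variance(
    weights=[0.5, 0.5], means=[0.0, 2.0], variances=[1.0, 1.0]
)
total = aleatoric + epistemic  # equals the variance of the full mixture
```

Under this reading, an acquisition rule would rank candidate locations by the epistemic term rather than by `total`.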





Review for NeurIPS paper: Meta-Learning Stationary Stochastic Process Prediction with Convolutional Neural Processes

Neural Information Processing Systems

The authors say that they use a ConvCNP as an encoder. Looking at the pseudo-code in Algorithm 1 in the appendix, it is unclear to me whether the ConvCNP is actually run all the way and given some discretized grid as targets, or whether the discretization at the level of t_i is used. I would assume the latter, but this is not stated in the text. If it's the former, I don't understand why lines 6 and 7 (in Algorithm 1) are needed in the encoder. The same goes for the pseudo-code in the appendix.


Gridded Transformer Neural Processes for Large Unstructured Spatio-Temporal Data

arXiv.org Machine Learning

Many important problems require modelling large-scale spatio-temporal datasets, with one prevalent example being weather forecasting. Recently, transformer-based approaches have shown great promise in a range of weather forecasting problems. However, these have mostly focused on gridded data sources, neglecting the wealth of unstructured, off-the-grid data from observational measurements such as those at weather stations. A promising family of models suitable for such tasks are neural processes (NPs), notably the family of transformer neural processes (TNPs). Although TNPs have shown promise on small spatio-temporal datasets, they are unable to scale to the quantities of data used by state-of-the-art weather and climate models. This limitation stems from their lack of efficient attention mechanisms. We address this shortcoming through the introduction of gridded pseudo-token TNPs which employ specialised encoders and decoders to handle unstructured observations and utilise a processor containing gridded pseudo-tokens that leverage efficient attention mechanisms. Our method consistently outperforms a range of strong baselines on various synthetic and real-world regression tasks involving large-scale data, while maintaining competitive computational efficiency. The real-life experiments are performed on weather data, demonstrating the potential of our approach to bring performance and computational benefits when applied at scale in a weather modelling pipeline.