Lindsten, Fredrik
Diffusion-LAM: Probabilistic Limited Area Weather Forecasting with Diffusion
Larsson, Erik, Oskarsson, Joel, Landelius, Tomas, Lindsten, Fredrik
Machine learning methods have been shown to be effective for weather forecasting, owing to their speed and accuracy compared to traditional numerical models. While early efforts primarily concentrated on deterministic predictions, the field has increasingly shifted toward probabilistic forecasting to better capture forecast uncertainty. Most machine learning-based models have been designed for global-scale predictions, with only limited work targeting regional or limited area forecasting, which allows more specialized and flexible modeling for specific locations. This work introduces Diffusion-LAM, a probabilistic limited area weather model leveraging conditional diffusion. By conditioning on boundary data from surrounding regions, our approach generates forecasts within a defined area. Experimental results on the MEPS limited area dataset demonstrate the potential of Diffusion-LAM to deliver accurate probabilistic forecasts, highlighting its promise for limited-area weather prediction.

The frequency and cost of extreme weather events appear to be increasing (NOAA NCEI, 2025; IPCC, 2023; Whitt & Gordon, 2023), driven by climate change (IPCC, 2023). Accurate and reliable weather forecasts have therefore become increasingly crucial for a variety of downstream applications.
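The boundary conditioning can be pictured with an inpainting-style scheme: at each reverse-diffusion step the interior is updated by the learned denoiser, while boundary cells are clamped to (appropriately noised) data from the surrounding region. The sketch below is a toy illustration of that idea under our own simplified update rule, not the paper's exact sampler; the `denoise` callable, the mixing coefficients, and the 1-D grid are all illustrative assumptions.

```python
import random

def boundary_conditioned_step(x, boundary, denoise, t, noise_level):
    """One illustrative reverse-diffusion step on a 1-D 'grid'.

    x          : current noisy state (list of floats)
    boundary   : known values for boundary cells (dict index -> value)
    denoise    : model approximating the clean state from (x, t)
    Interior cells move toward the denoiser's estimate; boundary cells
    are clamped to observed data plus matched noise (inpainting-style).
    The 0.7/0.3 mixing is an arbitrary illustrative choice.
    """
    x_hat = denoise(x, t)                      # model's clean estimate
    x_next = [0.7 * xi + 0.3 * xh for xi, xh in zip(x, x_hat)]
    for i, b in boundary.items():
        x_next[i] = b + noise_level * random.gauss(0.0, 1.0)
    return x_next

# toy usage: identity "denoiser", boundary pinned at the two ends
random.seed(0)
x = [random.gauss(0, 1) for _ in range(8)]
step = boundary_conditioned_step(x, {0: 1.0, 7: -1.0},
                                 lambda x, t: x, t=10, noise_level=0.0)
assert step[0] == 1.0 and step[-1] == -1.0
```

With `noise_level=0.0` the boundary cells are pinned exactly; in a real sampler the clamped values would carry noise matched to the current diffusion time.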
Solving Linear-Gaussian Bayesian Inverse Problems with Decoupled Diffusion Sequential Monte Carlo
Kelvinius, Filip Ekström, Zhao, Zheng, Lindsten, Fredrik
A recent line of research has exploited pre-trained generative diffusion models as priors for solving Bayesian inverse problems. We contribute to this research direction by designing a sequential Monte Carlo method for linear-Gaussian inverse problems which builds on "decoupled diffusion", where the generative process is designed such that larger updates to the sample are possible. Previous methods for posterior sampling with diffusion priors, while providing impressive results on tasks like image reconstruction (Kawar et al., 2022; Chung et al., 2023; Song et al., 2023), often rely on approximations and fail or perform poorly on simple tasks (Cardoso et al., 2024, and our Section 5.1), making it uncertain to what extent they can solve Bayesian inference problems in general.
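As a point of reference, the core Monte Carlo ingredient here, weighting prior samples by the Gaussian likelihood of a linear observation, can be sketched in a few lines. This is a single importance-sampling step for a scalar problem, not the decoupled-diffusion sampler itself; the observation model `y = a*x + eps` and all parameter values are our own illustrative assumptions.

```python
import math
import random

def posterior_mean_is(y, a, sigma_y, n_particles=5000, seed=0):
    """Importance-sampling estimate of E[x | y] for the linear-Gaussian
    inverse problem y = a*x + eps, eps ~ N(0, sigma_y^2), prior x ~ N(0, 1).

    Particles are drawn from the prior and weighted by the likelihood;
    the posterior mean is the self-normalised weighted average. A full SMC
    sampler would interleave such reweighting steps with the generative
    (diffusion) dynamics.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]     # prior samples
    logw = [-0.5 * ((y - a * x) / sigma_y) ** 2 for x in xs]   # log-likelihood
    m = max(logw)
    w = [math.exp(lw - m) for lw in logw]                      # stable weights
    return sum(wi * xi for wi, xi in zip(w, xs)) / sum(w)

# closed-form posterior mean for comparison: a*y / (a^2 + sigma_y^2) = 0.5
est = posterior_mean_is(y=1.0, a=1.0, sigma_y=1.0)
assert abs(est - 0.5) < 0.1
```

The closed-form check works because both prior and likelihood are Gaussian, which is exactly the structure the linear-Gaussian setting exploits.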
WyckoffDiff - A Generative Diffusion Model for Crystal Symmetry
Kelvinius, Filip Ekström, Andersson, Oskar B., Parackal, Abhijith S., Qian, Dong, Armiento, Rickard, Lindsten, Fredrik
Crystalline materials often exhibit a high level of symmetry. However, most generative models do not account for symmetry, but rather model each atom without any constraints on its position or element. We propose a generative model, Wyckoff Diffusion (WyckoffDiff), which generates symmetry-based descriptions of crystals. This is enabled by considering a crystal structure representation that encodes all symmetry, and we design a novel neural network architecture which enables using this representation inside a discrete generative model framework. In addition to respecting symmetry by construction, the discrete nature of our model enables fast generation. We additionally present a new metric, Fréchet Wrenformer Distance, which captures the symmetry aspects of the materials generated, and we benchmark WyckoffDiff against recently proposed generative models for crystal generation.
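For context on Fréchet-type metrics: FID-style distances fit Gaussians to feature statistics of real and generated samples and compute the Fréchet (2-Wasserstein) distance between them, and the Fréchet Wrenformer Distance presumably follows the same recipe with symmetry-aware Wrenformer embeddings. The 1-D case below is a generic illustration of the distance itself, not code from the paper.

```python
import math

def frechet_distance_1d(mu1, var1, mu2, var2):
    """Fréchet (2-Wasserstein) distance between two 1-D Gaussians.

    In higher dimensions the second term becomes
    Tr(S1 + S2 - 2*(S1 S2)^(1/2)) with covariance matrices S1, S2;
    the 1-D case reduces to a difference of standard deviations.
    """
    return math.sqrt((mu1 - mu2) ** 2
                     + (math.sqrt(var1) - math.sqrt(var2)) ** 2)

# identical feature statistics give distance 0; shifting the mean by 3
# with equal variances gives distance 3
assert frechet_distance_1d(0.0, 1.0, 0.0, 1.0) == 0.0
assert abs(frechet_distance_1d(0.0, 1.0, 3.0, 1.0) - 3.0) < 1e-9
```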
On Partial Prototype Collapse in the DINO Family of Self-Supervised Methods
Govindarajan, Hariprasath, Sidén, Per, Roll, Jacob, Lindsten, Fredrik
A prominent self-supervised learning paradigm is to model the representations as clusters, or more generally as a mixture model. Learning to map the data samples to compact representations and fitting the mixture model simultaneously leads to the representation collapse problem. Regularizing the distribution of data points over the clusters is the prevalent strategy to avoid this issue. While this is sufficient to prevent full representation collapse, we show that a partial prototype collapse problem still exists in the DINO family of methods, which leads to significant redundancies in the prototypes. Such prototype redundancies serve as shortcuts for the method to achieve a marginal latent class distribution that matches the prescribed prior. We show that by encouraging the model to use diverse prototypes, the partial prototype collapse can be mitigated. Effective utilization of the prototypes enables the methods to learn more fine-grained clusters, encouraging more informative representations. We demonstrate that this is especially beneficial when pre-training on a long-tailed fine-grained dataset.
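A simple way to picture partial prototype collapse is to count near-duplicate prototypes. The diagnostic below, with its 0.99 cosine-similarity threshold, is our own illustrative check rather than the paper's measurement protocol.

```python
import math

def redundant_prototype_pairs(prototypes, threshold=0.99):
    """Count prototype pairs whose cosine similarity exceeds a threshold.

    Partial prototype collapse shows up as groups of near-identical
    prototypes; a high count here is a simple redundancy diagnostic.
    `prototypes` is a list of vectors (lists of floats).
    """
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv)

    pairs = 0
    for i in range(len(prototypes)):
        for j in range(i + 1, len(prototypes)):
            if cos(prototypes[i], prototypes[j]) > threshold:
                pairs += 1
    return pairs

# two collapsed (parallel) prototypes and one distinct prototype
protos = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0]]
assert redundant_prototype_pairs(protos) == 1
```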
Continuous Ensemble Weather Forecasting with Diffusion models
Andrae, Martin, Landelius, Tomas, Oskarsson, Joel, Lindsten, Fredrik
Weather forecasting has seen a shift in methods from numerical simulations to data-driven systems. While initial research in the area focused on deterministic forecasting, recent works have used diffusion models to produce skillful ensemble forecasts. These models are trained on a single forecasting step and rolled out autoregressively. However, they are computationally expensive and accumulate errors at high temporal resolution due to the many rollout steps. We address these limitations with Continuous Ensemble Forecasting, a novel and flexible method for sampling ensemble forecasts in diffusion models. The method can generate temporally consistent ensemble trajectories completely in parallel, with no autoregressive steps. Continuous Ensemble Forecasting can also be combined with autoregressive rollouts to yield forecasts at an arbitrarily fine temporal resolution without sacrificing accuracy. We demonstrate that the method achieves competitive results for global weather forecasting with good probabilistic properties.

Forecasting of physical systems over both space and time is a crucial problem with plenty of real-world applications, including in the earth sciences, transportation, and energy systems. A prime example of this is weather forecasting, which billions of people depend on daily to plan their activities. Weather forecasting is also crucial for making informed decisions in areas such as agriculture, renewable energy production, and safeguarding communities against extreme weather events.
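The core idea can be sketched generically: each ensemble member fixes one noise draw and maps it, together with the initial state, to any requested lead time, so all lead times of a trajectory can be sampled in parallel while remaining consistent with each other. The `model` callable and toy drift dynamics below are placeholders for the diffusion sampler, chosen by us for illustration.

```python
import random

def continuous_ensemble(initial, model, lead_times, n_members, seed=0):
    """Sketch of the idea behind Continuous Ensemble Forecasting.

    Each ensemble member reuses a single noise draw across all lead
    times, so the states at different lead times can be generated in
    parallel (no autoregressive rollout) yet still form a temporally
    consistent trajectory.  `model(initial, t, noise)` stands in for
    the diffusion-based forecast at lead time t.
    """
    rng = random.Random(seed)
    forecasts = []
    for _ in range(n_members):
        noise = rng.gauss(0.0, 1.0)            # shared across lead times
        member = {t: model(initial, t, noise) for t in lead_times}
        forecasts.append(member)
    return forecasts

# toy "model": the state drifts with lead time, shifted by member noise
fc = continuous_ensemble(1.0, lambda x0, t, z: x0 + 0.1 * t + z,
                         lead_times=[6, 12, 24], n_members=4)
assert len(fc) == 4 and set(fc[0]) == {6, 12, 24}
```

Because the noise is shared within a member, differences between lead times of the same trajectory reflect only the dynamics, not fresh randomness.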
Towards understanding epoch-wise double descent in two-layer linear neural networks
Olmin, Amanda, Lindsten, Fredrik
Epoch-wise double descent is the phenomenon where generalisation performance improves beyond the point of overfitting, resulting in a generalisation curve exhibiting two descents over the course of learning. Understanding the mechanisms driving this behaviour is crucial not only for understanding the generalisation behaviour of machine learning models in general, but also for employing conventional model selection methods, such as early stopping to mitigate overfitting. While we ultimately want to draw conclusions about more complex models, such as deep neural networks, a majority of theoretical conclusions regarding the underlying cause of epoch-wise double descent are based on simple models, such as standard linear regression. To start bridging this gap, we study epoch-wise double descent in two-layer linear neural networks. First, we derive a gradient flow for the linear two-layer model that bridges the learning dynamics of the standard linear regression model and the linear two-layer diagonal network with quadratic weights. Second, we identify additional factors of epoch-wise double descent emerging with the extra model layer, by deriving necessary conditions for the generalisation error to follow a double descent pattern. While epoch-wise double descent in linear regression has been attributed to differences in input variance, in the two-layer model the singular values of the input-output covariance matrix also play an important role. This opens up further questions regarding unidentified factors of epoch-wise double descent for truly deep models.
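For orientation, the gradient-flow dynamics of a two-layer linear network $f(x) = W_2 W_1 x$ under squared loss can be written in the following standard form (our notation and parameterisation, not necessarily the one derived in the paper):

```latex
\dot{W}_1 = -\,W_2^{\top}\!\left(W_2 W_1 \Sigma_{xx} - \Sigma_{yx}\right),
\qquad
\dot{W}_2 = -\,\left(W_2 W_1 \Sigma_{xx} - \Sigma_{yx}\right) W_1^{\top},
```

where $\Sigma_{xx} = \mathbb{E}[xx^{\top}]$ and $\Sigma_{yx} = \mathbb{E}[yx^{\top}]$. The input-output covariance $\Sigma_{yx}$ enters both equations directly, which is consistent with the role its singular values play in the conditions described above; in standard linear regression only a single weight matrix appears and the dynamics depend on the input variance alone.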
Probabilistic Weather Forecasting with Hierarchical Graph Neural Networks
Oskarsson, Joel, Landelius, Tomas, Deisenroth, Marc Peter, Lindsten, Fredrik
In recent years, machine learning has established itself as a powerful tool for high-resolution weather forecasting. While most current machine learning models focus on deterministic forecasts, accurately capturing the uncertainty in the chaotic weather system calls for probabilistic modeling. We propose a probabilistic weather forecasting model called Graph-EFM, combining a flexible latent-variable formulation with the successful graph-based forecasting framework. The use of a hierarchical graph construction allows for efficient sampling of spatially coherent forecasts. Requiring only a single forward pass per time step, Graph-EFM allows for fast generation of arbitrarily large ensembles. We experiment with the model on both global and limited area forecasting. Ensemble forecasts from Graph-EFM achieve equivalent or lower errors than comparable deterministic models, with the added benefit of accurately capturing forecast uncertainty.
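The single-forward-pass ensemble generation can be sketched generically: draw a latent vector per member and time step, then decode deterministically, so ensemble size only scales the number of cheap forward passes. The `decoder` below is an illustrative placeholder for the hierarchical graph model, not the paper's architecture.

```python
import random

def latent_variable_ensemble(x0, decoder, n_members, n_steps,
                             latent_dim, seed=0):
    """Sketch of latent-variable ensemble forecasting.

    Each member draws a fresh latent vector z at every time step and
    advances the state with one deterministic decoder pass, so sampling
    an ensemble member costs one forward pass per step and arbitrarily
    large ensembles are generated by repeating the loop.
    """
    rng = random.Random(seed)
    ensemble = []
    for _ in range(n_members):
        state, traj = x0, []
        for _ in range(n_steps):
            z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
            state = decoder(state, z)        # single forward pass per step
            traj.append(state)
        ensemble.append(traj)
    return ensemble

# toy decoder: random-walk dynamics driven by the mean of the latent
ens = latent_variable_ensemble(0.0, lambda x, z: x + sum(z) / len(z),
                               n_members=3, n_steps=4, latent_dim=2)
assert len(ens) == 3 and len(ens[0]) == 4
```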
DINO as a von Mises-Fisher mixture model
Govindarajan, Hariprasath, Sidén, Per, Roll, Jacob, Lindsten, Fredrik
Self-distillation methods using Siamese networks are popular for self-supervised pre-training. DINO is one such method based on a cross-entropy loss between K-dimensional probability vectors, obtained by applying a softmax function to the dot product between representations and learnt prototypes. This clustering can be interpreted as fitting a mixture of von Mises-Fisher (vMF) distributions. Using this insight we propose DINO-vMF, which adds appropriate normalization constants when computing the cluster assignment probabilities. Unlike DINO, DINO-vMF is stable also for the larger ViT-Base model with unnormalized prototypes. We show that the added flexibility of the mixture model is beneficial in terms of better image representations. The DINO-vMF pre-trained model consistently performs better than DINO on a range of downstream tasks. We obtain similar improvements for iBOT-vMF vs iBOT and thereby show the relevance of our proposed modification also for other methods derived from DINO.

Self-supervised learning (SSL) is an effective approach for pre-training models on large unlabeled datasets. The main objective of SSL pre-training is to learn representations that are transferable to a range of so-called downstream tasks. Early SSL methods achieved this through handcrafted pretext tasks that act as inductive biases in the representation learning process (Komodakis & Gidaris, 2018; Noroozi & Favaro, 2016; Kim et al., 2018; Doersch et al., 2015; Larsson et al., 2016; Zhang et al., 2016).
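The modification can be illustrated schematically: DINO's assignment probabilities are a softmax over temperature-scaled dot products, and the vMF view adds a per-prototype log normalisation term inside that softmax. The `log_c` hook below is our illustrative stand-in for that term (which in the vMF case depends on the prototype norm through a Bessel function), not the paper's exact formula.

```python
import math

def assignment_probs(z, prototypes, tau, log_c=None):
    """Illustrative cluster-assignment probabilities in the style of DINO.

    DINO applies a softmax to dot products between a representation z and
    learnt prototypes, scaled by temperature tau.  The vMF-style variant
    adds a per-prototype correction log_c[k] inside the softmax; passing
    log_c=None recovers the plain DINO assignment (no correction).
    """
    if log_c is None:
        log_c = [0.0] * len(prototypes)
    logits = [sum(zi * wi for zi, wi in zip(z, w)) / tau + c
              for w, c in zip(prototypes, log_c)]
    m = max(logits)                       # numerically stable softmax
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

# a representation aligned with the first of two prototypes
p = assignment_probs([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], tau=0.1)
assert abs(sum(p) - 1.0) < 1e-9 and p[0] > p[1]
```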
On the connection between Noise-Contrastive Estimation and Contrastive Divergence
Olmin, Amanda, Lindqvist, Jakob, Svensson, Lennart, Lindsten, Fredrik
Noise-contrastive estimation (NCE) is a popular method for estimating unnormalised probabilistic models, such as energy-based models, which are effective for modelling complex data distributions. Unlike classical maximum likelihood (ML) estimation that relies on importance sampling (resulting in ML-IS) or MCMC (resulting in contrastive divergence, CD), NCE uses a proxy criterion to avoid the need for evaluating an often intractable normalisation constant. Despite apparent conceptual differences, we show that two NCE criteria, ranking NCE (RNCE) and conditional NCE (CNCE), can be viewed as ML estimation methods. Specifically, RNCE is equivalent to ML estimation combined with conditional importance sampling, and both RNCE and CNCE are special cases of CD. These findings bridge the gap between the two method classes and allow us to apply techniques from the ML-IS and CD literature to NCE, offering several advantageous extensions.
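The ranking-NCE criterion itself is compact enough to sketch: the data point is ranked against noise samples through a softmax over importance ratios of the unnormalised model to the noise density, in which the intractable normalisation constant cancels. The toy energy function and noise distribution below are our own choices for illustration.

```python
import math
import random

def rnce_loss(x, energy, sample_noise, noise_logpdf, k, rng):
    """Illustrative ranking-NCE loss for an unnormalised model
    p(x) proportional to exp(-energy(x)).

    The data point x is ranked against k noise samples via a softmax
    over log importance ratios log p~(.) - log q(.); the model's
    normalisation constant is common to all terms and cancels.
    """
    points = [x] + [sample_noise(rng) for _ in range(k)]
    log_ratios = [-energy(p) - noise_logpdf(p) for p in points]
    m = max(log_ratios)                   # stable log-sum-exp
    log_z = m + math.log(sum(math.exp(r - m) for r in log_ratios))
    return -(log_ratios[0] - log_z)       # negative log-softmax of the data point

# toy usage: when the model matches the noise distribution exactly, all
# ratios are equal and the loss is log(k + 1)
rng = random.Random(0)
loss = rnce_loss(0.0,
                 energy=lambda v: 0.5 * v * v,
                 sample_noise=lambda r: r.gauss(0.0, 1.0),
                 noise_logpdf=lambda v: -0.5 * v * v
                                        - 0.5 * math.log(2 * math.pi),
                 k=10, rng=rng)
```

With a standard-normal energy and standard-normal noise, every log ratio is the same constant, so the softmax is uniform over the 11 points and the loss equals log(11) regardless of which samples were drawn.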
Graph-based Neural Weather Prediction for Limited Area Modeling
Oskarsson, Joel, Landelius, Tomas, Lindsten, Fredrik
The rise of accurate machine learning methods for weather forecasting is creating radical new possibilities for modeling the atmosphere. In the time of climate change, having access to high-resolution forecasts from models like these is also becoming increasingly vital. While most existing Neural Weather Prediction (NeurWP) methods focus on global forecasting, an important question is how these techniques can be applied to limited area modeling. In this work we adapt the graph-based NeurWP approach to the limited area setting and propose a multi-scale hierarchical model extension. Our approach is validated by experiments with a local model for the Nordic region.