swg
Diffusion Self-Weighted Guidance for Offline Reinforcement Learning
Tagle, Augusto, Ruiz-del-Solar, Javier, Tobar, Felipe
Offline reinforcement learning (RL) recovers the optimal policy $π$ given historical observations of an agent. In practice, $π$ is modeled as a weighted version of the agent's behavior policy $μ$, using a weight function $w$ working as a critic of the agent's behavior. Though recent approaches to offline RL based on diffusion models have exhibited promising results, the computation of the required scores is challenging due to their dependence on the unknown $w$. In this work, we alleviate this issue by constructing a diffusion over both the actions and the weights. With the proposed setting, the required scores are directly obtained from the diffusion model without learning extra networks. Our main conceptual contribution is a novel guidance method, where guidance (which is a function of $w$) comes from the same diffusion model, therefore, our proposal is termed Self-Weighted Guidance (SWG). We show that SWG generates samples from the desired distribution on toy examples and performs on par with state-of-the-art methods on D4RL's challenging environments, while maintaining a streamlined training pipeline. We further validate SWG through ablation studies on weight formulations and scalability.
The Unreasonable Effectiveness of Guidance for Diffusion Models
Kaiser, Tim, Adaloglou, Nikolas, Kollmann, Markus
Guidance is an error-correcting technique used to improve the perceptual quality of images generated by diffusion models. Typically, the correction is achieved by linear extrapolation, using an auxiliary diffusion model that has lower performance than the primary model. Using a 2D toy example, we show that it is highly beneficial when the auxiliary model exhibits similar errors as the primary one but stronger. We verify this finding in higher dimensions, where we show that competitive generative performance to state-of-the-art guidance methods can be achieved when the auxiliary model differs from the primary one only by having stronger weight regularization. As an independent contribution, we investigate whether upweighting long-range spatial dependencies improves visual fidelity. The result is a novel guidance method, which we call sliding window guidance (SWG), that guides the primary model with itself by constraining its receptive field. Intriguingly, SWG aligns better with human preferences than state-of-the-art guidance methods while requiring neither training, architectural modifications, nor class conditioning. The code will be released.
Extreme heatwave sampling and prediction with analog Markov chain and comparisons with deep learning
Miloshevich, George, Lucente, Dario, Yiou, Pascal, Bouchet, Freddy
We present a data-driven emulator, stochastic weather generator (SWG), suitable for estimating probabilities of prolonged heatwaves in France and Scandinavia. This emulator is based on the method of analogs of circulation to which we add temperature and soil moisture as predictor fields. We train the emulator on an intermediate complexity climate model run and show that it is capable of predicting conditional probabilities (forecasting) of heatwaves out of sample. Special attention is payed that this prediction is evaluated using proper score appropriate for rare events. To accelerate the computation of analogs dimensionality reduction techniques are applied and the performance is evaluated. The probabilistic prediction achieved with SWG is compared with the one achieved with Convolutional Neural Network (CNN). With the availability of hundreds of years of training data CNNs perform better at the task of probabilistic prediction. In addition, we show that the SWG emulator trained on 80 years of data is capable of estimating extreme return times of order of thousands of years for heatwaves longer than several days more precisely than the fit based on generalised extreme value distribution. Finally, the quality of its synthetic extreme teleconnection patterns obtained with stochastic weather generator is studied. We showcase two examples of such synthetic teleconnection patterns for heatwaves in France and Scandinavia that compare favorably to the very long climate model control run.
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.41)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)