Reinforcement Learning Closures for Underresolved Partial Differential Equations using Synthetic Data