Parametric PDE Control with Deep Reinforcement Learning and Differentiable L0-Sparse Polynomial Policies