Variational Sequential Optimal Experimental Design using Reinforcement Learning