Reinforcement Learning based Sequential Batch-sampling for Bayesian Optimal Experimental Design