Enhancing variational quantum state diagonalization using reinforcement learning techniques