Optimizing Variational Quantum Circuits Using Metaheuristic Strategies in Reinforcement Learning