Optimizing Metachronal Paddling with Reinforcement Learning at Low Reynolds Number