Reinforcement learning-based statistical search strategy for an axion model from flavor