Learning Two-Step Hybrid Policy for Graph-Based Interpretable Reinforcement Learning