Graph-attention-based Causal Discovery with Trust Region-navigated Clipping Policy Optimization