Solving the QAP by Two-Stage Graph Pointer Networks and Reinforcement Learning