Referring Transformer: A One-step Approach to Multi-task Visual Grounding