Referring Transformer: A One-step Approach to Multi-task Visual Grounding

Muchen Li1,2, Leonid Sigal