Word2Pix: Word to Pixel Cross Attention Transformer in Visual Grounding