Three Ways to Improve Verbo-visual Fusion for Dense 3D Visual Grounding