Rethinking 3D Dense Caption and Visual Grounding in A Unified Framework through Prompt-based Localization