Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding