Phoneme-Level Visual Speech Recognition via Point-Visual Fusion and Language Model Reconstruction