RobustL2S: Speaker-Specific Lip-to-Speech Synthesis exploiting Self-Supervised Representations