Towards Lightweight and Stable Zero-shot TTS with Self-distilled Representation Disentanglement