DiffNorm: Self-Supervised Normalization for Non-autoregressive Speech-to-speech Translation