A Implementation Details

Neural Information Processing Systems 

The hyperparameter details are given in Table 9.

We conduct ASR and ASV evaluations to compare the above methods.

Following (Choi et al., 2021), we average each representation from

Similar to the previous analysis of XLSR-53 (Choi et al., 2021), the representations from the 1st layer of XLS-R are already clustered by speaker, while it is hard to distinguish the representations of

Table 11 shows that the adaptation quality improves as the number of samples increases.

Phoneme predictor. We conduct an ablation study of the phoneme predictor. Following (Kim et al., 2021), we remove the bias parameter of the phoneme predictor, because it causes unstable training during mixed-precision training.
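
As a minimal illustration of what removing the bias parameter means, the sketch below assumes the phoneme predictor is a single linear projection from hidden features to phoneme logits. That architecture, the class name `BiasFreeLinear`, and the dimensions are hypothetical and are not taken from the paper. Without a bias, the output is the plain product W x, so an all-zero input yields all-zero logits and the layer holds only `in_dim * out_dim` parameters.

```python
import random


class BiasFreeLinear:
    """Hypothetical linear projection y = W x with no bias term."""

    def __init__(self, in_dim, out_dim, seed=0):
        rng = random.Random(seed)
        scale = in_dim ** -0.5
        # weight[o][i]: one row of weights per output phoneme class.
        self.weight = [
            [rng.uniform(-scale, scale) for _ in range(in_dim)]
            for _ in range(out_dim)
        ]

    def __call__(self, x):
        # Plain matrix-vector product; no bias is added afterwards.
        return [sum(w * v for w, v in zip(row, x)) for row in self.weight]

    def num_parameters(self):
        # Only the weight matrix is counted, since there is no bias vector.
        return sum(len(row) for row in self.weight)


# Illustrative sizes only: 4 hidden features, 3 phoneme classes.
predictor = BiasFreeLinear(in_dim=4, out_dim=3)
logits = predictor([0.5, -1.0, 0.25, 2.0])
zero_logits = predictor([0.0, 0.0, 0.0, 0.0])
```

In a PyTorch model, the corresponding change would typically be passing `bias=False` when constructing the projection layer, for example `torch.nn.Linear(in_dim, out_dim, bias=False)`.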