Position tracking of a varying number of sound sources with sliding permutation invariant training