Perceptually Aligning Representations of Music via Noise-Augmented Autoencoders