On the de-duplication of the Lakh MIDI dataset