Echotune: A Modular Extractor Leveraging the Variable-Length Nature of Speech in ASR Tasks