POTSA: A Cross-Lingual Speech Alignment Framework for Low Resource Speech-to-Text Translation