Towards objective and interpretable speech disorder assessment: a comparative analysis of CNN and transformer-based models