Explaining Speech Classification Models via Word-Level Audio Segments and Paralinguistic Features