Modeling speech emotion with label variance and analyzing performance across speakers and unseen acoustic conditions