Investigating model performance in language identification: beyond simple error statistics