Using multiple reference audios and style embedding constraints for speech synthesis