Towards General Error Diagnosis via Behavioral Testing in Machine Translation