DISCO: DISCovering Overfittings as Causal Rules for Text Classification Models