LICO: Explainable Models with Language-Image COnsistency