Interpreting ResNet-based CLIP via Neuron-Attention Decomposition