Benchmarking and Enhancing Disentanglement in Concept-Residual Models