Failure Tolerant Training with Persistent Memory Disaggregation over CXL