Improving Automatic Parallel Training via Balanced Memory Workload Optimization