Backward-Friendly Optimization: Training Large Language Models with Approximate Gradients under Memory Constraints