Memory Determines Learning Direction: A Theory of Gradient-Based Optimization in State Space Models