Feature Learning in Linear-Width Two-Layer Networks: Two vs. One Step of Gradient Descent