Learning from Expert Factors: Trajectory-level Reward Shaping for Formulaic Alpha Mining