SPPD: Self-training with Process Preference Learning Using Dynamic Value Margin