Adaptive Test-Time Reasoning via Reward-Guided Dual-Phase Search