Hill Climbing on Value Estimates for Search-control in Dyna