Gradient Optimization for Single-State RMDPs