Succeed or Learn Slowly: Sample Efficient Off-Policy Reinforcement Learning for Mobile App Control