Reward Learning from Suboptimal Demonstrations with Applications in Surgical Electrocautery