Non-MarkovianRewardModellingfromTrajectory LabelsviaInterpretableMultipleInstanceLearning

Neural Information Processing Systems 

There is growing consensus around the view that aligned and beneficial AI requires a reframing of objectives as being contingent, uncertain, and learnable via interaction with humans [35].