Non-Markovian Reward Modelling from Trajectory Labels via Interpretable Multiple Instance Learning