A Reward Net Algorithm

Neural Information Processing Systems 

In this section, we present the detailed procedures of MRN in Algorithm 1. In Section 4.2, the implicit derivative at iteration k of is calculated by: g Cauchy-Schwarz inequality, and the last inequality holds for the definition of Lipschitz smoothness. Lemma 2. Assume the outer loss Then the gradient of with respect to the outer loss is Lipschitz continuous. Theorem 1. Assume the outer loss Theorem 2. Assume the outer loss Even worse, it might be difficult for human experts to give preferences to trajectory pairs (e.g., a pair of poor trajectories.). This problem leads to a significant impact on the efficiency of the feedback in the initial stage.