teaching problem
Reviews: Machine Teaching of Active Sequential Learners
This paper considers the problem of teaching an active sequential *machine learner* (e.g., an active learning algorithm), with a teacher which can "fake" the labels/outcomes of training examples with the goal of steering the learner faster to the goal state. The authors refer to such teacher a "planning teacher", as opposed to a "naive teacher" which is often considered in the classical machine teaching problems. This setting differs from conventional machine teaching settings, in that In classical machine teaching setting, the teacher can only choose among a given set of training examples that are consistent with the target concept, and is often not allowed to provide inconsistent examples. The majority of the existing work in machine teaching considers teaching a "passive learner", with a few exceptions (see additional reference in the comments below). The assumption that the teacher can choose the data-generation distribution makes it a very powerful teacher with a much richer action set than conventional teaching.
Optimally Teaching a Linear Behavior Cloning Agent
Bharti, Shubham Kumar, Wright, Stephen, Singla, Adish, Zhu, Xiaojin
We study optimal teaching of Linear Behavior Cloning (LBC) learners. In this setup, the teacher can select which states to demonstrate to an LBC learner. The learner maintains a version space of infinite linear hypotheses consistent with the demonstration. The goal of the teacher is to teach a realizable target policy to the learner using minimum number of state demonstrations. This number is known as the Teaching Dimension(TD). We present a teaching algorithm called ``Teach using Iterative Elimination(TIE)" that achieves instance optimal TD. However, we also show that finding optimal teaching set computationally is NP-hard. We further provide an approximation algorithm that guarantees an approximation ratio of $\log(|A|-1)$ on the teaching dimension. Finally, we provide experimental results to validate the efficiency and effectiveness of our algorithm.