"The years of 15- to 20-year relief pitchers is going away. It's becoming younger and younger, come in and throw as hard as you can because if you can't throw 95-plus you're not going to make it. There's a lot of people out there teaching just max velocity, they're not teaching how to pitch. I can't tell you how many high school kids they're like, 'Hey, can you catch a bullpen?' And they'll come out and they're throwing as hard as they can, and I'm diving for the ball.
In this work, we define a collaborative and privacy-preserving machine teaching paradigm with multiple distributed teachers. We focus on consensus super teaching. It aims at organizing distributed teachers to jointly select a compact while informative training subset from data hosted by the teachers to make a learner learn better. The challenges arise from three perspectives. First, the state-of-the-art pool-based super teaching method applies mixed-integer non-linear programming (MINLP) which does not scale well to very large data sets. Second, it is desirable to restrict data access of the teachers to only their own data during the collaboration stage to mitigate privacy leaks. Finally, the teaching collaboration should be communication-efficient since large communication overheads can cause synchronization delays between teachers. To address these challenges, we formulate collaborative teaching as a consensus and privacy-preserving optimization process to minimize teaching risk. We theoretically demonstrate the necessity of collaboration between teachers for improving the learner's learning. Furthermore, we show that the proposed method enjoys a similar property as the Oracle property of adaptive Lasso. The empirical study illustrates that our teaching method can deliver significantly more accurate teaching results with high speed, while the non-collaborative MINLP-based super teaching becomes prohibitively expensive to compute.
With the increasing demand for large amount of labeled data, crowdsourcing has been used in many large-scale data mining applications. However, most existing works in crowdsourcing mainly focus on label inference and incentive design. In this paper, we address a different problem of adaptive crowd teaching, which is a sub-area of machine teaching in the context of crowdsourcing. Compared with machines, human beings are extremely good at learning a specific target concept (e.g., classifying the images into given categories) and they can also easily transfer the learned concepts into similar learning tasks. Therefore, a more effective way of utilizing crowdsourcing is by supervising the crowd to label in the form of teaching. In order to perform the teaching and expertise estimation simultaneously, we propose an adaptive teaching framework named JEDI to construct the personalized optimal teaching set for the crowdsourcing workers. In JEDI teaching, the teacher assumes that each learner has an exponentially decayed memory. Furthermore, it ensures comprehensiveness in the learning process by carefully balancing teaching diversity and learner's accurate learning in terms of teaching usefulness. Finally, we validate the effectiveness and efficacy of JEDI teaching in comparison with the state-of-the-art techniques on multiple data sets with both synthetic learners and real crowdsourcing workers.