Education
Revisiting Smoothed Online Learning
In this paper, we revisit the problem of smoothed online learning, in which the online learner suffers both a hitting cost and a switching cost, and target two performance metrics: competitive ratio and dynamic regret with switching cost. To bound the competitive ratio, we assume the hitting cost is known to the learner in each round, and investigate the simple idea of balancing the two costs by an optimization problem.
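As an illustration of the balancing idea, the following minimal sketch picks each decision by minimizing the hitting cost plus a weighted switching cost. The specific choices here are assumptions made for the example and are not taken from the paper: a one-dimensional quadratic hitting cost 0.5*(x - theta_t)^2, an absolute-value switching cost |x - x_{t-1}|, a trade-off weight `lam`, and the hypothetical function names `balanced_step` and `run`. Under these assumptions the per-round problem has a closed-form soft-thresholding solution.

```python
import math


def balanced_step(prev, theta, lam):
    """Return argmin_x 0.5*(x - theta)**2 + lam*abs(x - prev).

    Assumed toy setting: theta is the minimizer of the (known) hitting cost,
    prev is the previous decision, lam weights the switching cost.
    The solution moves from prev toward theta, but lam short of it.
    """
    gap = theta - prev
    shrink = max(abs(gap) - lam, 0.0)  # soft-thresholding of the gap
    return prev + math.copysign(shrink, gap)


def run(thetas, lam, x0=0.0):
    """Play balanced_step on a sequence of hitting-cost minimizers.

    Returns the decisions plus the accumulated hitting and switching costs.
    """
    x, hitting, switching, decisions = x0, 0.0, 0.0, []
    for theta in thetas:
        new_x = balanced_step(x, theta, lam)
        hitting += 0.5 * (new_x - theta) ** 2
        switching += abs(new_x - x)
        x = new_x
        decisions.append(x)
    return decisions, hitting, switching


if __name__ == "__main__":
    decisions, hitting, switching = run([3.0, 3.0, 0.0], lam=1.0)
    print(decisions, hitting, switching)
```

A larger `lam` makes the learner move less between rounds at the price of a higher hitting cost; `lam = 0` recovers the greedy choice that jumps to each round's minimizer.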
Synthetic experiments (R2, R4)
Teacher learning curve for Frozen lake: the student return induced by the teaching policy at the end of the curriculum improves as CISR trains more students. For CISR, we evaluate a teacher policy trained with 30 students on new test students, while Bandit learns by explore-exploit for each student, as [27] can't learn from previous students.
Thank you for your helpful comments!
Using multiple students enables CISR's key novelty - allowing the teacher to learn
This makes CISR applicable, e.g., in a flavor of sim-to-real transfer where a curriculum policy is learned in
Thus, we have at least 270 possible curricula. That CISR determines a good one after only 10 students attests to its learning ability.