Synthetic experiments (R2, R4)
Neural Information Processing Systems
Teacher learning curve for Frozen Lake: the student return induced by the teaching policy at the end of the curriculum improves as CISR trains more students. For CISR, we evaluate a teacher policy trained with 30 students on new test students, while Bandit learns by explore-exploit for each student, as [27] cannot learn from previous students.

Thank you for your helpful comments! Using multiple students enables CISR's key novelty: allowing the teacher to learn ... This makes CISR applicable, e.g., in a flavor of sim-to-real transfer where a curriculum policy is learned in ... Thus, we have at least 270 possible curricula. That CISR determines a good one after only 10 students attests to its learning ability.
Feb-9-2026, 07:57:10 GMT
- Industry:
- Education (0.37)
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning (0.36)
- Robots (0.31)