unplugged
Efficient Offline Policy Optimization with a Learned Model
Liu, Zichen, Li, Siyi, Lee, Wee Sun, Yan, Shuicheng, Xu, Zhongwen
MuZero Unplugged presents a promising approach for offline policy learning from logged data. It conducts Monte-Carlo Tree Search (MCTS) with a learned model and leverages the Reanalyze algorithm to learn purely from offline data. For good performance, MCTS requires accurate learned models and a large number of simulations, which makes it computationally expensive. This paper investigates several hypotheses about when MuZero Unplugged may not work well in offline RL settings, including 1) learning with limited data coverage; 2) learning from offline data of stochastic environments; 3) improperly parameterized models given the offline data; and 4) learning with a low compute budget. We propose a regularized one-step look-ahead approach to tackle these issues. Instead of planning with the expensive MCTS, we use the learned model to construct an advantage estimate based on a one-step rollout. The policy is improved in the direction that maximizes the estimated advantage, with regularization toward the dataset. We conduct extensive empirical studies with BSuite environments to verify the hypotheses and then run our algorithm on the RL Unplugged Atari benchmark. Experimental results show that our proposed approach achieves stable performance even with an inaccurate learned model. On the large-scale Atari benchmark, the proposed method outperforms MuZero Unplugged by 43%. Most significantly, it needs only 5.6% of the wall-clock time (1 hour versus 17.8 hours for MuZero Unplugged) to reach a 150% IQM normalized score with the same hardware and software stack. Our implementation is open-sourced at https://github.com/sail-sg/rosmo.
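The one-step look-ahead idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation. All function names, the discount of 0.99, the regularization weight of 0.2, and the exponential reweighting of the prior policy are assumptions chosen for illustration; the abstract only states that an advantage is estimated from a one-step model rollout and that the policy update is regularized by the dataset.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over the last axis."""
    z = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)


def one_step_advantage(reward, next_value, value, discount=0.99):
    """Advantage of each action from a single model step.

    reward, next_value: arrays of shape [num_actions], as predicted by a
    learned model (hypothetical inputs). value: scalar estimate for the
    current state.
    """
    return reward + discount * next_value - value


def improved_policy_target(prior_logits, advantage):
    """Reweight the prior policy by exp(advantage) and renormalize.

    Assumed form of the improvement step; adding the advantage to the
    logits is equivalent to multiplying the prior by exp(advantage).
    """
    return softmax(prior_logits + advantage)


def policy_loss(policy_logits, target, dataset_action, reg_weight=0.2):
    """Cross-entropy to the improved target plus a behavior-cloning term."""
    log_pi = np.log(softmax(policy_logits))
    improvement = -np.sum(target * log_pi)
    # Regularization: keep probability mass on the action seen in the data.
    behavior_reg = -log_pi[dataset_action]
    return improvement + reg_weight * behavior_reg


# Toy usage with three actions; action 1 has the highest predicted reward.
adv = one_step_advantage(np.array([0.0, 1.0, 0.0]), np.zeros(3), 0.0)
target = improved_policy_target(np.zeros(3), adv)
loss = policy_loss(np.zeros(3), target, dataset_action=1)
```

In this toy setup the improved target puts the most probability on action 1, and the regularization term adds a penalty whenever the policy assigns low probability to the logged action. No tree search is involved, which is where the compute savings described in the abstract would come from.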
CS Unplugged or Coding Classes?
Computer science unplugged (CS Unplugged, or just "Unplugged") is a pedagogy for teaching computational ideas to grade-school students without using a computer. It was developed in the early 1990s out of necessity, when working with computers in the classroom was not usually practical, but it is still widely adopted as a supplement to computer-based lessons, even where devices are readily available. This appears as a contradiction to some (if you are teaching computer science, why not spend as much time as possible on a computer?). Unfortunately, Unplugged can also be used to justify poor decisions by treating it as a complete curriculum in itself--a teacher who does not have the time or support to extend themselves in new curriculum content might rely on Unplugged as "enough," or administrators might justify a lack of funding by suggesting that schools use Unplugged teaching instead of buying devices. The Unplugged approach is widely adopted, is mentioned in dozens of research papers about CS education, has been translated into many languages, and is commonly used in teacher professional development.1
Microsoft's AI Chatbot Becomes Racist, Has To Be Unplugged
Microsoft introduced a chatbot called Tay yesterday. The company was running an experiment in conversational understanding, meaning that the more people interacted with the artificial intelligence-powered chatbot, the smarter it would become. I don't know about smarter, but it didn't take more than 24 hours for Tay to become a full-blown racist on Twitter. That's what the internet will do to you. When it first arrived on the scene, Tay was an innocent Twitter chatbot that you and I could interact with to see just how far along artificial intelligence has come. It didn't take long for things to get ugly, though, as people soon started tweeting racist and misogynistic things at Tay and it picked it all up.