Generally Intelligent #12: Jacob Steinhardt, UC Berkeley, on machine learning safety, alignment and measurement

Jacob Steinhardt is an assistant professor at UC Berkeley. His main research interest is designing machine learning systems that are reliable and aligned with human values. His specific research directions include robustness, reward specification and reward hacking, and scalable alignment.

His most recent paper at ICLR 2021 proposes a new test that measures an NLP model's accuracy across a wide variety of tasks, including mathematics, US history, law, and more. It gives researchers a measurement tool for pinning down an important problem: while current models can achieve superhuman performance on benchmarks, they still lack the ability to understand language as a whole. Another of Jacob's ICLR papers measures a language model's knowledge of basic concepts of morality. It shows that current language models have a promising but incomplete ability to predict basic human ethical judgements.

"Test accuracy is a very limited metric."

"You might not be able to get lots of feedback on human values."

Below are the show notes and full transcript. As always, please feel free to reach out with feedback, ideas, and questions!

"I think it required me to learn to become a significantly better writer. And I think that helped later on, because it made me feel more comfortable pursuing unusual ideas. I knew I had the skills to present those ideas. As long as I believed in them, I could get other people to believe in them."

"You just want this very diverse distribution of things that are deeply ingrained in evolutionary history, as opposed to being part of explicit reasoning."

First of all, test accuracy is a very limited metric. What are we trying to do with it? For a while, there was a lot of climate skepticism or climate denial. At some point it becomes pretty clear, when there are regular heat waves, fires, and that sort of thing. But you probably wanted to do something about it before that point.
Having these more subtle measurements that you can look at is important. And the other thing is, I think they actually laid the groundwork for the more extreme weather events to become a convincing signal.

Jacob Steinhardt: Another thing that I'm interested in is just measuring progress in capabilities. Tracking progress across different AI capabilities seems important. Vision tasks just seem to be falling like flies. I don't know if there are any vision tasks that have survived for more than a year. A few tasks seem to be holding up a little bit better, but I think those are also starting to fall like flies. I know we've come up with a few harder tasks. ML systems are still not very good at math. Humans also aren't very good at math. But [ML systems are] also not good at law, it turns out.