What Does It Mean to Align AI With Human Values?

Many years ago, I learned to program on an old Symbolics Lisp Machine. The operating system had a built-in command spelled "DWIM," short for "Do What I Mean." If I typed a command and got an error, I could type "DWIM," and the machine would try to figure out what I meant to do. A surprising fraction of the time, it actually worked. The DWIM command was a microcosm of the more modern problem of "AI alignment": We humans are prone to giving machines ambiguous or mistaken instructions, and we want them to do what we mean, not necessarily what we say.