Personal Universes: A Solution to the Multi-Agent Value Alignment Problem

Yampolskiy, Roman V.

arXiv.org Artificial Intelligence 

Since the birth of the field of Artificial Intelligence (AI), researchers have worked on creating ever more capable machines, but with recent successes in multiple subdomains of AI [1-7], safety and security of such systems and of predicted future superintelligences [8, 9] have become paramount [10, 11]. While many diverse safety mechanisms are being investigated [12, 13], the ultimate goal is to align AI with the goals, values, and preferences of its users, which are likely to include all of humanity. The value alignment problem [14] can be decomposed into three sub-problems, namely: extraction of personal values from individual persons, combination of such personal preferences in a way that is acceptable to all, and finally production of an intelligent system that implements the combined values of humanity. A number of approaches for extracting values from people [15-17] have been investigated, including inverse reinforcement learning [18, 19], brain scanning [20], value learning from literature [21], and understanding of human cognitive limitations [22]. Assessment of the potential for success of particular value-extraction techniques is beyond the scope of this paper; we simply assume that one of the current methods, a combination of them, or some future approach will allow us to accurately learn the values of given people.