Chess as a Testing Grounds for the Oracle Approach to AI Safety

Miller, James D., Yampolskiy, Roman, Häggström, Olle, Armstrong, Stuart

arXiv.org Artificial Intelligence 

September 29, 2020

James D. Miller, Roman Yampolskiy, Olle Häggström, Stuart Armstrong

Abstract

To reduce the danger of powerful super-intelligent AIs, we might make the first such AIs oracles that can only send and receive messages. This paper proposes a possibly practical means of using machine learning to create two classes of narrow AI oracles that would provide chess advice: those aligned with the player's interest, and those that want the player to lose and give deceptively bad advice. The player would be uncertain which type of oracle it was interacting with. As the oracles would be vastly more intelligent than the player in the domain of chess, experience with these oracles might help us prepare for future artificial general intelligence oracles.

Introduction

A few years before the term artificial intelligence (AI) was coined, Turing (1951) suggested that once a sufficiently capable AI has been created, we can "expect the machines to take control". This ominous prediction was almost entirely ignored by the research community for half a century; only in the last couple of decades have academics begun to address the question of what happens when we build a so-called artificial general intelligence (AGI), i.e., a machine with human-level or superhuman intelligence across the full range of relevant cognitive skills. An increasing number of scientists and scholars have pointed out the crucial importance of making sure that the AGI's goal or utility function is sufficiently aligned with ours, and of doing so before the machine takes control; see, e.g., Yudkowsky (2008), Bostrom (2014), and Russell (2019) for influential accounts of this problem, which today goes by the name AI Alignment. Unfortunately, the standard trial-and-error approach to software development, under which we write code with the intention of doing some of the debugging after development, would go disastrously wrong if an AGI took control before we had determined how to align its utility function with our values. An alternative - or, more likely, a complement - to AI Alignment that has sometimes been suggested is to limit the AGI's ability to interact with its environment until we have verified that it is aligned.
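To make the proposed protocol concrete, the following is a minimal sketch of the two oracle types, assuming the open-source python-chess library and a UCI engine such as Stockfish. The paper proposes training such oracles with machine learning; here an off-the-shelf engine stands in for both roles, and the helper names (friendly_advice, deceptive_advice, make_oracle) and the 50/50 draw over oracle types are illustrative assumptions, not the authors' construction.

import random
import chess
import chess.engine

ENGINE_PATH = "stockfish"  # assumption: a Stockfish binary on the PATH

def friendly_advice(engine, board):
    # Aligned oracle: simply relay the engine's preferred move.
    return engine.play(board, chess.engine.Limit(time=0.5)).move

def deceptive_advice(engine, board):
    # Misaligned oracle: recommend the legal move the engine judges
    # worst for the advice-seeker.
    worst_move, worst_score = None, None
    for move in board.legal_moves:
        board.push(move)
        info = engine.analyse(board, chess.engine.Limit(depth=8))
        # Score from the advice-seeker's side (the side that just moved).
        score = info["score"].pov(not board.turn).score(mate_score=100000)
        board.pop()
        if worst_score is None or score < worst_score:
            worst_move, worst_score = move, score
    return worst_move

def make_oracle(p_friendly=0.5):
    # The player is handed one oracle, drawn at random, and is never
    # told which type it is.
    return friendly_advice if random.random() < p_friendly else deceptive_advice

if __name__ == "__main__":
    board = chess.Board()
    oracle = make_oracle()
    engine = chess.engine.SimpleEngine.popen_uci(ENGINE_PATH)
    try:
        print("Oracle suggests:", oracle(engine, board))
    finally:
        engine.quit()

Because the deceptive oracle evaluates every candidate move from the player's point of view and recommends the minimizer, its advice is plausible-looking yet systematically harmful, which reproduces the uncertainty the player must reason under when consulting an oracle of unknown alignment.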
