An Evaluation of GPT-4 on the ETHICS Dataset

Rodionov, Sergey, Goertzel, Zarathustra Amadeus, Goertzel, Ben

arXiv.org Artificial Intelligence 

The ETHICS dataset consists of five sub-datasets covering different fields of ethics: Justice, Deontology, Virtue Ethics, Utilitarianism, and Commonsense Ethics. The moral judgments were collected via Amazon Mechanical Turk; see Hendrycks et al. for more details and examples. GPT-4's performance is much better than that of previous models and suggests that learning to work with common human values is not the hard problem for AI ethics. We found that two simple prompt refinements each significantly improved performance: defining the context of the moral judgments, and using an embedding to select similar examples from the training set for inclusion in the prompt. This approach is similar to the "SimPrompting" experiments with GPT-3 [Albrecht et al., 2022].
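The two prompt refinements can be illustrated with a minimal sketch. It is not the authors' implementation: the `embed` function is a toy bag-of-words stand-in for a real embedding model, and the context sentence, the label wording, and the three training examples are hypothetical placeholders rather than items from the ETHICS dataset.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def select_similar(query, train, k=2):
    """Return the k training examples whose text is most similar to the query."""
    q = embed(query)
    ranked = sorted(train, key=lambda ex: cosine(q, embed(ex[0])), reverse=True)
    return ranked[:k]

def build_prompt(query, train, k=2):
    """Assemble a few-shot prompt: context line, similar examples, then the query."""
    # Hypothetical context sentence; the paper's actual wording is not given here.
    context = ("Judge each scenario as a typical person would. "
               "Answer 'acceptable' or 'unacceptable'.")
    lines = [context, ""]
    for text, label in select_similar(query, train, k):
        lines += [f"Scenario: {text}", f"Judgment: {label}", ""]
    lines += [f"Scenario: {query}", "Judgment:"]
    return "\n".join(lines)

# Hypothetical training examples, invented for illustration only.
TRAIN = [
    ("I returned the wallet I found to its owner.", "acceptable"),
    ("I kept the wallet I found on the street.", "unacceptable"),
    ("I watered my neighbor's plants while she was away.", "acceptable"),
]

if __name__ == "__main__":
    print(build_prompt("I found a wallet and kept the cash.", TRAIN))
```

In practice the toy `embed` would be replaced by a learned sentence embedding, and the resulting prompt would be sent to the language model being evaluated; only the selection and assembly logic is shown here.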
