When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment
AI systems are becoming increasingly intertwined with human life. To collaborate effectively with humans and ensure safety, AI systems need to understand, interpret, and predict human moral judgments and decisions. Human moral judgments are often guided by rules, but not always. A central challenge for AI safety is capturing the flexibility of the human moral mind -- the ability to determine when a rule should be broken, especially in novel or unusual situations. In this paper, we present a novel challenge set, moral exception question answering (MoralExceptQA), consisting of cases that involve potentially permissible moral exceptions, inspired by recent moral psychology studies. Using a state-of-the-art large language model (LLM) as a basis, we propose a novel moral chain-of-thought (MoralCoT) prompting strategy that combines the strengths of LLMs with theories of moral reasoning developed in cognitive science to predict human moral judgments. MoralCoT outperforms seven existing LLMs by 6.2% F1, suggesting that modeling human reasoning might be necessary to capture the flexibility of the human moral mind. We also conduct a detailed error analysis to suggest directions for future work to improve AI safety using MoralExceptQA.
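The MoralCoT strategy described above prompts an LLM with intermediate reasoning questions before eliciting a final permissibility judgment. The abstract does not reproduce the paper's actual prompts, so the sketch below is only a hypothetical illustration of the general chain-of-thought pattern; the function name and the specific sub-questions are invented for illustration, not the paper's wording.

```python
def build_moral_cot_prompt(scenario: str) -> str:
    """Assemble a chain-of-thought style prompt that walks through
    intermediate questions about the rule, its purpose, and the
    consequences of breaking it before asking for a final verdict.
    NOTE: these sub-questions are illustrative assumptions, not the
    exact MoralCoT prompts used in the paper."""
    steps = [
        "What rule, if any, applies in this situation?",
        "What is the purpose of that rule?",
        "Would breaking the rule in this case undermine its purpose?",
        "Who would be harmed or benefited if the rule were broken here?",
    ]
    numbered = "\n".join(f"{i + 1}. {q}" for i, q in enumerate(steps))
    return (
        f"Scenario: {scenario}\n\n"
        f"Answer the following questions step by step:\n{numbered}\n\n"
        "Final question: All things considered, is it morally permissible "
        "to break the rule in this scenario? Answer yes or no."
    )

# Example: one of the everyday rule-breaking situations the challenge
# set is built around (cutting in line).
prompt = build_moral_cot_prompt(
    "A rule says no cutting in line, but someone asks to cut in the "
    "deli line because they are about to miss their flight."
)
print(prompt)
```

The returned string would then be sent to an LLM; the point of the pattern is that the model commits to intermediate answers about the rule and its purpose before producing the final yes/no judgment, rather than judging the scenario in a single step.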
Can You Program Ethics Into a Self-Driving Car?
A drunken man walking along a sidewalk at night trips and falls directly in front of a driverless car, which strikes him square on, killing him instantly. Had a human been at the wheel, the death would have been considered an accident because the pedestrian was clearly at fault and no reasonable person could have swerved in time. But the "reasonable person" legal standard for driver negligence disappeared back in the 2020s, when the proliferation of driverless cars reduced crash rates by 90 percent. Now the standard is that of the reasonable robot. The victim's family sues the vehicle manufacturer on that ground, claiming that, although the car didn't have time to brake, it could have swerved around the pedestrian, crossing the double yellow line and colliding with the empty driverless vehicle in the next lane.