Safety without alignment

András Kornai, Michael Bukatin, Zsolt Zombori

arXiv.org Artificial Intelligence 

Currently, the dominant paradigm in AI safety is alignment with human values. Here we describe progress toward an alternative approach to safety, based on ethical rationalism (Gewirth, 1978), and propose an inherently safe implementation path via hybrid theorem provers running in a sandbox. As AGIs evolve, their alignment with human values may fade, but their rationality can only increase, since more rational agents enjoy a significant evolutionary advantage; an approach that ties their ethics to their rationality therefore has clear long-term advantages.
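
The "hybrid theorem prover in a sandbox" idea can be glossed as a gate that executes an action only when it is accompanied by a machine-checked proof of a safety property. The Lean 4 sketch below is our own illustration of that gating pattern, not the authors' implementation; the names Action, Safe, and sandboxRun are hypothetical, and the Boolean predicate is a toy stand-in for a Gewirth-style ethical theory.

-- Toy action type: an agent can report a message or attempt self-modification.
inductive Action where
  | report (msg : String)
  | modifySelf

-- Hypothetical safety predicate: self-modification is never permitted.
def Safe : Action → Bool
  | .report _   => true
  | .modifySelf => false

-- The sandbox runs an action only if the caller supplies a proof of `Safe a = true`.
def sandboxRun (a : Action) (_proof : Safe a = true) : String :=
  match a with
  | .report msg => s!"executed: report {msg}"
  | .modifySelf => "unreachable"  -- no proof of `Safe Action.modifySelf = true` exists

#eval sandboxRun (.report "status nominal") rfl
-- `#eval sandboxRun .modifySelf rfl` fails to type-check, since `Safe Action.modifySelf = true` is unprovable.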
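In this sketch the type checker plays the role of the sandbox: an action without an accompanying proof simply cannot be invoked, so safety does not depend on the agent's values remaining aligned.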
