Towards Safe and Honest AI Agents with Neural Self-Other Overlap
