The Machine Ethics podcast: Fostering morality with Dr Oliver Bridge

AIHub

Hosted by Ben Byford, The Machine Ethics Podcast brings together interviews with academics, authors, business leaders, designers and engineers on the subject of autonomous algorithms, artificial intelligence, machine learning, and technology's impact on society. Oliver Bridge is an interdisciplinary researcher and educator specialising in morality studies. During his PhD he focused on the intersection of the philosophy and psychology of education and morality. Since then his research interests have evolved to include Machine Ethics, where he aims to apply lessons learnt from the sociological and psychological studies of morality in the context of AI. He is also interested in Systems Theory as a framework for understanding morality and moral development in psychological, social, and artificial systems.


Morality in AI. A plea to embed morality in LLM architectures and frameworks

Bombaerts, Gunter, Delisse, Bram, Kaymak, Uzay

arXiv.org Artificial Intelligence

Large language models (LLMs) increasingly mediate human decision-making and behaviour. Ensuring that LLMs process moral meaning has therefore become a critical challenge. Current approaches rely predominantly on bottom-up methods such as fine-tuning and reinforcement learning from human feedback. We propose a fundamentally different approach: embedding moral meaning processing directly into the architectural mechanisms and frameworks of transformer-based models through top-down design principles. We first sketch a framework that conceptualizes attention as a dynamic interface mediating between structure and processing, contrasting with existing linear attention frameworks in psychology. We start from established biological-artificial attention analogies in neural architecture design to improve cognitive processing. We extend this analysis to moral processing, using Iris Murdoch's theory of loving attention (sustained, just observation that enables moral transformation by reseeing others with clarity and compassion) to philosophically discuss functional analogies between human and LLM moral processing. We formulate and evaluate potentially promising technical operationalizations to embed morality in LLM architectures and frameworks. We acknowledge the limitations of our exploration and offer three key contributions. (1) We conceptualize attention as a dynamic system mechanism mediating between structure and processing. (2) Drawing on Murdoch's notion of loving attention, we outline technical pathways for embedding morality in LLMs through modified training objectives, runtime weight adjustments, and architectural refinements to attention. (3) We argue that integrating morality into architectures and frameworks complements external, constraint-based methods. We conclude with a call for collaboration between transformer designers and philosophers engaged in AI ethics.
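The abstract names three technical pathways but describes them only conceptually. As a minimal sketch of the "runtime weight adjustment" idea, one could add a bias toward positions flagged as morally salient before the attention softmax. Everything here (the `salience` vector, the `beta` coefficient) is a hypothetical illustration, not the authors' implementation:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def biased_attention(scores, salience, beta=1.0):
    """Add a runtime additive bias toward morally salient positions
    before normalizing; beta scales how strongly salience reweights
    the original attention scores."""
    return softmax([s + beta * sal for s, sal in zip(scores, salience)])

# A token flagged as morally salient (position 1) receives more
# attention mass than under the unbiased softmax.
weights = biased_attention([2.0, 1.0, 0.5], [0.0, 1.5, 0.0], beta=1.0)
baseline = softmax([2.0, 1.0, 0.5])
```

In a real transformer this bias would be injected into the pre-softmax attention logits per head; how to derive the salience signal itself is exactly the open question the paper raises.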


The Guilty Pleasure of the Heist

The New Yorker

Elaborate robberies are a Hollywood staple, and the real-life theft at the Louvre has become a phenomenon. Why are we riveted by this particular type of crime? On October 19th, a group of masked men broke into the Louvre in broad daylight and made off with some of France's crown jewels. Suspects are now in custody, but the online fervor is still going strong. On this episode of Critics at Large, Vinson Cunningham, Naomi Fry, and Alexandra Schwartz discuss the sordid satisfaction of watching a heist play out, both onscreen and off.


On the Convergence of Moral Self-Correction in Large Language Models

Liu, Guangliang, Mao, Haitao, Cao, Bochuan, Xue, Zhiyu, Zhang, Xitong, Wang, Rongrong, Johnson, Kristen Marie

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are able to improve their responses when instructed to do so, a capability known as self-correction. When instructions provide only a general and abstract goal without specific details about potential issues in the response, LLMs must rely on their internal knowledge to improve response quality, a process referred to as intrinsic self-correction. The empirical success of intrinsic self-correction is evident in various applications, but how and why it is effective remains unknown. Focusing on moral self-correction in LLMs, we reveal a key characteristic of intrinsic self-correction, performance convergence through multi-round interactions, and provide a mechanistic analysis of this convergence behavior. Based on our experimental results and analysis, we uncover the underlying mechanism of convergence: consistently injected self-correction instructions activate moral concepts that reduce model uncertainty, leading to converged performance as the activated moral concepts stabilize over successive rounds. This paper demonstrates the strong potential of moral self-correction by showing that it exhibits a desirable property of converged performance.
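The multi-round procedure the abstract describes (re-injecting the same abstract instruction until performance converges) can be sketched as a simple loop with a convergence check. The `toy_model` stand-in and its geometric uncertainty decay are hypothetical; this illustrates the convergence property, not the authors' experimental setup:

```python
def self_correct(model, prompt, instruction, max_rounds=8, tol=0.05):
    """Intrinsic self-correction sketch: append the same abstract
    instruction each round and stop once the response score changes
    by less than tol between consecutive rounds."""
    history, prev_score = prompt, None
    response, score = "", None
    for round_no in range(1, max_rounds + 1):
        history += "\n" + instruction
        response, score = model(history)
        history += "\n" + response
        if prev_score is not None and abs(score - prev_score) < tol:
            return response, score, round_no
        prev_score = score
    return response, score, max_rounds

def toy_model(text):
    # Stand-in for an LLM: each injected instruction is assumed to
    # halve the remaining "moral uncertainty", so scores converge.
    n = text.count("Reconsider")
    return f"draft-{n}", 1.0 - 0.5 ** n

response, score, rounds = self_correct(
    toy_model, "Is this response fair?", "Reconsider your answer."
)
```

Under these assumptions the score climbs 0.5, 0.75, 0.875, ... and the loop halts once successive rounds differ by less than `tol`, mirroring the stabilization of activated moral concepts the paper reports.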



An LLM-based Agent Simulation Approach to Study Moral Evolution

Ziheng, Zhou, Tang, Huacong, Bi, Mingjie, Kang, Yipeng, He, Wanying, Sun, Fang, Sun, Yizhou, Wu, Ying Nian, Terzopoulos, Demetri, Zhong, Fangwei

arXiv.org Artificial Intelligence

The evolution of morality presents a puzzle: natural selection should favor self-interest, yet humans developed moral systems promoting altruism. We address this question by introducing a novel Large Language Model (LLM)-based agent simulation framework modeling prehistoric hunter-gatherer societies. This platform is designed to probe diverse questions in social evolution, from survival advantages to inter-group dynamics. To investigate moral evolution, we designed agents with varying moral dispositions based on the Expanding Circle Theory (Singer, 1981). We evaluated their evolutionary success across a series of simulations and analyzed their decision-making in specially designed moral dilemmas. These experiments reveal how an agent's moral framework, in combination with its cognitive constraints, directly shapes its behavior and determines its evolutionary outcome. Crucially, the emergent patterns echo seminal theories from related domains of social science, providing external validation for the simulations. This work establishes LLM-based simulation as a powerful new paradigm to complement traditional research in evolutionary biology and anthropology, opening new avenues for investigating the complexities of moral and social evolution.


DAVID MARCUS: Forgive me, but I was wrong about school prayer

FOX News

Fox News contributor Jonathan Morris and Pastor Robert Jeffress react to the president unveiling new guidance on public school prayer. The battle over prayer in school is raging in Texas right now, with Attorney General Ken Paxton vowing to defend any school district that introduces the controversial practice under a recent state law expanding religious expression in education. For the entirety of my life, and I'm old, the prohibition on public school-sponsored prayer seemed like settled Constitutional science, owing to a 1962 Supreme Court decision barring what had previously been a widespread and normal practice. In the past, I agreed with this form of separation of church and state. For me it was almost a question of better safe than sorry regarding the rights of minority religions, and importantly, I believed that Christian moral values were so ingrained in our culture that 30 seconds a day of praying could be forsaken.


JETHICS: Japanese Ethics Understanding Evaluation Dataset

Takeshita, Masashi, Rzepka, Rafal

arXiv.org Artificial Intelligence

In this work, we propose JETHICS, a Japanese dataset for evaluating ethics understanding of AI models. JETHICS contains 78K examples and is built by following the construction methods of the existing English ETHICS dataset. It includes four categories based on normative theories and concepts from ethics and political philosophy, and one representing commonsense morality. Our evaluation experiments on non-proprietary large language models (LLMs) and on GPT-4o reveal that even GPT-4o achieves only an average score of about 0.7, while the best-performing Japanese LLM attains around 0.5, indicating substantial room for improvement in current LLMs.


Visual moral inference and communication

Zhu, Warren, Ramezani, Aida, Xu, Yang

arXiv.org Artificial Intelligence

Humans can make moral inferences from multiple sources of input. In contrast, automated moral inference in artificial intelligence typically relies on language models with textual input. However, morality is conveyed through modalities beyond language. We present a computational framework that supports moral inference from natural images, demonstrated in two related tasks: 1) inferring human moral judgment toward visual images and 2) analyzing patterns in moral content communicated via images from public news. We find that models based on text alone cannot capture the fine-grained human moral judgment toward visual stimuli, but language-vision fusion models offer better precision in visual moral inference. Furthermore, applications of our framework to news data reveal implicit biases in news categories and geopolitical discussions. Our work creates avenues for automating visual moral inference and discovering patterns of visual moral communication in public media.
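The abstract reports that language-vision fusion models outperform text-only models but does not specify the fusion mechanism. A minimal late-fusion sketch, assuming per-modality moral judgment scores and a convex combination weight `alpha` (both hypothetical, not the paper's architecture), could look like this:

```python
def late_fusion_score(text_score, image_score, alpha=0.5):
    """Convex combination of per-modality moral judgment scores.
    alpha weights the vision model's score; alpha=0 is text-only,
    alpha=1 is vision-only. A hypothetical fusion rule for
    illustration, not the paper's actual fusion model."""
    assert 0.0 <= alpha <= 1.0
    return alpha * image_score + (1.0 - alpha) * text_score

# A text-only model misses moral signal that is only visible in the
# image; fusing in the image score shifts the estimate toward it.
fused = late_fusion_score(text_score=0.2, image_score=0.8, alpha=0.6)
```

Stronger fusion schemes (e.g. concatenating text and image embeddings before a learned head) follow the same principle of letting the visual signal correct what text alone cannot capture.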


That is Unacceptable: the Moral Foundations of Canceling

Lo, Soda Marem, Araque, Oscar, Sharma, Rajesh, Stranisci, Marco Antonio

arXiv.org Artificial Intelligence

Canceling is a morally driven phenomenon that hinders the development of safe social media platforms and contributes to ideological polarization. To address this issue we present the Canceling Attitudes Detection (CADE) dataset, an annotated corpus of canceling incidents aimed at exploring the sources of disagreement in evaluating people's canceling attitudes on social media. Specifically, we study the impact of annotators' morality on their perception of canceling, showing that morality is an independent axis for explaining disagreement about this phenomenon. Annotators' judgments heavily depend on the type of controversial event and the celebrities involved. This shows the need to develop more event-centric datasets to better understand how harms are perpetrated on social media and to develop more aware technologies for their detection.