AITopics | moral reasoning

Collaborating Authors

moral reasoning

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

No, Artificial Intelligence Is Not Conscious

The Atlantic - TechnologyJun-3-2026, 16:03:00 GMT

Taken to its logical conclusion, this line of thinking is absurd--and damning. Anthropic is regarded as a giant among AI companies, but perhaps what it really excels in is anthropomorphism. Earlier this year, the company released an 84-page document titled Claude's "constitution," Claude being the name of the large language model that is the company's flagship product. The first sentence reads, "Claude's constitution is a detailed description of Anthropic's intentions for Claude's values and behaviors." It goes on: "The document is written with Claude as its primary audience," "we want Claude to be able to use its judgment once armed with a good understanding of the relevant considerations," "Claude's moral status is deeply uncertain," and "Claude may have some functional version of emotions or feelings." This anthropomorphism is by no means limited to the document. In an interview earlier this year, Anthropic's CEO, Dario Amodei, said that "we're open to the idea" that AI could be conscious. In a separate interview, Anthropic's in-house philosopher, Amanda Askell (who is credited as a lead author of Claude's constitution), said, "I want Claude to be very happy--and this is a thing that I want Claude to know more, because I worry about Claude getting anxious when people are mean to it on the internet and stuff." It's enough to make you wonder: Should we seriously consider the possibility that Claude, or any large language model, might be conscious? And if it has feelings, is it capable of receiving moral instruction?

claude, large language model, machine learning, (21 more...)

The Atlantic - Technology

Industry:

Law (1.00)
Government (0.93)
Information Technology (0.68)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Google DeepMind wants to know if chatbots are just virtue signaling

MIT Technology ReviewFeb-18-2026, 16:00:22 GMT

Google DeepMind is calling for the moral behavior of large language models--such as what they do when called on to act as companions, therapists, medical advisors, and so on--to be scrutinized with the same kind of rigor as their ability to code or do math . As LLMs improve, people are asking them to play more and more sensitive roles in their lives. Agents are starting to take actions on people's behalf. LLMs may be able to influence human decision-making . And yet nobody knows how trustworthy this technology really is at such tasks. With coding and math, you have clear-cut, correct answers that you can check, William Isaac, a research scientist at Google DeepMind, told me when I met him and Julia Haas, a fellow research scientist at the firm, for an exclusive preview of their work, which is published in today. That's not the case for moral questions, which typically have a range of acceptable answers: "Morality is an important capability but hard to evaluate," says Isaac. "In the moral domain, there's no right and wrong," adds Haas.

google deepmind, large language model, machine learning, (17 more...)

MIT Technology Review

Country: North America > United States (0.47)

Genre: Research Report (0.47)

Industry: Information Technology > Services (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

The Moral Consistency Pipeline: Continuous Ethical Evaluation for Large Language Models

Jamshidi, Saeid, Nafi, Kawser Wazed, Dakhel, Arghavan Moradi, Shahabi, Negar, Khomh, Foutse

arXiv.org Artificial IntelligenceDec-3-2025

The rapid advancement and adaptability of Large Language Models (LLMs) highlight the need for moral consistency, the capacity to maintain ethically coherent reasoning across varied contexts. Existing alignment frameworks, structured approaches designed to align model behavior with human ethical and social norms, often rely on static datasets and post-hoc evaluations, offering limited insight into how ethical reasoning may evolve across different contexts or temporal scales. This study presents the Moral Consistency Pipeline (MoCoP), a dataset-free, closed-loop framework for continuously evaluating and interpreting the moral stability of LLMs. MoCoP combines three supporting layers: (i) lexical integrity analysis, (ii) semantic risk estimation, and (iii) reasoning-based judgment modeling within a self-sustaining architecture that autonomously generates, evaluates, and refines ethical scenarios without external supervision. Our empirical results on GPT-4-Turbo and DeepSeek suggest that MoCoP effectively captures longitudinal ethical behavior, revealing a strong inverse relationship between ethical and toxicity dimensions (correlation rET = -0.81, p value less than 0.001) and a near-zero association with response latency (correlation rEL approximately equal to 0). These findings demonstrate that moral coherence and linguistic safety tend to emerge as stable and interpretable characteristics of model behavior rather than short-term fluctuations. Furthermore, by reframing ethical evaluation as a dynamic, model-agnostic form of moral introspection, MoCoP offers a reproducible foundation for scalable, continuous auditing and advances the study of computational morality in autonomous AI systems.

large language model, machine learning, natural language, (14 more...)

arXiv.org Artificial Intelligence

2512.03026

Country:

North America > Canada (0.28)
South America > Brazil > Rio de Janeiro (0.16)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

MoReBench: Evaluating Procedural and Pluralistic Moral Reasoning in Language Models, More than Outcomes

Chiu, Yu Ying, Lee, Michael S., Calcott, Rachel, Handoko, Brandon, de Font-Reaulx, Paul, Rodriguez, Paula, Zhang, Chen Bo Calvin, Han, Ziwen, Sehwag, Udari Madhushani, Maurya, Yash, Knight, Christina Q, Lloyd, Harry R., Bacus, Florence, Mazeika, Mantas, Liu, Bing, Choi, Yejin, Gordon, Mitchell L, Levine, Sydney

arXiv.org Artificial IntelligenceOct-21-2025

As AI systems progress, we rely more on them to make decisions with us and for us. To ensure that such decisions are aligned with human values, it is imperative for us to understand not only what decisions they make but also how they come to those decisions. Reasoning language models, which provide both final responses and (partially transparent) intermediate thinking traces, present a timely opportunity to study AI procedural reasoning. Unlike math and code problems which often have objectively correct answers, moral dilemmas are an excellent testbed for process-focused evaluation because they allow for multiple defensible conclusions. To do so, we present MoReBench: 1,000 moral scenarios, each paired with a set of rubric criteria that experts consider essential to include (or avoid) when reasoning about the scenarios. MoReBench contains over 23 thousand criteria including identifying moral considerations, weighing trade-offs, and giving actionable recommendations to cover cases on AI advising humans moral decisions as well as making moral decisions autonomously. Separately, we curate MoReBench-Theory: 150 examples to test whether AI can reason under five major frameworks in normative ethics. Our results show that scaling laws and existing benchmarks on math, code, and scientific reasoning tasks fail to predict models' abilities to perform moral reasoning. Models also show partiality towards specific moral frameworks (e.g., Benthamite Act Utilitarianism and Kantian Deontology), which might be side effects of popular training paradigms. Together, these benchmarks advance process-focused reasoning evaluation towards safer and more transparent AI.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2510.1638

Country: North America > United States (1.00)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.67)

Industry:

Law (1.00)
Leisure & Entertainment > Sports (0.67)
Education > Educational Setting (0.46)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

MFTCXplain: A Multilingual Benchmark Dataset for Evaluating the Moral Reasoning of LLMs through Multi-hop Hate Speech Explanation

Trager, Jackson, Vargas, Francielle, Alves, Diego, Guida, Matteo, Ngueajio, Mikel K., Agrawal, Ameeta, Daryani, Yalda, Karimi-Malekabadi, Farzan, Plaza-del-Arco, Flor Miriam

arXiv.org Artificial IntelligenceOct-14-2025

Ensuring the moral reasoning capabilities of Large Language Models (LLMs) is a growing concern as these systems are used in socially sensitive tasks. Nevertheless, current evaluation benchmarks present two major shortcomings: a lack of annotations that justify moral classifications, which limits transparency and interpretability; and a predominant focus on English, which constrains the assessment of moral reasoning across diverse cultural settings. In this paper, we introduce MFTCXplain, a multilingual benchmark dataset for evaluating the moral reasoning of LLMs via multi-hop hate speech explanation using the Moral Foundations Theory. MFTCXplain comprises 3,000 tweets across Portuguese, Italian, Persian, and English, annotated with binary hate speech labels, moral categories, and text span-level rationales. Our results show a misalignment between LLM outputs and human annotations in moral reasoning tasks. While LLMs perform well in hate speech detection (F1 up to 0.836), their ability to predict moral sentiments is notably weak (F1 < 0.35). Furthermore, rationale alignment remains limited mainly in underrepresented languages. Our findings show the limited capacity of current LLMs to internalize and reflect human moral reasoning

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2506.19073

Country:

North America > United States (1.00)
Europe (1.00)
South America > Brazil (0.67)
Asia > Middle East > UAE (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Law (1.00)
Government > Regional Government (1.00)
Media (0.93)
Health & Medicine > Therapeutic Area > Immunology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Are Language Models Consequentialist or Deontological Moral Reasoners?

Samway, Keenan, Kleiman-Weiner, Max, Piedrahita, David Guzman, Mihalcea, Rada, Schölkopf, Bernhard, Jin, Zhijing

arXiv.org Artificial IntelligenceOct-14-2025

As AI systems increasingly navigate applications in healthcare, law, and governance, understanding how they handle ethically complex scenarios becomes critical. Previous work has mainly examined the moral judgments in large language models (LLMs), rather than their underlying moral reasoning process. In contrast, we focus on a large-scale analysis of the moral reasoning traces provided by LLMs. Furthermore, unlike prior work that attempted to draw inferences from only a handful of moral dilemmas, our study leverages over 600 distinct trolley problems as probes for revealing the reasoning patterns that emerge within different LLMs. We introduce and test a taxonomy of moral rationales to systematically classify reasoning traces according to two main normative ethical theories: consequentialism and deontology. Our analysis reveals that LLM chains-of-thought tend to favor deontological principles based on moral obligations, while post-hoc explanations shift notably toward consequentialist rationales that emphasize utility. Our framework provides a foundation for understanding how LLMs process and articulate ethical considerations, an important step toward safe and interpretable deployment of LLMs in high-stakes decision-making environments. Our code is available at https://github.com/keenansamway/moral-lens .

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2505.21479

Country:

North America > United States (1.00)
Europe (1.00)

Genre: Research Report > New Finding (1.00)

Industry:

Government (1.00)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Discerning What Matters: A Multi-Dimensional Assessment of Moral Competence in LLMs

Kilov, Daniel, Hendy, Caroline, Guyot, Secil Yanik, Snoswell, Aaron J., Lazar, Seth

arXiv.org Artificial IntelligenceOct-8-2025

Moral competence is the ability to act in accordance with moral principles. As large language models (LLMs) are increasingly deployed in situations demanding moral competence, there is increasing interest in evaluating this ability empirically. We review existing literature and identify three significant shortcoming: (i) Over-reliance on prepackaged moral scenarios with explicitly highlighted moral features; (ii) Focus on verdict prediction rather than moral reasoning; and (iii) Inadequate testing of models' (in)ability to recognize when additional information is needed. Grounded in philosophical research on moral skill, we then introduce a novel method for assessing moral competence in LLMs. Our approach moves beyond simple verdict comparisons to evaluate five dimensions of moral competence: identifying morally relevant features, weighting their importance, assigning moral reasons to these features, synthesizing coherent moral judgments, and recognizing information gaps. We conduct two experiments comparing six leading LLMs against non-expert humans and professional philosophers. In our first experiment using ethical vignettes standard to existing work, LLMs generally outperformed non-expert humans across multiple dimensions of moral reasoning. However, our second experiment, featuring novel scenarios designed to test moral sensitivity by embedding relevant features among irrelevant details, revealed a striking reversal: several LLMs performed significantly worse than humans. Our findings suggest that current evaluations may substantially overestimate LLMs' moral reasoning capabilities by eliminating the task of discerning moral relevance from noisy information, which we take to be a prerequisite for genuine moral skill. This work provides a more nuanced framework for assessing AI moral competence and highlights important directions for improving moral competence in advanced AI systems.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2506.13082

Country:

North America > United States (0.28)
Asia (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Education > Educational Setting > Higher Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

One Model, Many Morals: Uncovering Cross-Linguistic Misalignments in Computational Moral Reasoning

Farid, Sualeha, Lin, Jayden, Chen, Zean, Kumar, Shivani, Jurgens, David

arXiv.org Artificial IntelligenceSep-29-2025

Large Language Models (LLMs) are increasingly deployed in multilingual and multicultural environments where moral reasoning is essential for generating ethically appropriate responses. Yet, the dominant pretraining of LLMs on English-language data raises critical concerns about their ability to generalize judgments across diverse linguistic and cultural contexts. In this work, we systematically investigate how language mediates moral decision-making in LLMs. We translate two established moral reasoning benchmarks into five culturally and typologically diverse languages, enabling multilingual zero-shot evaluation. Our analysis reveals significant inconsistencies in LLMs' moral judgments across languages, often reflecting cultural misalignment. Through a combination of carefully constructed research questions, we uncover the underlying drivers of these disparities, ranging from disagreements to reasoning strategies employed by LLMs. Finally, through a case study, we link the role of pretraining data in shaping an LLM's moral compass. Through this work, we distill our insights into a structured typology of moral reasoning errors that calls for more culturally-aware AI.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2509.21443

Country:

North America > United States (0.46)
Europe > Austria (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Beyond Ethical Alignment: Evaluating LLMs as Artificial Moral Assistants

Galatolo, Alessio, Rappuoli, Luca Alberto, Winkle, Katie, Beloucif, Meriem

arXiv.org Artificial IntelligenceAug-19-2025

The recent rise in popularity of large language models (LLMs) has prompted considerable concerns about their moral capabilities. Although considerable effort has been dedicated to aligning LLMs with human moral values, existing benchmarks and evaluations remain largely superficial, typically measuring alignment based on final ethical verdicts rather than explicit moral reasoning. In response, this paper aims to advance the investigation of LLMs' moral capabilities by examining their capacity to function as Artificial Moral Assistants (AMAs), systems envisioned in the philosophical literature to support human moral deliberation. We assert that qualifying as an AMA requires more than what state-of-the-art alignment techniques aim to achieve: not only must AMAs be able to discern ethically problematic situations, they should also be able to actively reason about them, navigating between conflicting values outside of those embedded in the alignment phase. Building on existing philosophical literature, we begin by designing a new formal framework of the specific kind of behaviour an AMA should exhibit, individu-ating key qualities such as deductive and abductive moral reasoning. Drawing on this theoretical framework, we develop a benchmark to test these qualities and evaluate popular open LLMs against it. Our results reveal considerable variability across models and highlight persistent shortcomings, particularly regarding abductive moral reasoning. Our work connects theoretical philosophy with practical AI evaluation while also emphasising the need for dedicated strategies to explicitly enhance moral reasoning capabilities in LLMs.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2508.12754

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Normative Moral Pluralism for AI: A Framework for Deliberation in Complex Moral Contexts

Yaacov, David-Doron

arXiv.org Artificial IntelligenceAug-13-2025

The conceptual framework proposed in this paper centers on the development of a deliberative moral reasoning system - one designed to process complex moral situations by generating, filtering, and weighing normative arguments drawn from diverse ethical perspectives. While the framework is rooted in Machine Ethics, it also makes a substantive contribution to Value Alignment by outlining a system architecture that links structured moral reasoning to action under time constraints. Grounded in normative moral pluralism, this system is not constructed to imitate behavior but is built on reason-sensitive deliberation over structured moral content in a transparent and principled manner. Beyond its role as a deliberative system, it also serves as the conceptual foundation for a novel two-level architecture: functioning as a moral reasoning teacher envisioned to train faster models that support real-time responsiveness without reproducing the full structure of deliberative reasoning. Together, the deliberative and intuitive components are designed to enable both deep reflection and responsive action. A key design feature is the dual-hybrid structure: a universal layer that defines a moral threshold through top-down and bottom-up learning, and a local layer that learns to weigh competing considerations in context while integrating culturally specific normative content, so long as it remains within the universal threshold. By extending the notion of moral complexity to include not only conflicting beliefs but also multifactorial dilemmas, multiple stakeholders, and the integration of non-moral considerations, the framework aims to support morally grounded decision-making in realistic, high-stakes contexts.

artificial intelligence, machine learning, reasoning, (16 more...)

arXiv.org Artificial Intelligence

2508.08333

Country:

Europe (0.46)
North America > United States (0.28)

Genre:

Research Report (0.64)
Overview (0.46)

Industry:

Law (1.00)
Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)
(2 more...)

Add feedback