AITopics | learning human-like representation

Learning Human-like Representations to Enable Learning Human Values

Neural Information Processing SystemsDec-24-2025, 22:13:27 GMT

How can we build AI systems that can learn any set of individual human values both quickly and safely, avoiding causing harm or violating societal standards for acceptable behavior during the learning process? We explore the effects of representational alignment between humans and AI agents on learning human values. Making AI systems learn human-like representations of the world has many known benefits, including improving generalization, robustness to domain shifts, and few-shot learning performance. We demonstrate that this kind of representational alignment can also support safely learning and exploring human values in the context of personalization. We begin with a theoretical prediction, show that it applies to learning human morality judgments, then show that our results generalize to ten different aspects of human values -- including ethics, honesty, and fairness -- training AI agents on each set of values in a multi-armed bandit setting, where rewards reflect human value judgments over the chosen action. Using a set of textual action descriptions, we collect value judgments from humans, as well as similarity judgments from both humans and multiple language models, and demonstrate that representational alignment enables both safe exploration and improved generalization when learning human values.

artificial intelligence, learning human-like representation, machine learning, (9 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.83)

Add feedback

Learning Human-like Representations to Enable Learning Human Values

Neural Information Processing SystemsMay-26-2025, 21:07:00 GMT

How can we build AI systems that can learn any set of individual human values both quickly and safely, avoiding causing harm or violating societal standards for acceptable behavior during the learning process? We explore the effects of representational alignment between humans and AI agents on learning human values. Making AI systems learn human-like representations of the world has many known benefits, including improving generalization, robustness to domain shifts, and few-shot learning performance. We demonstrate that this kind of representational alignment can also support safely learning and exploring human values in the context of personalization. We begin with a theoretical prediction, show that it applies to learning human morality judgments, then show that our results generalize to ten different aspects of human values -- including ethics, honesty, and fairness -- training AI agents on each set of values in a multi-armed bandit setting, where rewards reflect human value judgments over the chosen action.

artificial intelligence, human value, machine learning, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Learning Human-like Representations to Enable Learning Human Values

Wynn, Andrea, Sucholutsky, Ilia, Griffiths, Thomas L.

arXiv.org Artificial IntelligenceDec-21-2023

How can we build AI systems that are aligned with human values and objectives in order to avoid causing harm or violating societal standards for acceptable behavior? Making AI systems learn human-like representations of the world has many known benefits, including improving generalization, robustness to domain shifts, and few-shot learning performance, among others. We propose that this kind of representational alignment between machine learning (ML) models and humans is also a necessary condition for value alignment, where ML systems conform to human values and societal norms. We focus on ethics as one aspect of value alignment and train multiple ML agents (support vector regression and kernel regression) in a multi-armed bandit setting, where rewards are sampled from a distribution that reflects the morality of the chosen action. We then study the relationship between each agent's degree of representational alignment with humans and their performance when learning to take the most ethical actions.

agent, alignment, representational alignment, (13 more...)

arXiv.org Artificial Intelligence

2312.14106

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.50)

Add feedback

Filters

Collaborating Authors

learning human-like representation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Learning Human-like Representations to Enable Learning Human Values

Learning Human-like Representations to Enable Learning Human Values

Learning Human-like Representations to Enable Learning Human Values