AITopics | value space

value space

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Interpretation as Linear Transformation: A Cognitive-Geometric Model of Belief and Meaning

Amornbunchornvej, Chainarong

arXiv.org Artificial Intelligence, Dec-11-2025

This paper develops a geometric framework for modeling belief, motivation, and influence across cognitively heterogeneous agents. Each agent is represented by a personalized value space, a vector space encoding the internal dimensions through which the agent interprets and evaluates meaning. Beliefs are formalized as structured vectors (abstract beings) whose transmission is mediated by linear interpretation maps. A belief survives communication only if it avoids the null spaces of these maps, yielding a structural criterion for intelligibility, miscommunication, and belief death. Within this framework, I show how belief distortion, motivational drift, counterfactual evaluation, and the limits of mutual understanding arise from purely algebraic constraints. A central result, the "No-Null-Space Leadership Condition," characterizes leadership as a property of representational reachability rather than persuasion or authority. More broadly, the model explains how abstract beings can propagate, mutate, or disappear as they traverse diverse cognitive geometries. The account unifies insights from conceptual spaces, social epistemology, and AI value alignment by grounding meaning preservation in structural compatibility rather than shared information or rationality. I argue that this cognitive-geometric perspective clarifies the epistemic boundaries of influence in both human and artificial systems, and offers a general foundation for analyzing belief dynamics across heterogeneous agents.
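
The survival criterion described in this abstract can be illustrated with a minimal sketch. It assumes that a belief is a plain coordinate list and that an interpretation map is a small matrix. The function names `interpret` and `survives`, the tolerance, and the example matrix are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: the matrix, names, and tolerance are assumptions,
# not the paper's notation or implementation.

def interpret(A, belief):
    """Apply a linear interpretation map A (a list of rows) to a belief vector."""
    return [sum(a * b for a, b in zip(row, belief)) for row in A]

def survives(A, belief, tol=1e-12):
    """A belief survives transmission if its image under A is not the zero vector."""
    return any(abs(x) > tol for x in interpret(A, belief))

# Hypothetical receiver whose value space cannot represent the sender's third dimension.
A = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
]

shared_belief = [0.5, 2.0, 0.0]  # has components the receiver can represent
alien_belief = [0.0, 0.0, 3.0]   # lies entirely in the null space of A

print(survives(A, shared_belief))  # True
print(survives(A, alien_belief))   # False
```

In this toy setting the second belief is mapped to the zero vector, which corresponds to what the abstract calls belief death.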

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2512.09831

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

10 Appendix: 10.1 Pseudo-code for DQN Pro

Neural Information Processing Systems, Nov-15-2025, 06:23:22 GMT

Below, we present the pseudo-code for DQN Pro. Notice that the difference between DQN and DQN Pro is minimal (highlighted in gray).

Hyper-parameters:

Sticky actions: True
Optimizer: Adam (Kingma & Ba, 2015)
Network architecture: Nature DQN network (Mnih et al., 2015)
Random seeds: {0, 1, 2, 3, 4}
Rainbow hyper-parameters (shared): Batch size 64
Other: Config file rainbow_aaai.gin

Theorem 2. Consider the PMPI algorithm specified by: [...] We make two assumptions: 1. we assume error in policy evaluation step, as already stated in equation (4).

All results are averaged over 5 independent seeds.

dqn, proximal update, rainbow, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.95)

7dfa77fcef807c9a078b58fd619ad897-Supplemental-Conference.pdf

Neural Information Processing Systems, Aug-16-2025, 09:18:32 GMT

artificial intelligence, machine learning, rainbow, (18 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.95)

Language Models Encode Numbers Using Digit Representations in Base 10

Levy, Amit Arnold, Geva, Mor

arXiv.org Artificial Intelligence, Oct-15-2024

Large language models (LLMs) frequently make errors when handling even simple numerical problems, such as comparing two small numbers. A natural hypothesis is that these errors stem from how LLMs represent numbers, and specifically, whether their representations of numbers capture their numeric values. We tackle this question from the observation that LLM errors on numerical tasks are often distributed across the digits of the answer rather than normally around its numeric value. Through a series of probing experiments and causal interventions, we show that LLMs internally represent numbers with individual circular representations per-digit in base 10. This digit-wise representation, as opposed to a value representation, sheds light on the error patterns of models on tasks involving numerical reasoning and could serve as a basis for future studies on analyzing numerical mechanisms in LLMs.
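
A toy sketch can make the contrast between a digit-wise circular representation and a value representation concrete. It assumes each base-10 digit is placed on a unit circle at angle 2πd/10. The helper names, the fixed width of three digits, and the corruption step are hypothetical illustrations, not the probing method used in the paper.

```python
import math

# Illustrative sketch only: this encoding is an assumption made for the example.

def encode_digit(d, base=10):
    """Place digit d on the unit circle at angle 2*pi*d/base."""
    angle = 2 * math.pi * d / base
    return (math.cos(angle), math.sin(angle))

def decode_digit(point, base=10):
    """Read a digit back by rounding the point's angle to the nearest digit slot."""
    angle = math.atan2(point[1], point[0]) % (2 * math.pi)
    return round(angle * base / (2 * math.pi)) % base

def encode_number(n, width=3):
    """Encode a non-negative integer as one circle point per digit."""
    return [encode_digit(int(c)) for c in str(n).zfill(width)]

def decode_number(points):
    return int("".join(str(decode_digit(p)) for p in points))

rep = encode_number(347)
print(decode_number(rep))  # 347

# Corrupt only the tens digit by moving it to the opposite side of its circle.
corrupted = list(rep)
corrupted[1] = encode_digit((4 + 5) % 10)
print(decode_number(corrupted))  # 397
```

The corrupted result differs from the original in a single digit but is 50 away in numeric value, which is the kind of error pattern the abstract attributes to digit-wise representations.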

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2410.11781

Country:

Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(7 more...)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Model Predictive Control and Reinforcement Learning: A Unified Framework Based on Dynamic Programming

Bertsekas, Dimitri P.

arXiv.org Artificial Intelligence, Jun-30-2024

In this paper we describe a new conceptual framework that connects approximate Dynamic Programming (DP), Model Predictive Control (MPC), and Reinforcement Learning (RL). This framework centers around two algorithms, which are designed largely independently of each other and operate in synergy through the powerful mechanism of Newton's method. We call them the off-line training and the on-line play algorithms. The names are borrowed from some of the major successes of RL involving games; primary examples are the recent (2017) AlphaZero program (which plays chess, [SHS17], [SSS17]), and the similarly structured and earlier (1990s) TD-Gammon program (which plays backgammon, [Tes94], [Tes95], [TeG96]). In these game contexts, the off-line training algorithm is the method used to teach the program how to evaluate positions and to generate good moves at any given position, while the on-line play algorithm is the method used to play in real time against human or computer opponents. Significantly, the synergy between off-line training and on-line play also underlies MPC (as well as other major classes of sequential decision problems), and indeed the MPC design architecture is very similar to the one of AlphaZero and TD-Gammon. This conceptual insight provides a vehicle for bridging the cultural gap between RL and MPC, and sheds new light on some fundamental issues in MPC. These include the enhancement of stability properties through rollout, the treatment of uncertainty through the use of certainty equivalence, the resilience of MPC in adaptive control settings that involve changing system parameters, and the insights provided by the superlinear performance bounds implied by Newton's method.

approximation, machine learning, reinforcement learning, (19 more...)

arXiv.org Artificial Intelligence

2406.00592

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report (0.50)

Industry:

Leisure & Entertainment > Games (1.00)
Energy > Oil & Gas > Downstream (1.00)
Energy > Oil & Gas > Upstream (0.85)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)

Value FULCRA: Mapping Large Language Models to the Multidimensional Spectrum of Basic Human Values

Yao, Jing, Yi, Xiaoyuan, Wang, Xiting, Gong, Yifan, Xie, Xing

arXiv.org Artificial Intelligence, Nov-15-2023

The rapid advancement of Large Language Models (LLMs) has attracted much attention to value alignment for their responsible development. However, how to define values in this context remains a largely unexplored question. Existing work mainly follows the Helpful, Honest, Harmless principle and specifies values as risk criteria formulated in the AI community, e.g., fairness and privacy protection, suffering from poor clarity, adaptability and transparency. Inspired by basic values in humanity and social science across cultures, this work proposes a novel basic value alignment paradigm and introduces a value space spanned by basic value dimensions. All LLMs' behaviors can be mapped into the space by identifying the underlying values, possessing the potential to address the three challenges. To foster future research, we apply the representative Schwartz's Theory of Basic Values as an initialized example and construct FULCRA, a dataset consisting of 5k (LLM output, value vector) pairs. Our extensive analysis of FULCRA reveals the underlying relation between basic values and LLMs' behaviors, demonstrating that our approach not only covers existing mainstream risks but also anticipates possibly unidentified ones. Additionally, we present an initial implementation of the basic value evaluation and alignment, paving the way for future research in this line.

alignment, arxiv preprint arxiv, value alignment, (15 more...)

arXiv.org Artificial Intelligence

2311.10766

Country:

Asia > China > Beijing > Beijing (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (0.64)

Industry: Law (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Arithmetic with Language Models: from Memorization to Computation

Maltoni, Davide, Ferrara, Matteo

arXiv.org Artificial Intelligence, Aug-2-2023

A better understanding of the emergent computation and problem-solving capabilities of recent large language models is of paramount importance to further improve them and broaden their applicability. This work investigates how a language model, trained to predict the next token, can perform arithmetic computations generalizing beyond training data. Binary addition and multiplication constitute a good testbed for this purpose, since they require a very small vocabulary and exhibit relevant input/output discontinuities making smooth input interpolation ineffective for novel data. We successfully trained a light language model to learn these tasks and ran a number of experiments to investigate the extrapolation capabilities and internal information processing. Our findings support the hypotheses that the language model works as an Encoding-Regression-Decoding machine where the computation takes place in the value space once the input token representation is mapped to an appropriate internal representation.
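
The Encoding-Regression-Decoding view can be sketched for binary addition as three explicit stages. This is a hand-written stand-in, not the trained model: exact integer addition plays the role of the learned regression step, and the function names `encode`, `decode`, and `add_binary` are hypothetical.

```python
# Illustrative sketch only: a hand-coded pipeline standing in for what the
# abstract hypothesizes a trained language model does internally.

def encode(bits):
    """Encoding: map a token string of binary digits to a value."""
    return int(bits, 2)

def decode(value, width):
    """Decoding: map a value back to a zero-padded binary token string."""
    return format(value, f"0{width}b")

def add_binary(a, b):
    """Compute in the value space, then decode to output tokens."""
    width = max(len(a), len(b)) + 1  # room for a possible carry bit
    return decode(encode(a) + encode(b), width)

print(add_binary("1011", "0110"))  # 11 + 6 = 17, printed as 10001
```

The point of the sketch is that the computation itself happens on values rather than on token strings, so nearby token sequences need not produce nearby outputs.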

large language model, machine learning, multiplication, (20 more...)

arXiv.org Artificial Intelligence

2308.01154

Country: Europe > Italy > Emilia-Romagna > Metropolitan City of Bologna > Bologna (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.41)

Universal Regression with Adversarial Responses

Blanchard, Moïse, Jaillet, Patrick

arXiv.org Artificial Intelligence, Jun-9-2023

We provide algorithms for regression with adversarial responses under large classes of non-i.i.d. instance sequences, on general separable metric spaces, with provably minimal assumptions. We also give characterizations of learnability in this regression context. We consider universal consistency which asks for strong consistency of a learner without restrictions on the value responses. Our analysis shows that such an objective is achievable for a significantly larger class of instance sequences than stationary processes, and unveils a fundamental dichotomy between value spaces: whether finite-horizon mean estimation is achievable or not. We further provide optimistically universal learning rules, i.e., such that if they fail to achieve universal consistency, any other algorithms will fail as well. For unbounded losses, we propose a mild integrability condition under which there exist algorithms for adversarial regression under large classes of non-i.i.d. instance sequences. In addition, our analysis also provides a learning rule for mean estimation in general metric spaces that is consistent under adversarial responses without any moment conditions on the sequence, a result of independent interest.

artificial intelligence, machine learning, value space, (16 more...)

arXiv.org Artificial Intelligence

2203.05067

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.64)

Industry: Education (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Multi-Task Reinforcement Learning in Continuous Control with Successor Feature-Based Concurrent Composition

Liu, Yu Tang, Ahmad, Aamir

arXiv.org Artificial Intelligence, Mar-24-2023

Deep reinforcement learning (DRL) frameworks are increasingly used to solve high-dimensional continuous-control tasks in robotics. However, due to the lack of sample efficiency, applying DRL for online learning is still practically infeasible in the robotics domain. One reason is that DRL agents do not leverage the solution of previous tasks for new tasks. Recent work on multi-tasking DRL agents based on successor features has proven to be quite promising in increasing sample efficiency. In this work, we present a new approach that unifies two prior multi-task RL frameworks, SF-GPI and value composition, for the continuous control domain. We exploit compositional properties of successor features to compose a policy distribution from a set of primitives without training any new policy. Lastly, to demonstrate the multi-tasking mechanism, we present a new benchmark for multi-task continuous control environment based on Raisim. This also facilitates large-scale parallelization to accelerate the experiments. Our experimental results in the Pointmass environment show that our multi-task agent has single task performance on par with soft actor critic (SAC) and the agent can successfully transfer to new unseen tasks where SAC fails. We provide our code as open-source at https://github.com/robot-perception-group/concurrent_composition for the benefit of the community.
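
The successor-feature idea behind SF-GPI can be sketched in a few lines. The sketch assumes a discrete action set for simplicity, whereas the paper targets continuous control, and it uses the standard relation Q(s, a) = ψ(s, a) · w with generalized policy improvement taking the best action across primitives. The feature values, the task weights, and the function names are hypothetical.

```python
# Illustrative sketch only: discrete actions and made-up numbers, not the
# paper's continuous-control composition method.

def q_value(psi, w):
    """Q-value as the inner product of successor features and task weights."""
    return sum(p * wi for p, wi in zip(psi, w))

def gpi_action(successor_features, w):
    """Pick the action with the highest Q-value over all primitive policies.

    successor_features[i][a] holds psi for primitive i and action a
    in the current state.
    """
    best_action, best_q = None, float("-inf")
    for psi_per_action in successor_features:
        for action, psi in enumerate(psi_per_action):
            q = q_value(psi, w)
            if q > best_q:
                best_action, best_q = action, q
    return best_action, best_q

# Two primitives, two actions, two feature dimensions (hypothetical values).
sf = [
    [[1.0, 0.0], [0.2, 0.1]],  # primitive 0
    [[0.0, 1.0], [0.1, 0.3]],  # primitive 1
]
w_new = [0.5, 2.0]  # weights of a new, unseen task

print(gpi_action(sf, w_new))  # (0, 2.0)
```

No new policy is trained here: the new task only changes `w_new`, and the existing successor features are reused to choose an action.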

composition, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2303.13935

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.04)
Asia > Japan > Honshū > Chūbu > Toyama Prefecture > Toyama (0.04)

Genre: Research Report (0.64)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

The Geometry of Robust Value Functions

Wang, Kaixin, Kumar, Navdeep, Zhou, Kuangqi, Hooi, Bryan, Feng, Jiashi, Mannor, Shie

arXiv.org Artificial Intelligence, Aug-11-2022

The space of value functions is a fundamental concept in reinforcement learning. Characterizing its geometric properties may provide insights for optimization and representation. Existing works mainly focus on the value space for Markov Decision Processes (MDPs). In this paper, we study the geometry of the robust value space for the more general Robust MDPs (RMDPs) setting, where transition uncertainties are considered. Specifically, since we find it hard to directly adapt prior approaches to RMDPs, we start with revisiting the non-robust case, and introduce a new perspective that enables us to characterize both the non-robust and robust value space in a similar fashion. The key to this perspective is to decompose the value space, in a state-wise manner, into unions of hypersurfaces. Through our analysis, we show that the robust value space is determined by a set of conic hypersurfaces, each of which contains the robust values of all policies that agree on one state. Furthermore, we find that taking only extreme points in the uncertainty set is sufficient to determine the robust value space. Finally, we discuss some other aspects of the robust value space, including its non-convexity and policy agreement on multiple states.
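
For the non-robust case, points in the value space can be computed directly. The sketch below assumes a two-state, two-action MDP with made-up transition probabilities and rewards, and evaluates each deterministic policy by solving (I - γ P_π) V = r_π with Cramer's rule. It does not cover the robust setting or the hypersurface decomposition studied in the paper.

```python
from itertools import product

# Illustrative sketch only: a hypothetical 2-state, 2-action MDP.
GAMMA = 0.9
# P[s][a] lists next-state probabilities; R[s][a] is the immediate reward.
P = [
    [[0.8, 0.2], [0.1, 0.9]],
    [[0.5, 0.5], [0.0, 1.0]],
]
R = [[1.0, 0.0], [0.0, 2.0]]

def policy_value(policy):
    """Solve (I - gamma * P_pi) V = r_pi for a deterministic two-state policy."""
    p = [P[s][policy[s]] for s in range(2)]
    r = [R[s][policy[s]] for s in range(2)]
    a, b = 1 - GAMMA * p[0][0], -GAMMA * p[0][1]
    c, d = -GAMMA * p[1][0], 1 - GAMMA * p[1][1]
    det = a * d - b * c  # nonzero whenever GAMMA < 1
    return ((d * r[0] - b * r[1]) / det, (a * r[1] - c * r[0]) / det)

# Each deterministic policy contributes one point (V(s0), V(s1)) to the value space.
values = {pi: policy_value(pi) for pi in product(range(2), repeat=2)}
for pi, v in values.items():
    print(pi, v)
```

Policies that share the same action in one state can then be compared point by point, which is the kind of state-wise grouping the abstract refers to.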

robust value space, s-rectangular uncertainty, value space, (15 more...)

arXiv.org Artificial Intelligence

2201.12929

Country:

Asia > Singapore > Central Region > Singapore (0.04)
Asia > Middle East > Israel > Haifa District > Haifa (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
(2 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)
