AITopics | preference reversal

Reinforcement Learning with Non-Exponential Discounting

Neural Information Processing Systems | Apr-24-2026, 20:15:47 GMT

Commonly in reinforcement learning (RL), rewards are discounted over time using an exponential function to model time preference, thereby bounding the expected long-term reward. In contrast, in economics and psychology, it has been shown that humans often adopt a hyperbolic discounting scheme, which is optimal when a specific task termination time distribution is assumed. In this work, we propose a theory for continuous-time model-based reinforcement learning generalized to arbitrary discount functions. This formulation covers the case in which there is a non-exponential random termination time. We derive a Hamilton-Jacobi-Bellman (HJB) equation characterizing the optimal policy and describe how it can be solved using a collocation method, which uses deep learning for function approximation. Further, we show how the inverse RL problem can be approached, in which one tries to recover properties of the discount function given decision data. We validate the applicability of our proposed approach on two simulated problems. Our approach opens the way for the analysis of human discounting in sequential decision-making tasks.
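The link between hyperbolic discounting and preference reversal can be illustrated with a small sketch. This is not code from the paper: the discount rates, reward sizes, and function names below are illustrative assumptions. An agent chooses between a smaller reward available sooner and a larger reward available five time units later, and the same choice is evaluated once immediately and once pushed into the future.

```python
import math


def exponential_discount(t, rate=0.1):
    """Exponential discount factor exp(-rate * t); the rate is an assumed value."""
    return math.exp(-rate * t)


def hyperbolic_discount(t, k=1.0):
    """Hyperbolic discount factor 1 / (1 + k * t); k is an assumed value."""
    return 1.0 / (1.0 + k * t)


def prefers_larger_later(discount, delay, small=(1.0, 0.0), large=(2.0, 5.0)):
    """Return True if the larger, later reward has the higher discounted value.

    Each option is a (reward, time offset) pair; both options are shifted
    into the future by the same `delay`.
    """
    (r_small, t_small), (r_large, t_large) = small, large
    value_small = r_small * discount(delay + t_small)
    value_large = r_large * discount(delay + t_large)
    return value_large > value_small


# Hyperbolic discounting: the preferred option flips as both rewards recede.
hyperbolic_now = prefers_larger_later(hyperbolic_discount, delay=0.0)
hyperbolic_later = prefers_larger_later(hyperbolic_discount, delay=20.0)

# Exponential discounting: the ranking does not depend on the common delay.
exponential_now = prefers_larger_later(exponential_discount, delay=0.0)
exponential_later = prefers_larger_later(exponential_discount, delay=20.0)
```

Under these assumed parameters the hyperbolic agent takes the smaller reward when it is immediate but prefers the larger one when both are far away, whereas the exponential agent ranks the two options the same way at every delay, because the ratio of exponential discount factors depends only on the gap between the rewards.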

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country: Europe (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)


178b306c7ee66a66db2171646e17da36-Paper-Conference.pdf

Neural Information Processing Systems | Feb-7-2026, 16:15:17 GMT

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country:

Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)


Classifying Inconsistency in AHP Pairwise Comparison Matrices Using Machine Learning

Bose, Amarnath

arXiv.org Machine Learning | May-13-2025

Assessing consistency in Pairwise Comparison Matrices (PCMs) within the Analytical Hierarchy Process (AHP) poses significant challenges when using the traditional Consistency Ratio (CR) method. This study introduces a novel alternative that leverages triadic preference reversals (PR) to provide a more robust and interpretable assessment of consistency. Triadic preference reversals capture inconsistencies between a pair of elements by comparing the direction of preference derived from the global eigenvector with that from a 3x3 submatrix (triad) containing the same pair, highlighting local-global preference conflicts. This method detects a reversal when one eigen ratio exceeds one while another falls below one, signaling inconsistency. We identify two key features: the proportion of preference reversals and the maximum reversal, which mediate the impact of a PCM's order on its consistency. Using these features, simulated PCMs are clustered into consistent and inconsistent classes through k-means clustering, followed by training a logistic classifier for consistency evaluation. The PR method achieves 97% accuracy, significantly surpassing the Consistency Ratio (CR) method's 50%, with a false negative rate of only 2.6% compared to 5.5%. These findings demonstrate the PR method's superior accuracy in assessing AHP consistency, thereby enabling more reliable decision-making. The proposed triadic preference reversal (PR) approach is implemented in the R package AHPtools publicly available on the Comprehensive R Archive Network (CRAN).
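The reversal test described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the AHPtools implementation (which is an R package): the function names are hypothetical, the principal eigenvector is approximated by power iteration, and the example matrices are made up for the demonstration.

```python
from itertools import combinations


def principal_eigenvector(matrix, iterations=200):
    """Approximate the principal eigenvector of a positive matrix by power iteration."""
    n = len(matrix)
    weights = [1.0 / n] * n
    for _ in range(iterations):
        weights = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
        total = sum(weights)
        weights = [w / total for w in weights]  # normalise so weights sum to 1
    return weights


def triadic_reversals(pcm):
    """List (i, j, triad) where the global and triad eigenvectors disagree on i vs j.

    A reversal is flagged when one eigen ratio w_i / w_j exceeds one
    while the other falls below one.
    """
    global_w = principal_eigenvector(pcm)
    reversals = []
    for triad in combinations(range(len(pcm)), 3):
        sub = [[pcm[i][j] for j in triad] for i in triad]  # 3x3 submatrix
        local_w = principal_eigenvector(sub)
        for a, b in combinations(range(3), 2):
            i, j = triad[a], triad[b]
            global_ratio = global_w[i] / global_w[j]
            local_ratio = local_w[a] / local_w[b]
            if (global_ratio - 1.0) * (local_ratio - 1.0) < 0:
                reversals.append((i, j, triad))
    return reversals


# A perfectly consistent PCM built from assumed weights: a_ij = w_i / w_j.
true_weights = [8.0, 4.0, 2.0, 1.0]
consistent_pcm = [[wi / wj for wj in true_weights] for wi in true_weights]

# A reciprocal but inconsistent PCM (made-up judgments).
inconsistent_pcm = [
    [1.0, 1.0, 2.0, 1.0 / 9.0],
    [1.0, 1.0, 1.0, 9.0],
    [0.5, 1.0, 1.0, 1.0],
    [9.0, 1.0 / 9.0, 1.0, 1.0],
]

consistent_result = triadic_reversals(consistent_pcm)
inconsistent_result = triadic_reversals(inconsistent_pcm)
```

In the inconsistent example, the triad (0, 1, 2) ranks element 0 above element 1, while the judgments involving element 3 push the global eigenvector the other way, so the pair (0, 1) is flagged. The features mentioned in the abstract (the proportion of reversals and the maximum reversal) would be computed on top of such a list; their exact definitions are not given here, so they are left out of the sketch.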

artificial intelligence, machine learning, pcm, (17 more...)

arXiv.org Machine Learning

2505.06293

Country: Asia > India (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)


Pacos: Modeling Users' Interpretable and Context-Dependent Choices in Preference Reversals

Li, Qingming, Zhao, H. Vicky

arXiv.org Artificial Intelligence | Jun-17-2023

Choice problems involve selecting the best options from several available items, and learning users' preferences in such problems is of great importance for understanding users' decision-making mechanisms and providing personalized services. Existing works typically assume that people evaluate items independently. In practice, however, users' preferences depend on the market in which items are placed, which is known as context effects; the order of users' preferences for two items may even be reversed, which is called preference reversal. In this work, we identify three factors contributing to context effects: users' adaptive weights, inter-item comparison, and display positions. We propose a context-dependent preference model named Pacos as a unified framework to address these three factors simultaneously, and consider two design methods: an additive method with high interpretability and an ANN-based method with high accuracy. We study the conditions under which preference reversals occur and provide a theoretical proof of the effectiveness of Pacos in predicting when preference reversals would occur. Experimental results show that the proposed method outperforms prior works in predicting users' choices and offers high interpretability that helps explain the cause of preference reversals.
Choice problems, such as purchasing a festival gift or picking a restaurant, involve comparing several available items. Previous works on preference modeling and analysis typically assume that people evaluate items independently, and that the relative preference between two items is fixed regardless of other competing options [1]. However, numerous studies show that this independence assumption is frequently violated in reality [2], [3]. It is essential to model how the relative preference is influenced by competing options and to figure out how people select their best choices. This study can help understand users' decision-making mechanisms, offer personalized services, and provide important guidelines on pricing strategies and sales forecasts. To show this independence violation, we conduct a real user test. In our test, we set up two markets for a Xiaomi scale, as shown in Figure 1 (a) and (b). In these two markets, we consider sellers described by two attributes: price (¥) and seller reputation (REP).

artificial intelligence, machine learning, preference reversal, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.knosys.2023.110835

2303.05648

Country:

Asia > China > Beijing > Beijing (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Consumer Products & Services (0.87)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)


Reinforcement Learning with Non-Exponential Discounting

Schultheis, Matthias, Rothkopf, Constantin A., Koeppl, Heinz

arXiv.org Artificial Intelligence | Dec-7-2022

Commonly in reinforcement learning (RL), rewards are discounted over time using an exponential function to model time preference, thereby bounding the expected long-term reward. In contrast, in economics and psychology, it has been shown that humans often adopt a hyperbolic discounting scheme, which is optimal when a specific task termination time distribution is assumed. In this work, we propose a theory for continuous-time model-based reinforcement learning generalized to arbitrary discount functions. This formulation covers the case in which there is a non-exponential random termination time. We derive a Hamilton-Jacobi-Bellman (HJB) equation characterizing the optimal policy and describe how it can be solved using a collocation method, which uses deep learning for function approximation. Further, we show how the inverse RL problem can be approached, in which one tries to recover properties of the discount function given decision data. We validate the applicability of our proposed approach on two simulated problems. Our approach opens the way for the analysis of human discounting in sequential decision-making tasks.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2209.13413

Country:

Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)


How can we make sure that algorithms are fair?

#artificialintelligence | Dec-17-2019, 17:55:48 GMT

Using machines to augment human activity is nothing new. Egyptian hieroglyphs show the use of horse-drawn carriages even before 300 B.C. Ancient Indian literature such as "Silapadikaram" describes animals being used for farming. And one glance outside shows that today people use motorized vehicles to get around. Where in the past human beings augmented themselves in physical ways, the nature of augmentation is now also more intelligent. Again, all one needs to do is look to cars: engineers are seemingly on the cusp of self-driving cars guided by artificial intelligence.

algorithm, intelligence, machine intelligence, (15 more...)

#artificialintelligence

Country: North America > United States > California (0.05)

Industry: