

R-learning in actor-critic model offers a biologically relevant mechanism for sequential decision-making

Neural Information Processing Systems

A few studies have explored sequential stay-or-leave decisions in humans or rodents - the model organism used to access neuronal activity at high resolution. In both cases, decision patterns were collected in foraging tasks - the experimental settings where subjects decide when to leave depleting resources (2).


R-learning in actor-critic model offers a biologically relevant mechanism for sequential decision-making

Neural Information Processing Systems

In real-world settings, we repeatedly decide whether to pursue better conditions or to keep things unchanged. Examples include time investment, employment, entertainment preferences, etc. How do we make such decisions? To address this question, the field of behavioral ecology has developed foraging paradigms - model settings in which human and non-human subjects decide when to leave depleting food resources. Foraging theory, represented by the marginal value theorem (MVT), provides accurate average-case stay-or-leave rules consistent with subjects' behavior toward depleting resources. Yet the algorithms underlying individual choices, and how such algorithms are learned, remain unclear.
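The MVT stay-or-leave rule mentioned above can be sketched numerically. This is a minimal illustration, assuming an exponentially depleting patch; the reward-rate function and all parameter values are illustrative choices, not taken from the paper:

```python
import math

def mvt_leave_time(r0, decay, avg_rate):
    """Time at which the instantaneous patch intake rate r0*exp(-decay*t)
    drops to the environment-wide average reward rate (MVT leave rule)."""
    if avg_rate >= r0:
        return 0.0  # patch never beats the average rate: leave immediately
    return math.log(r0 / avg_rate) / decay

# Example: patch starts at 10 rewards/s, depletes at rate 0.5/s,
# background average rate is 2 rewards/s.
t_leave = mvt_leave_time(r0=10.0, decay=0.5, avg_rate=2.0)
```

Under MVT, richer environments (higher `avg_rate`) yield earlier leaving times, which is the qualitative pattern foraging experiments test.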



We thank the reviewers for their time and thorough comments, as well as their valuation of our work including its

Neural Information Processing Systems

For the larger discussion items, please find the detailed comments below. Additionally, the reviewers highlighted the importance of quantitative fits; we are currently attempting to differentiate between these models using additional manipulations. R-learning may be computationally advantageous. Our work builds upon results in the field, including Ref [2]. This observation enabled us to pursue the hypothesis of a leaky estimate of the average reward.
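A leaky estimate of the average reward, as mentioned above, is typically an exponential moving average. A minimal sketch; the learning rate `beta` and the reward sequence are illustrative, not values from the paper:

```python
def update_avg_reward(rho, r, beta=0.05):
    """Leaky (exponential moving average) estimate of the average reward,
    as used in R-learning-style updates: rho <- rho + beta * (r - rho)."""
    return rho + beta * (r - rho)

# Illustrative reward stream
rho = 0.0
for r in [1.0, 0.0, 1.0, 1.0]:
    rho = update_avg_reward(rho, r)
```

The "leak" means old rewards are down-weighted geometrically, so the estimate tracks recent reward rate rather than the full history average.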


Review for NeurIPS paper: R-learning in actor-critic model offers a biologically relevant mechanism for sequential decision-making

Neural Information Processing Systems

Weaknesses: More attention should be paid to teasing out differences between V- and R-learning, with intermittent initial rewards being essentially the only example. Although it is impressive that new VTA recording data is presented in the paper, I don't feel that the result is particularly helpful - it only shows that VTA activity doesn't contradict the R-learning model, but it does not really provide specific support for it. It should be possible to design different tasks/protocols under which the two formalisations would have substantially different TD errors, which could help tease out biological correlates of the two models. Furthermore, it would be nice to see more details of parameter estimation and the resulting best-fitting parameter values, which, if done properly, may allow achieving not only a qualitative but also a better quantitative fit between Figure 1E and Figure 1D (as well as between Figure 1D and Figure 1B). As the models have multiple parameters substantially affecting performance, the two models should be compared under best-fitting parameters, and the comparison should include formal measures like AIC, not just qualitative fits. Of course, model universality regardless of parameters is helpful, but quantitative fit is equally important.
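The reviewer's point about diverging TD errors can be made concrete: under discounted value learning the TD error includes a discount factor, while under average-reward R-learning the reward is measured relative to a running average rate with no discounting. A minimal sketch (function names and example numbers are illustrative, not from the paper):

```python
def td_error_discounted(r, v_s, v_next, gamma=0.95):
    """TD error under discounted value (V) learning."""
    return r + gamma * v_next - v_s

def td_error_average(r, v_s, v_next, rho):
    """TD error under average-reward R-learning: reward is taken
    relative to the average reward rate rho; no discounting."""
    return r - rho + v_next - v_s

# With identical value estimates, the two errors differ by
# (rho - (1 - gamma) * v_next), which task design can make large.
d_v = td_error_discounted(r=1.0, v_s=2.0, v_next=2.0)
d_r = td_error_average(r=1.0, v_s=2.0, v_next=2.0, rho=0.5)
```

A protocol that manipulates the background reward rate `rho` independently of state values would drive these two error signals apart, which is the kind of dissociation the review asks for.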


Review for NeurIPS paper: R-learning in actor-critic model offers a biologically relevant mechanism for sequential decision-making

Neural Information Processing Systems

This is a well-written and presented paper proposing a new framework for modeling animal behavior during a foraging task, and should be of interest to the NeurIPS audience. After rebuttal, 3 of the reviewers recommended accept based on it providing a nice link between the behavioral economics and reinforcement learning communities, and its strengths in both theory and empirical results. Therefore, I tentatively recommend accept. That said, during the discussions some concerns were brought up regarding some missing related work. I urge the authors to consider discussing in their final version several related works that R4 and I think are quite relevant: Daw et al., 2002, Neural Networks; Schweighofer & Doya, 2003, Neural Networks; Niv et al., 2006/2007 (and related); and also some works from the motivation-modeling literature (that R2 mentions in their review).



The Connection Between R-Learning and Inverse-Variance Weighting for Estimation of Heterogeneous Treatment Effects

Fisher, Aaron

arXiv.org Machine Learning

Our motivation is to shed light on the performance of the widely popular "R-Learner." Like many other methods for estimating conditional average treatment effects (CATEs), R-Learning can be expressed as a weighted pseudo-outcome regression (POR). Previous comparisons of POR techniques have paid careful attention to the choice of pseudo-outcome transformation. However, we argue that the dominant driver of performance is actually the choice of weights. Specifically, we argue that R-Learning implicitly performs an inverse-variance weighted form of POR. These weights stabilize the regression and allow for convenient simplifications of bias terms.
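The weighted-POR view can be sketched for the simplest case of a constant treatment effect, assuming nuisance estimates `m_hat` (outcome regression) and `e_hat` (propensity) are given; all names are illustrative, not the paper's API:

```python
import numpy as np

def r_learner_constant_cate(y, w, m_hat, e_hat):
    """R-Learner for a constant treatment effect, viewed as a weighted
    pseudo-outcome regression: regress (y - m_hat)/(w - e_hat) with
    weights (w - e_hat)**2, the R-Learner's implicit weights."""
    resid_w = w - e_hat               # treatment residual
    pseudo = (y - m_hat) / resid_w    # pseudo-outcome transformation
    weights = resid_w ** 2            # implicit inverse-variance-style weights
    return np.average(pseudo, weights=weights)
```

The weights down-weight observations where `w - e_hat` is near zero, exactly where the pseudo-outcome blows up; that is the stabilization the abstract refers to.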


Deep R-Learning for Continual Area Sweeping

Shah, Rishi, Jiang, Yuqian, Hart, Justin, Stone, Peter

arXiv.org Machine Learning

Coverage path planning is a well-studied problem in robotics in which a robot must plan a path that passes through every point in a given area repeatedly, usually with a uniform frequency. To address the scenario in which some points need to be visited more frequently than others, this problem has been extended to non-uniform coverage planning. This paper considers the variant of non-uniform coverage in which the robot does not know the distribution of relevant events beforehand and must nevertheless learn to maximize the rate of detecting events of interest. This continual area sweeping problem has been previously formalized in a way that makes strong assumptions about the environment, and to date only a greedy approach has been proposed. We generalize the continual area sweeping formulation to include fewer environmental constraints, and propose a novel approach based on reinforcement learning in a Semi-Markov Decision Process. This approach is evaluated in an abstract simulation and in a high-fidelity Gazebo simulation. These evaluations show significant improvement upon the existing approach in general settings, which is especially relevant in the growing area of service robotics.


Efficient Average Reward Reinforcement Learning Using Constant Shifting Values

Yang, Shangdong (Nanjing University) | Gao, Yang (Nanjing University) | An, Bo (Nanyang Technological University) | Wang, Hao (Nanjing University) | Chen, Xingguo (Nanjing University of Posts and Telecommunications)

AAAI Conferences

There are two classes of average reward reinforcement learning (RL) algorithms: model-based ones that explicitly maintain MDP models and model-free ones that do not learn such models. Though model-free algorithms are known to be more efficient, they often cannot converge to optimal policies due to the perturbation of parameters. In this paper, a novel model-free algorithm is proposed, which makes use of constant shifting values (CSVs) estimated from prior knowledge. To encourage exploration during the learning process, the algorithm constantly subtracts the CSV from the rewards. A terminating condition is proposed to handle the unboundedness of Q-values caused by such subtraction. The convergence of the proposed algorithm is proved under very mild assumptions. Furthermore, linear function approximation is investigated to generalize our method to handle large-scale tasks. Extensive experiments on representative MDPs and the popular game Tetris show that the proposed algorithms significantly outperform the state-of-the-art ones.
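The core update can be sketched in tabular form: the constant shifting value takes the place of a learned average-reward estimate and is subtracted from each reward. A minimal illustration; the state/action names, CSV, and numbers are hypothetical, not from the paper:

```python
def csv_q_update(Q, s, a, r, s_next, csv, alpha=0.1):
    """One tabular update with a constant shifting value (CSV):
    Q[s][a] += alpha * (r - csv + max_a' Q[s_next][a'] - Q[s][a])."""
    target = r - csv + max(Q[s_next].values())
    Q[s][a] += alpha * (target - Q[s][a])
    return Q[s][a]

# Hypothetical two-state example with actions "stay"/"go"
Q = {"s0": {"stay": 0.0, "go": 0.0}, "s1": {"stay": 0.0, "go": 0.0}}
q = csv_q_update(Q, "s0", "go", r=1.0, s_next="s1", csv=0.3)
```

If the CSV underestimates the optimal average reward, Q-values drift upward without bound, which is why the paper pairs the subtraction with a terminating condition.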