General Value Functions
Exploring through Random Curiosity with General Value Functions
Efficient exploration in reinforcement learning is a challenging problem commonly addressed through intrinsic rewards. Recent prominent approaches are based on state novelty or on variants of artificial curiosity. However, directly applying them to partially observable environments can be ineffective and lead to premature dissipation of intrinsic rewards. Here we propose random curiosity with general value functions (RC-GVF), a novel intrinsic reward function that draws upon connections between these distinct approaches. Instead of using only the current observation's novelty or a curiosity bonus for failing to predict precise environment dynamics, RC-GVF derives intrinsic rewards by predicting temporally extended general value functions. We demonstrate that this improves exploration in a hard-exploration diabolical lock problem. Furthermore, RC-GVF significantly outperforms previous methods in the absence of ground-truth episodic counts in the partially observable MiniGrid environments. Panoramic observations on MiniGrid further boost RC-GVF's performance, making it competitive with baselines that exploit privileged information in the form of episodic counts.
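The following minimal sketch illustrates the mechanism the abstract hints at; it is an interpretation, not the authors' implementation. The network sizes, the fixed random cumulant network, and the use of ensemble disagreement as the intrinsic reward are assumptions made for the sketch.

```python
import torch
import torch.nn as nn

class RandomCuriosityGVF(nn.Module):
    """Sketch of an RC-GVF-style intrinsic reward (illustrative).
    A fixed random network defines cumulants (pseudo-rewards); an
    ensemble of predictors learns their discounted sum (a GVF) via
    TD, and ensemble disagreement serves as the intrinsic reward."""

    def __init__(self, obs_dim, cumulant_dim=8, n_predictors=4, gamma=0.9):
        super().__init__()
        self.gamma = gamma
        # Fixed random network mapping observations to pseudo-rewards.
        self.cumulant_net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, cumulant_dim))
        for p in self.cumulant_net.parameters():
            p.requires_grad_(False)
        # Ensemble of trainable GVF predictors.
        self.predictors = nn.ModuleList([
            nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                          nn.Linear(64, cumulant_dim))
            for _ in range(n_predictors)])

    def intrinsic_reward(self, obs):
        # Disagreement across the ensemble, averaged over cumulant dims.
        preds = torch.stack([p(obs) for p in self.predictors])
        return preds.var(dim=0).mean(dim=-1)

    def td_loss(self, obs, next_obs, done):
        # done: float tensor of shape (batch, 1); 1.0 at episode end.
        cumulant = self.cumulant_net(next_obs)
        loss = 0.0
        for p in self.predictors:
            with torch.no_grad():
                target = cumulant + self.gamma * (1 - done) * p(next_obs)
            loss = loss + (p(obs) - target).pow(2).mean()
        return loss
```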
Contextual Multinomial Logit Bandits with General Value Functions
Contextual multinomial logit (MNL) bandits capture many real-world assortment recommendation problems such as online retailing and advertising. However, prior work has only considered (generalized) linear value functions, which greatly limits applicability. Motivated by this fact, in this work we consider contextual MNL bandits with a general value function class that contains the ground truth, borrowing ideas from a recent line of studies on contextual bandits. Specifically, we consider both the stochastic and the adversarial settings, and propose a suite of algorithms, each with a different computation-regret trade-off. When applied to the linear case, our results are not only the first with no dependence on a certain problem-dependent constant that can be exponentially large, but also enjoy other advantages such as computational efficiency, dimension-free regret bounds, or the ability to handle completely adversarial contexts and rewards.
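For readers unfamiliar with the MNL choice model at the heart of these bandits, here is a small self-contained sketch of the standard model; the bandit algorithms would replace these fixed utilities with context-dependent values drawn from the general function class.

```python
import math

def mnl_choice_probs(values):
    """Standard multinomial logit (MNL) choice probabilities for an
    assortment. `values` are the item utilities; the constant 1 in
    the denominator represents the no-purchase option."""
    expv = [math.exp(v) for v in values]
    denom = 1.0 + sum(expv)
    probs = [e / denom for e in expv]
    return probs, 1.0 / denom  # item probabilities, no-purchase probability

# Example: three items in the assortment; higher utility -> more likely pick.
probs, p_none = mnl_choice_probs([1.2, 0.3, -0.5])
```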
Hierarchical Universal Value Function Approximators
There have been key advancements in building universal approximators for multi-goal collections of reinforcement learning value functions -- key elements in estimating long-term returns of states in a parameterized manner. We extend this to hierarchical reinforcement learning, using the options framework, by introducing hierarchical universal value function approximators (H-UVFAs). This allows us to leverage the added benefits of scaling, planning, and generalization expected in temporal abstraction settings. We develop supervised and reinforcement learning methods for learning embeddings of the states, goals, options, and actions in the two hierarchical value functions: $Q(s, g, o; \theta)$ and $Q(s, g, o, a; \theta)$. Finally, we demonstrate generalization of the H-UVFAs and show that they outperform corresponding UVFAs.
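A minimal sketch of how such embedding-based hierarchical value functions might be assembled, assuming a UVFA-style multiplicative factorization of learned embeddings; the class name, embedding dimension, and elementwise-product composition are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class HUVFASketch(nn.Module):
    """Illustrative two-level universal value function. Q(s, g, o)
    scores options; Q(s, g, o, a) scores primitive actions within an
    option. Both combine learned embeddings of their arguments."""

    def __init__(self, s_dim, g_dim, n_options, n_actions, d=32):
        super().__init__()
        self.phi_s = nn.Linear(s_dim, d)       # state embedding
        self.psi_g = nn.Linear(g_dim, d)       # goal embedding
        self.e_o = nn.Embedding(n_options, d)  # option embedding
        self.e_a = nn.Embedding(n_actions, d)  # action embedding

    def q_option(self, s, g, o):
        # Q(s, g, o): interaction of state, goal, and option embeddings.
        return (self.phi_s(s) * self.psi_g(g) * self.e_o(o)).sum(-1)

    def q_action(self, s, g, o, a):
        # Q(s, g, o, a): adds the action embedding to the interaction.
        return (self.phi_s(s) * self.psi_g(g)
                * self.e_o(o) * self.e_a(a)).sum(-1)
```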
Using General Value Functions to Learn Domain-Backed Inventory Management Policies
Kalwar, Durgesh, Shelke, Omkar, Khadilkar, Harshad
We consider the inventory management problem, where the goal is to balance conflicting objectives such as availability and wastage of a large range of products in a store. We propose a reinforcement learning (RL) approach that utilises General Value Functions (GVFs) to derive domain-backed inventory replenishment policies. The inventory replenishment decisions are modelled as a sequential decision-making problem, which is challenging due to uncertain demand and the existence of aggregate (cross-product) constraints. In the existing literature, GVFs have primarily been used for auxiliary task learning. We use this capability to train GVFs on domain-critical characteristics such as stock-out probability and wastage quantity. Using this domain expertise for more effective exploration, we train an RL agent to compute the inventory replenishment quantities for a large range of products (up to 6000 in the reported experiments), which share aggregate constraints such as the total weight/volume per delivery. Additionally, we show that the GVF predictions can be used to provide additional domain-backed insights into the decisions proposed by the RL agent. Finally, since the environment dynamics are fully transferred, the trained GVFs can be used for faster adaptation to vastly different business objectives (for example, due to the start of a promotional period or due to deployment in a new customer environment).
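As an illustration of what such a domain-backed GVF might look like, here is a hypothetical TD-trained predictor whose cumulant is a stock-out indicator; the feature set, network shape, and discount are assumptions made for the sketch, not the paper's setup.

```python
import torch
import torch.nn as nn

class StockoutGVF(nn.Module):
    """Illustrative GVF predicting a discounted stock-out signal per
    product. The cumulant is 1.0 when the product stocks out, so the
    GVF output acts as a learned stock-out risk estimate that the RL
    agent (or a human analyst) can inspect."""

    def __init__(self, feat_dim, gamma=0.95):
        super().__init__()
        self.gamma = gamma
        self.net = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 1))

    def td_loss(self, feats, next_feats, stocked_out):
        # stocked_out: float tensor (batch, 1), 1.0 if inventory hit zero.
        with torch.no_grad():
            target = stocked_out + self.gamma * self.net(next_feats)
        return (self.net(feats) - target).pow(2).mean()
```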
Follow your Nose: Using General Value Functions for Directed Exploration in Reinforcement Learning
Kalwar, Durgesh, Shelke, Omkar, Nath, Somjit, Meisheri, Hardik, Khadilkar, Harshad
Improving sample efficiency is a key challenge in reinforcement learning, especially in environments with large state spaces and sparse rewards. In the literature, this is addressed either through the use of auxiliary tasks (subgoals) or through clever exploration strategies. Exploration methods have been used to sample better trajectories in large environments, while auxiliary tasks have been incorporated where the reward is sparse. However, few studies have attempted to tackle both large scale and reward sparsity at the same time. This paper explores the idea of combining exploration with auxiliary task learning using General Value Functions (GVFs) and a directed exploration strategy. We present a way to learn value functions that can be used to sample actions and provide directed exploration. Experiments on navigation tasks with varying grid sizes demonstrate performance advantages over several competitive baselines.
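One plausible reading of "learning value functions that can be used to sample actions" is a softmax policy over per-action GVF predictions. The sketch below illustrates that idea; the softmax rule and temperature are assumptions, not necessarily the paper's exact strategy.

```python
import numpy as np

def directed_exploration_action(gvf_values, temperature=1.0, rng=None):
    """Sample an action with probability proportional to
    exp(GVF value / temperature), so actions whose auxiliary
    predictions look promising are explored more often."""
    rng = rng or np.random.default_rng()
    z = np.asarray(gvf_values, dtype=float) / temperature
    z -= z.max()  # subtract max for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return rng.choice(len(probs), p=probs)

# Example: four actions with per-action GVF predictions.
a = directed_exploration_action([0.2, 1.5, 0.1, 0.7])
```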
Affordance as general value function: A computational model
Graves, Daniel, Günther, Johannes, Luo, Jun
General value functions (GVFs) in the reinforcement learning (RL) literature are long-term predictive summaries of the outcomes of agents following specific policies in the environment. Affordances, as perceived valences of action possibilities, may be cast as predicted policy-relative goodness and modelled as GVFs. A systematic explication of this connection shows that GVFs, and especially their deep learning embodiments, (1) realize affordance prediction as a form of direct perception, (2) illuminate the fundamental connection between action and perception in affordance, and (3) offer a scalable way to learn affordances using RL methods. Through a comprehensive review of the existing literature on recent successes of GVF applications in robotics, rehabilitation, industrial automation, and autonomous driving, we demonstrate that GVFs provide the right framework for learning affordances in real-world applications. In addition, we highlight a few new avenues of research opened up by the perspective of "affordance as GVF", including using GVFs for orchestrating complex behaviors.
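All of the entries above build on the same predictive question. In the standard formulation of general value functions, a GVF is the expected discounted sum of a cumulant $c$ under a policy $\pi$, with a possibly state-dependent continuation function $\gamma$:

```latex
% General value function: expected cumulant-discounted return under \pi.
v_{\pi,\gamma,c}(s) = \mathbb{E}\!\left[\sum_{k=0}^{\infty}
  \Bigl(\prod_{j=1}^{k}\gamma(S_{t+j})\Bigr)\, c(S_{t+k+1})
  \,\middle|\, S_t = s,\ A_{t:\infty} \sim \pi\right]
```

Choosing $c$ as a random feature recovers the curiosity setting above; choosing it as a stock-out indicator recovers the inventory setting; choosing it as a policy-relative goodness signal recovers affordance prediction.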