AITopics | igl

Provably Efficient Interactive-Grounded Learning with Personalized Reward

Neural Information Processing SystemsMar-22-2026, 16:38:01 GMT

Interactive-Grounded Learning (IGL) [Xie et al., 2021] is a powerful framework in which a learner aims at maximizing unobservable rewards through interacting with an environment and observing reward-dependent feedback on the taken actions.To deal with personalized rewards that are ubiquitous in applications such as recommendation systems, Maghakian et al. [2022] study a version of IGL with context-dependent feedback, but their algorithm does not come with theoretical guarantees. In this work, we consider the same problem and provide the first provably efficient algorithms with sublinear regret under realizability. Our analysis reveals that the step-function estimator of prior work can deviate uncontrollably due to finite-sample effects. Our solution is a novel Lipschitz reward estimator which underestimates the true reward and enjoys favorable generalization performances. Building on this estimator, we propose two algorithms, one based on explore-then-exploit and the other based on inverse-gap weighting. We apply IGL to learning from image feedback and learning from text feedback, which are reward-free settings that arise in practice.

artificial intelligence, name change, proceedings, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.40)

Add feedback

512b6bc067a6c6fa6a6ff8e5f6445e10-Paper-Conference.pdf

Neural Information Processing SystemsFeb-8-2026, 23:00:50 GMT

igl, information, latent reward, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report (0.46)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)

Technology:

Information Technology > Human Computer Interaction (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
(2 more...)

Add feedback

Interaction-Grounded Learning with Action-Inclusive Feedback

Neural Information Processing SystemsDec-24-2025, 05:11:56 GMT

Consider the problem setting of Interaction-Grounded Learning (IGL), in which a learner's goal is to optimally interact with the environment with no explicit reward to ground its policies. The agent observes a context vector, takes an action, and receives a feedback vector, using this information to effectively optimize a policy with respect to a latent reward function. Prior analyzed approaches fail when the feedback vector contains the action, which significantly limits IGL's success in many potential scenarios such as Brain-computer interface (BCI) or Human-computer interface (HCI) applications. We address this by creating an algorithm and analysis which allows IGL to work even when the feedback vector contains the action, encoded in any fashion. We provide theoretical guarantees and large-scale experiments based on supervised datasets to demonstrate the effectiveness of the new approach.

action-inclusive feedback, interaction-grounded learning, name change, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.41)

Add feedback

Interaction-Grounded Learning with Action-Inclusive Feedback

Neural Information Processing SystemsAug-14-2025, 20:03:55 GMT

Prior analyzed approaches fail when the feedback vector contains the action, which significantly limits IGL's success in many potential scenarios such as Brain-computer interface (BCI) or Human-computer interface (HCI) applications.

igl, latent reward, learning, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report (0.46)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)

Technology:

Information Technology > Human Computer Interaction (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
(2 more...)

Add feedback

Provably Efficient Interactive-Grounded Learning with Personalized Reward

Neural Information Processing SystemsMay-27-2025, 19:03:55 GMT

Interactive-Grounded Learning (IGL) [Xie et al., 2021] is a powerful framework in which a learner aims at maximizing unobservable rewards through interacting with an environment and observing reward-dependent feedback on the taken actions.To deal with personalized rewards that are ubiquitous in applications such as recommendation systems, Maghakian et al. [2022] study a version of IGL with context-dependent feedback, but their algorithm does not come with theoretical guarantees. In this work, we consider the same problem and provide the first provably efficient algorithms with sublinear regret under realizability. Our analysis reveals that the step-function estimator of prior work can deviate uncontrollably due to finite-sample effects. Our solution is a novel Lipschitz reward estimator which underestimates the true reward and enjoys favorable generalization performances. Building on this estimator, we propose two algorithms, one based on explore-then-exploit and the other based on inverse-gap weighting.

interactive-grounded learning, personalized reward, provably efficient interactive-grounded learning, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.43)

Add feedback

Interaction-Grounded Learning with Action-Inclusive Feedback

Neural Information Processing SystemsOct-11-2024, 01:22:39 GMT

Consider the problem setting of Interaction-Grounded Learning (IGL), in which a learner's goal is to optimally interact with the environment with no explicit reward to ground its policies. The agent observes a context vector, takes an action, and receives a feedback vector, using this information to effectively optimize a policy with respect to a latent reward function. Prior analyzed approaches fail when the feedback vector contains the action, which significantly limits IGL's success in many potential scenarios such as Brain-computer interface (BCI) or Human-computer interface (HCI) applications. We address this by creating an algorithm and analysis which allows IGL to work even when the feedback vector contains the action, encoded in any fashion. We provide theoretical guarantees and large-scale experiments based on supervised datasets to demonstrate the effectiveness of the new approach.

action-inclusive feedback, feedback vector, interaction-grounded learning, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.45)

Add feedback

Interaction-Grounded Learning

Xie, Tengyang, Langford, John, Mineiro, Paul, Momennejad, Ida

arXiv.org Machine LearningJun-9-2021

Consider a prosthetic arm, learning to adapt to its user's control signals. We propose Interaction-Grounded Learning for this novel setting, in which a learner's goal is to interact with the environment with no grounding or explicit reward to optimize its policies. Such a problem evades common RL solutions which require an explicit reward. The learning agent observes a multidimensional context vector, takes an action, and then observes a multidimensional feedback vector. This multidimensional feedback vector has no explicit reward information. In order to succeed, the algorithm must learn how to evaluate the feedback vector to discover a latent reward signal, with which it can ground its policies without supervision. We show that in an Interaction-Grounded Learning setting, with certain natural assumptions, a learner can discover the latent reward and ground its policy for successful interaction. We provide theoretical guarantees and a proof-of-concept empirical evaluation to demonstrate the effectiveness of our proposed approach.

assumption, feedback vector, interaction-grounded learning, (14 more...)

arXiv.org Machine Learning

2106.04887

Country:

North America > United States > Illinois (0.04)
North America > United States > New York (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Improving Heuristics Through Relaxed Search - An Analysis of TP4 and HSP*a in the 2004 Planning Competition

Haslum, P.

Journal of Artificial Intelligence ResearchFeb-25-2006

The hm admissible heuristics for (sequential and temporal) regression planning are defined by a parameterized relaxation of the optimal cost function in the regression search space, where the parameter m offers a trade-off between the accuracy and computational cost of theheuristic. Existing methods for computing the hm heuristic require time exponential in m, limiting them to small values (m <= 2). The hm heuristic can also be viewed as the optimal cost function in a relaxation of the search space: this paper presents relaxed search, a method for computing this function partially by searching in the relaxed space. The relaxed search method, because it computes hm only partially, is computationally cheaper and therefore usable for higher values of m. The (complete) hm heuristic is combined with partial hm heuristics, for m = 3,..., computed by relaxed search, resulting in a more accurate heuristic. This use of the relaxed search method to improve on the hm heuristic is evaluated by comparing two optimal temporal planners: TP4, which does not use it, and HSP*a, which uses it but is otherwise identical to TP4. The comparison is made on the domains used in the 2004 International Planning Competition, in which both planners participated. Relaxed search is found to be cost effective in some of these domains, but not all. Analysis reveals a characterization of the domains in which relaxed search can be expected to be cost effective, in terms of two measures on the original and relaxed search spaces. In the domains where relaxed search is cost effective, expanding small states is computationally cheaper than expanding large states and small states tend to have small successor states.

eq57, rx6, rx6h, (16 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.1885

AI Access Foundation

10443

Journal of Artificial Intelligence Research

Country: