Supplementto" Sample-EfficientReinforcement LearningforLinearly-ParameterizedMDPs withaGenerativeModel "

Neural Information Processing Systems

In addition, we define 1 to be the vector with all entries equal to 1, and I to be the identity matrix. Suppose that δ > 0 and ε ∈ (0, (1 − γ)^(−1/2)]. The remainder of this section is devoted to proving Theorem 3. [...] The remainder of this section is devoted to proving Theorem 4.
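In display form, the excerpt's standing assumptions read as follows (a reconstruction: the exponent −1/2 is inferred from the garbled "(1 γ) 1/2]", and γ denotes the usual discount factor in (0, 1)):

\[
  \delta > 0,
  \qquad
  \varepsilon \in \bigl(0,\ (1-\gamma)^{-1/2}\bigr],
\]

where \(\mathbf{1}\) denotes the all-ones vector and \(I\) the identity matrix.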


VARP: Reinforcement Learning from Vision-Language Model Feedback with Agent Regularized Preferences

Singh, Anukriti, Bhaskar, Amisha, Yu, Peihong, Chakraborty, Souradip, Dasyam, Ruthwik, Bedi, Amrit, Tokekar, Pratap

arXiv.org Artificial Intelligence

Designing reward functions for continuous-control robotics often leads to subtle misalignments or reward hacking, especially in complex tasks. Preference-based RL mitigates some of these pitfalls by learning rewards from comparative feedback rather than hand-crafted signals, yet scaling human annotations remains challenging. Recent work uses Vision-Language Models (VLMs) to automate preference labeling, but a single final-state image generally fails to capture the agent's full motion. In this paper, we present a two-part solution that both improves feedback accuracy and better aligns reward learning with the agent's policy. First, we overlay trajectory sketches on final observations to reveal the path taken, allowing VLMs to provide more reliable preferences; this improves preference accuracy by approximately 15-20% on Meta-World tasks. Second, we regularize reward learning by incorporating the agent's performance, ensuring that the reward model is optimized on data generated by the current policy; this addition boosts episode returns by 20-30% in locomotion tasks. Empirical studies on Meta-World demonstrate that our method achieves a success rate of around 70-80% across all tasks, compared to below 50% for standard approaches. These results underscore the efficacy of combining richer visual representations with agent-aware reward regularization.
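The trajectory-sketch idea from the first contribution can be illustrated with a minimal Python sketch (not the authors' code; the frame size, the pixel-space path, and every name below are illustrative assumptions): the episode's 2D path is drawn over the final observation with a time-colored gradient, so a VLM judging the episode sees the motion rather than only the end state.

# Minimal sketch (not the authors' code) of overlaying a trajectory on the
# final observation so a VLM can judge the full motion, not just the end state.
# Frame size, the pixel-space path, and all names here are illustrative.
import numpy as np
from PIL import Image, ImageDraw

def overlay_trajectory(final_frame: np.ndarray, pixel_path, line_width: int = 3) -> Image.Image:
    """Draw a time-colored 2D path over the last frame of an episode.

    final_frame: (H, W, 3) uint8 RGB array, e.g. a rendered observation.
    pixel_path:  (T, 2) array of (x, y) pixel coordinates over time.
    """
    img = Image.fromarray(final_frame).convert("RGB")
    draw = ImageDraw.Draw(img)
    pts = np.asarray(pixel_path, dtype=float)
    t_max = max(len(pts) - 1, 1)
    for t in range(len(pts) - 1):
        # Color encodes time (dark red -> yellow) so direction of motion is visible.
        color = (255, int(255 * t / t_max), 0)
        draw.line([tuple(pts[t]), tuple(pts[t + 1])], fill=color, width=line_width)
    # Mark start (green) and end (blue) points explicitly.
    draw.ellipse(list(pts[0] - 5) + list(pts[0] + 5), outline=(0, 255, 0), width=2)
    draw.ellipse(list(pts[-1] - 5) + list(pts[-1] + 5), outline=(0, 0, 255), width=2)
    return img

if __name__ == "__main__":
    frame = np.zeros((240, 320, 3), dtype=np.uint8)  # stand-in for a camera frame
    xs = np.linspace(40, 280, 50)
    ys = 120 + 60 * np.sin(np.linspace(0, 3, 50))
    overlay_trajectory(frame, np.stack([xs, ys], axis=1)).save("sketch.png")

The time-to-color gradient is one simple way to make direction of motion legible to a VLM; the paper's actual rendering choices may differ.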
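The second contribution, agent-regularized reward learning, can likewise be sketched. The abstract does not spell out the regularizer, so the agent-performance term below (an MSE anchor tying predicted returns on the current policy's own rollouts to an observed performance signal) is a hypothetical stand-in; the Bradley-Terry preference loss is the standard choice in preference-based RL.

# Minimal PyTorch sketch of agent-regularized preference learning.
# NOTE: a hypothetical stand-in, not the VARP objective; only the abstract's
# high-level description is assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Maps an observation to a scalar reward; returns are summed over segments."""
    def __init__(self, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def segment_return(self, segments: torch.Tensor) -> torch.Tensor:
        # segments: (batch, T, obs_dim) -> predicted return per segment: (batch,)
        return self.net(segments).squeeze(-1).sum(dim=1)

def agent_regularized_loss(model, seg_a, seg_b, pref,
                           policy_segs, policy_perf, beta: float = 0.1):
    """Bradley-Terry preference loss plus an agent-anchoring term.

    pref:        (batch,) float, 1.0 where segment A was preferred (e.g. by a VLM).
    policy_segs: rollout segments from the *current* policy.
    policy_perf: (batch,) observed performance signal for those rollouts
                 (hypothetical stand-in for the paper's agent-performance term).
    """
    logits = model.segment_return(seg_a) - model.segment_return(seg_b)
    bt_loss = F.binary_cross_entropy_with_logits(logits, pref)
    # Keep predicted on-policy returns close to observed performance, so the
    # reward model stays grounded in data the current policy actually generates.
    reg = F.mse_loss(model.segment_return(policy_segs), policy_perf)
    return bt_loss + beta * reg

The beta coefficient trades off fitting the VLM preferences against staying anchored to on-policy data, matching the abstract's stated goal of optimizing the reward model on data generated by the current policy.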