AITopics | pbrl algorithm

pbrl algorithm

d9d3837ee7981e8c064774da6cdd98bf-AuthorFeedback.pdf

Neural Information Processing Systems, Feb-10-2026, 16:30:50 GMT

Our technical contribution is mainly two-fold.

artificial intelligence, assumption, probability, (18 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Robots (0.31)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.31)

d9d3837ee7981e8c064774da6cdd98bf-AuthorFeedback.pdf

Neural Information Processing Systems, Aug-16-2025, 18:01:09 GMT

artificial intelligence, assumption, probability, (17 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Robots (0.30)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.30)

Advances in Preference-based Reinforcement Learning: A Review

Abdelkareem, Youssef, Shehata, Shady, Karray, Fakhri

arXiv.org Artificial Intelligence, Aug-21-2024

Reinforcement Learning (RL) algorithms depend on accurately engineered reward functions to guide learning agents toward the required tasks. Preference-based reinforcement learning (PbRL) addresses this dependency by using human preferences as expert feedback instead of numeric rewards. Owing to this advantage over traditional RL, PbRL has attracted growing attention in recent years, with many significant advances. In this survey, we present a unified PbRL framework that covers the newly emerging approaches that improve the scalability and efficiency of PbRL. In addition, we give a detailed overview of the theoretical guarantees and benchmarking work done in the field, and present its recent applications in complex real-world tasks. Lastly, we go over the limitations of the current approaches and the proposed future research directions.

algorithm, pbrl algorithm, utility function, (11 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/SMC53654.2022.9945333

2408.11943

Country:

North America > United States (0.14)
North America > Canada > Ontario > Waterloo Region > Waterloo (0.04)
Asia > Middle East > Jordan (0.04)
Europe > Slovenia > Upper Carniola > Municipality of Bled > Bled (0.04)

Genre: Overview (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Boosting Robustness in Preference-Based Reinforcement Learning with Dynamic Sparsity

Muslimani, Calarina, Grooten, Bram, Mamillapalli, Deepak Ranganatha Sastry, Pechenizkiy, Mykola, Mocanu, Decebal Constantin, Taylor, Matthew E.

arXiv.org Artificial Intelligence, Jun-10-2024

For autonomous agents to successfully integrate into human-centered environments, agents should be able to learn from and adapt to humans in their native settings. Preference-based reinforcement learning (PbRL) is a promising approach that learns reward functions from human preferences. This enables RL agents to adapt their behavior based on human desires. However, humans live in a world full of diverse information, most of which is not relevant to completing a particular task. It becomes essential that agents learn to focus on the subset of task-relevant environment features. Unfortunately, prior work has largely ignored this aspect, primarily focusing on improving PbRL algorithms in standard RL environments that are carefully constructed to contain only task-relevant features. This can result in algorithms that may not transfer effectively to a noisier real-world setting. To that end, this work proposes R2N (Robust-to-Noise), the first PbRL algorithm that leverages principles of dynamic sparse training to learn robust reward models that can focus on task-relevant features. We study the effectiveness of R2N in the Extremely Noisy Environment setting, an RL problem setting where up to 95% of the state features are irrelevant distractions. In experiments with a simulated teacher, we demonstrate that R2N can adapt the sparse connectivity of its neural networks to focus on task-relevant features, enabling R2N to significantly outperform several state-of-the-art PbRL algorithms in multiple locomotion and control environments.
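As a rough illustration of what "adapting the sparse connectivity" can mean in dynamic sparse training, the sketch below performs one generic prune-and-grow step on a flat list of weights. It is not R2N's actual procedure, which the abstract does not specify. Magnitude-based pruning, random regrowth, and zero initialization of new connections are all assumptions made here for illustration, and `prune_and_grow` is a hypothetical name.

```python
import random


def prune_and_grow(weights, mask, fraction, rng):
    """One topology update on a sparse layer stored as flat lists.

    Drops the smallest-magnitude active weights, then activates the same
    number of connections that were inactive before the update, chosen
    at random. Returns new lists and leaves the inputs unchanged.
    """
    active = [i for i, m in enumerate(mask) if m]
    inactive = [i for i, m in enumerate(mask) if not m]
    # Number of connections to swap, capped by how many can be grown.
    k = min(int(fraction * len(active)), len(inactive))

    # Prune: the k active weights with the smallest absolute value.
    to_prune = sorted(active, key=lambda i: abs(weights[i]))[:k]
    # Grow: k connections sampled from those inactive before pruning.
    to_grow = rng.sample(inactive, k)

    new_weights = list(weights)
    new_mask = list(mask)
    for i in to_prune:
        new_mask[i] = False
        new_weights[i] = 0.0
    for i in to_grow:
        new_mask[i] = True
        new_weights[i] = 0.0  # assumption: new connections start at zero
    return new_weights, new_mask


rng = random.Random(0)
weights = [0.9, -0.05, 0.4, 0.0, 0.0, 0.0]
mask = [True, True, True, False, False, False]
new_weights, new_mask = prune_and_grow(weights, mask, fraction=0.34, rng=rng)
```

The number of active connections is the same before and after the step, so the sparsity level stays fixed while the topology changes. In a training loop, a step like this would usually be interleaved with ordinary gradient updates on the active weights.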

pebble, timestep, true return, (13 more...)

arXiv.org Artificial Intelligence

2406.06495

Country:

North America > Canada > Alberta (0.14)
Europe > Netherlands > North Brabant > Eindhoven (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.95)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Benchmarking Multi-Agent Preference-based Reinforcement Learning for Human-AI Teaming

Bhambri, Siddhant, Verma, Mudit, Murthy, Anil, Kambhampati, Subbarao

arXiv.org Artificial Intelligence, Dec-21-2023

Preference-based Reinforcement Learning (PbRL) is an active area of research and has made significant strides in single-agent actor and in observer human-in-the-loop scenarios. However, its application within cooperative multi-agent RL frameworks, where humans actively participate and express preferences for agent behavior, remains largely uncharted. We consider a two-agent (Human-AI) cooperative setup where both agents are rewarded according to the human's reward function for the team. However, the agent does not have access to it and instead uses preference-based queries to elicit its objectives and the human's preferences for the robot in the human-robot team. We introduce the notion of Human-Flexibility, i.e., whether the human partner is amenable to multiple team strategies, with a special case being Specified Orchestration, where the human has a single team policy in mind (the most constrained case). We propose a suite of domains for studying PbRL in the Human-AI cooperative setup, all of which explicitly require forced cooperation. Adapting state-of-the-art single-agent PbRL algorithms to our two-agent setting, we conduct a comprehensive benchmarking study across our domain suite. Our findings highlight the challenges associated with a high degree of Human-Flexibility and the limited access to the human's envisioned policy in PbRL for Human-AI cooperation. Notably, we observe that PbRL algorithms exhibit effective performance exclusively in the case of Specified Orchestration, which can be seen as an upper bound on PbRL performance for future research.

agent, ai agent, human agent, (11 more...)

arXiv.org Artificial Intelligence

2312.14292

Country:

North America > United States > Arizona > Maricopa County > Tempe (0.04)
South America > Brazil > São Paulo (0.04)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Leisure & Entertainment (0.48)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
