Policy Learning Using Weak Supervision

Dec-24-2025, 16:21:14 GMT–Neural Information Processing Systems

Most existing policy learning solutions require the learning agents to receive high-quality supervision signals, e.g., rewards in reinforcement learning (RL) or high-quality expert demonstrations in behavioral cloning (BC). Such high-quality supervision is either infeasible or prohibitively expensive to obtain in practice. We aim for a unified framework that leverages the available cheap weak supervision to perform policy learning efficiently. To handle this problem, we treat the "weak supervision" as imperfect information coming from a peer agent, and evaluate the learning agent's policy based on a "correlated agreement" with the peer agent's policy (instead of simple agreement).
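
The contrast between simple agreement and correlated agreement can be illustrated with a small sketch. The abstract does not give the exact scoring function, so the form used here is an assumption: agreement on matched states minus the agreement expected when the two agents' actions are paired at random. The function names `simple_agreement` and `correlated_agreement` are hypothetical and not taken from the paper.

```python
from collections import Counter


def simple_agreement(policy_actions, peer_actions):
    """Fraction of states on which the learner and the peer pick the same action."""
    matches = sum(a == b for a, b in zip(policy_actions, peer_actions))
    return matches / len(policy_actions)


def correlated_agreement(policy_actions, peer_actions):
    """Assumed form: matched agreement minus chance-level agreement.

    Chance-level agreement is the probability that the two agents agree
    when their actions are drawn independently from their empirical
    action frequencies (i.e. paired at random across states).
    """
    n = len(policy_actions)
    policy_counts = Counter(policy_actions)
    peer_counts = Counter(peer_actions)
    # Counter returns 0 for actions the peer never took.
    chance = sum(policy_counts[a] * peer_counts[a] for a in policy_counts) / (n * n)
    return simple_agreement(policy_actions, peer_actions) - chance


# Weak supervision from a peer whose actions are mostly 0.
peer = [0, 0, 0, 1]
constant_policy = [0, 0, 0, 0]   # blindly copies the majority action
matching_policy = [0, 0, 0, 1]   # tracks the peer state by state

print(simple_agreement(constant_policy, peer), correlated_agreement(constant_policy, peer))
print(simple_agreement(matching_policy, peer), correlated_agreement(matching_policy, peer))
```

Under this assumed form, a policy that always outputs the peer's most frequent action scores well on simple agreement but gets no credit from the correlated score, because its agreement is no better than chance.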

name change, policy learning, supervision

Industry:
- Education (0.64)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (1.00)