AITopics | adversarial soft advantage fitting

Adversarial Soft Advantage Fitting: Imitation Learning without Policy Optimization

Neural Information Processing Systems | Dec-24-2025, 07:10:05 GMT

Adversarial Imitation Learning alternates between learning a discriminator, which tells expert demonstrations apart from generated ones, and a generator policy that produces trajectories able to fool this discriminator. This alternating optimization is known to be delicate in practice since it compounds unstable adversarial training with brittle and sample-inefficient reinforcement learning. We propose to remove the burden of the policy optimization steps by leveraging a novel discriminator formulation. Specifically, our discriminator is explicitly conditioned on two policies: the one from the previous generator iteration and a learnable policy. When optimized, this discriminator directly learns the optimal generator policy. Consequently, the discriminator update solves the generator's optimization problem for free: learning a policy that imitates the expert does not require an additional optimization loop. This formulation effectively halves the implementation and computational burden of Adversarial Imitation Learning algorithms by removing the Reinforcement Learning phase altogether. We show on a variety of tasks that our simpler approach is competitive with prevalent Imitation Learning methods.
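The two-policy discriminator described in the abstract can be sketched numerically. The block below is a minimal illustration, not the authors' implementation: it assumes the discriminator takes the form D = p_learn / (p_learn + p_prev), where p_learn and p_prev are the probabilities that the learnable policy and the previous generator policy assign to the actions of a trajectory, and it scores D with the usual binary cross-entropy. The function name `two_policy_discriminator_loss` and all numbers are hypothetical.

```python
import numpy as np


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)) = -log(1 + exp(-x)).
    return -np.logaddexp(0.0, -x)


def two_policy_discriminator_loss(logp_learn_expert, logp_prev_expert,
                                  logp_learn_gen, logp_prev_gen):
    """Binary cross-entropy for an ASSUMED discriminator
    D = p_learn / (p_learn + p_prev).

    Each argument is an array with one entry per trajectory: the sum over
    timesteps of log pi(a_t | s_t), evaluated on expert trajectories
    (*_expert) or on trajectories from the previous generator (*_gen).
    """
    # With this form, D = sigmoid(log p_learn - log p_prev).
    logit_expert = logp_learn_expert - logp_prev_expert
    logit_gen = logp_learn_gen - logp_prev_gen
    # -E_expert[log D] - E_gen[log(1 - D)], using 1 - sigmoid(x) = sigmoid(-x).
    return -(log_sigmoid(logit_expert).mean() + log_sigmoid(-logit_gen).mean())


# Hypothetical per-trajectory log-probabilities (illustrative numbers only).
logp_prev_expert = np.array([-3.0, -2.5])
logp_prev_gen = np.array([-1.0, -1.2])

# If the learnable policy equals the previous one, D = 0.5 on every
# trajectory and the loss is 2 * log(2).
loss_equal = two_policy_discriminator_loss(
    logp_prev_expert, logp_prev_expert, logp_prev_gen, logp_prev_gen)

# A learnable policy that puts more mass on expert trajectories and less on
# generated ones obtains a lower loss.
loss_better = two_policy_discriminator_loss(
    logp_prev_expert + 1.0, logp_prev_expert, logp_prev_gen - 1.0, logp_prev_gen)
```

Under this assumed form, minimizing the loss with respect to the learnable policy is the only optimization step; the previous generator policy would then be replaced by the learned one before new trajectories are collected.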

adversarial soft advantage fitting, discriminator, imitation learning, (5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.51)

Adversarial Soft Advantage Fitting: Imitation Learning without Policy Optimization

Paul Barde

Neural Information Processing Systems | Aug-15-2025, 03:08:36 GMT

Specifically, our discriminator is explicitly conditioned on two policies: the one from the previous generator's iteration and a learnable policy.

discriminator, international conference, learning, (13 more...)

Country: North America > Canada > Quebec > Montreal (0.15)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Review for NeurIPS paper: Adversarial Soft Advantage Fitting: Imitation Learning without Policy Optimization

Neural Information Processing Systems | Jan-26-2025, 14:02:07 GMT

Correctness: The claims and experiments seem mostly correct. While the analysis shows that the solution to the min-max problem (Eq. ...) ... I would increase my review if the paper were updated to include a proof that the proposed algorithm converges. One comment about the experiments is that they don't actually show that the proposed method mimics the expert, only that running the proposed algorithm with data generated from an expert results in high reward. I would increase my review if an experiment were added to show that the learned policy actually mimics the demonstrator.
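The reviewer's distinction between high reward and actual mimicry can be made concrete with a simple behavioral check. The sketch below is one hypothetical way to measure it and is not taken from the paper or the review: it assumes discrete actions, a learned policy exposed as a matrix of action probabilities on states visited by the expert, and the expert's chosen actions on those states. The function names and the data are illustrative.

```python
import numpy as np


def action_agreement(policy_probs, expert_actions):
    """Fraction of expert states where the learned policy's most likely
    action equals the action the expert took."""
    predicted = np.argmax(policy_probs, axis=1)
    return float(np.mean(predicted == expert_actions))


def mean_expert_log_likelihood(policy_probs, expert_actions):
    """Average log-probability the learned policy assigns to expert actions."""
    rows = np.arange(len(expert_actions))
    return float(np.mean(np.log(policy_probs[rows, expert_actions])))


# Hypothetical data: 4 expert states, 2 discrete actions.
policy_probs = np.array([[0.7, 0.3],
                         [0.2, 0.8],
                         [0.6, 0.4],
                         [0.1, 0.9]])
expert_actions = np.array([0, 1, 1, 1])

agreement = action_agreement(policy_probs, expert_actions)
log_likelihood = mean_expert_log_likelihood(policy_probs, expert_actions)
```

A policy can reach high reward while scoring poorly on both quantities, which is the gap the reviewer points to.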

adversarial soft advantage fitting, experiment, policy optimization, (4 more...)

Technology:

Information Technology > Artificial Intelligence > Robots (0.40)
Information Technology > Artificial Intelligence > Machine Learning (0.40)

Review for NeurIPS paper: Adversarial Soft Advantage Fitting: Imitation Learning without Policy Optimization

Neural Information Processing Systems | Jan-26-2025, 14:02:00 GMT

Even before the author response, the reviewers agreed that the results and approach were interesting. The response addressed the reviewers' remaining concerns about novelty, baseline strength, and positioning with respect to prior work. This led the reviewers to a consensus that the paper should be accepted.

adversarial soft advantage fitting, imitation learning, policy optimization, (2 more...)

Technology:

Information Technology > Artificial Intelligence > Robots (0.40)
Information Technology > Artificial Intelligence > Machine Learning (0.40)

Adversarial Soft Advantage Fitting: Imitation Learning without Policy Optimization

Neural Information Processing Systems | Oct-10-2024, 18:44:28 GMT

adversarial soft advantage fitting, discriminator, imitation learning, (2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.34)

Adversarial Soft Advantage Fitting: Imitation Learning without Policy Optimization

Barde, Paul, Roy, Julien, Jeon, Wonseok, Pineau, Joelle, Pal, Christopher, Nowrouzezahrai, Derek

arXiv.org Artificial Intelligence | Jun-23-2020

Adversarial imitation learning alternates between learning a discriminator, which tells expert demonstrations apart from generated ones, and a generator policy that produces trajectories able to fool this discriminator. This alternating optimization is known to be delicate in practice since it compounds unstable adversarial training with brittle and sample-inefficient reinforcement learning. We propose to remove the burden of the policy optimization steps by leveraging a novel discriminator formulation. Specifically, our discriminator is explicitly conditioned on two policies: the one from the previous generator iteration and a learnable policy. When optimized, this discriminator directly learns the optimal generator policy. Consequently, the discriminator update solves the generator's optimization problem for free: learning a policy that imitates the expert does not require an additional optimization loop. This formulation effectively halves the implementation and computational burden of adversarial imitation learning algorithms by removing the reinforcement learning phase altogether. We show on a variety of tasks that our simpler approach is competitive with prevalent imitation learning methods.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

arXiv:2006.13258

Country:

North America > Canada > Quebec > Montreal (0.14)
North America > United States > Illinois > Cook County > Chicago (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)