FM-IRL: Flow-Matching for Reward Modeling and Policy Regularization in Reinforcement Learning
