Policy Improvement using Language Feedback Models
–Neural Information Processing Systems
First, by using language feedback models (LFMs) to identify desirable behaviour to imitate, we improve task-completion rate over strong behavioural cloning baselines on three distinct language grounding environments (Touchdown, ScienceWorld, and ALFWorld). Second, imitation learning using LFMs outperforms using LLMs as experts to directly predict actions, when controlling for the number of LLM output tokens.
Nov-17-2025, 16:35:44 GMT
- Country:
- South America > Colombia > Meta Department > Villavicencio (0.04)
- Genre:
- Research Report > Experimental Study (0.93)
- Industry:
- Education > Educational Setting (0.46)
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning
- Neural Networks > Deep Learning (0.94)
- Performance Analysis > Accuracy (0.67)
- Natural Language > Large Language Model (1.00)
- Robots (1.00)