Financial Instruction Following Evaluation (FIFE)

Matlin, Glenn, Siddharth, null, JM, Anirudh, Shukla, Aditya, Hassan, Yahya, Chava, Sudheer

Dec-11-2025–arXiv.org Artificial Intelligence

Language Models (LMs) struggle with complex, interdependent instructions, particularly in high-stakes domains like finance where precision is critical. We introduce FIFE, a novel, high-difficulty benchmark designed to assess LM instruction-following capabilities for financial analysis tasks. FIFE comprises 88 human-authored prompts and employs a verification system with chainable, verifiable constraints for fine-grained reward signals. We evaluate 53 models (proprietary, open-weight, open-source) in a zero-shot setting. Our key findings reveal a clear performance hierarchy: the top open-weight model (76.1 strict / 79.5 loose) surpasses the leading proprietary system (65.9 strict / 70.5 loose), while the best open-source models lag significantly (45.5 strict / 48.9 loose). However, even top-performing models struggle with FIFE's complex requirements, failing to achieve perfect compliance. We release our dataset and code as an open-source resource to promote research in Reinforcement Learning for the financial domain.

fin, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

Dec-11-2025

arXiv.org PDF

Add feedback

Country:
- Europe > Austria > Vienna (0.14)

Genre:
- Research Report (0.87)

Industry:
- Banking & Finance (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.75)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found