FinNLI: Novel Dataset for Multi-Genre Financial Natural Language Inference Benchmarking

Magomere, Jabez, Kochkina, Elena, Mensah, Samuel, Kaur, Simerjot, Smiley, Charese H.

Apr-24-2025–arXiv.org Artificial Intelligence

We introduce FinNLI, a benchmark dataset for Financial Natural Language Inference (FinNLI) across diverse financial texts such as SEC Filings, Annual Reports, and Earnings Call transcripts. Our dataset framework ensures diverse premise-hypothesis pairs while minimizing spurious correlations. FinNLI comprises 21,304 pairs, including a high-quality test set of 3,304 instances annotated by finance experts. Evaluations show that domain shift significantly degrades general-domain NLI performance. The highest Macro F1 scores for the pre-trained language model (PLM) and large language model (LLM) baselines are 74.57% and 78.62%, respectively, highlighting the dataset's difficulty. Surprisingly, instruction-tuned financial LLMs perform poorly, suggesting limited generalizability. FinNLI exposes weaknesses in current LLMs for financial reasoning, indicating room for improvement.
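For readers unfamiliar with the metric quoted above, Macro F1 is the unweighted mean of the per-class F1 scores, so each class counts equally regardless of how often it occurs. The following is a minimal sketch only, not the authors' evaluation code: the three-way label set (`entailment`, `neutral`, `contradiction`) is an assumption based on the standard NLI setup, and the function name `macro_f1` and the toy data are hypothetical.

```python
from collections import Counter

# Assumed label set: the standard three-way NLI scheme (not stated in the abstract).
LABELS = ("entailment", "neutral", "contradiction")


def macro_f1(gold, pred, labels=LABELS):
    """Return the unweighted mean of per-class F1 scores."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct prediction for class g
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[g] += 1          # g was the true class but was missed

    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(labels)


# Toy, made-up labels purely for illustration (not drawn from FinNLI).
gold = ["entailment", "neutral", "contradiction", "neutral"]
pred = ["entailment", "neutral", "neutral", "neutral"]
score = macro_f1(gold, pred)
print(f"Macro F1: {score:.2%}")
```

Because the average is unweighted, a model that never predicts one class correctly is penalized heavily even if that class is rare, which is one reason the metric is commonly reported for NLI benchmarks.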



Country:
- Europe (1.00)
- Asia (1.00)
- North America > United States (0.88)

Genre:
- Financial News (1.00)
- Research Report
  - New Finding (0.46)
  - Experimental Study (0.46)

Industry:
- Banking & Finance > Trading (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.96)
