A Additional Results
Neural Information Processing Systems
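ROUGE-1, one of the two metrics reported in this appendix, measures unigram overlap between a model's answer and the reference answer. The following is a minimal sketch of that idea, assuming lowercasing, whitespace tokenization, and an F1-style score. The function name `rouge1_f1` and the example strings are illustrative only; this is not the evaluation code used for the reported results, which may apply a different tokenizer or stemming.

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """Simplified ROUGE-1: unigram-overlap F1 between prediction and reference."""
    # Assumption: lowercase + whitespace tokenization (real toolkits may differ).
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Clipped overlap: a unigram counts at most as often as it occurs in both texts.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical acronym-style example (not taken from the dataset).
reference = "European Market Infrastructure Regulation"
prediction = "EMIR stands for European Market Infrastructure Regulation"
print(round(rouge1_f1(prediction, reference), 3))  # 0.727
```

In this example the prediction contains every reference word (recall 1.0), but the three extra words lower precision to 4/7, so the F1 score is 8/11. A verbose answer is penalized even when it includes the full reference.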
Tables 1 and 2 report the performance of other LLMs on the acronym and regulations tasks. The acronym dataset is a QA task that requires models to decode financial acronyms. Despite not having seen this task before, FinMA, a financial LLM trained specifically on financial tasks, performed best. The FinMA7B-full model achieved the highest ROUGE-1 score (0.12) and the highest BERTScore (0.73), surpassing even GPT-4. This indicates that finance-specific models can leverage their domain knowledge effectively, even on short QA tasks like the acronym dataset.
The regulations dataset, by contrast, involves answering intricate questions about financial regulations such as EMIR. This task is long, complex, and difficult to understand, posing a significant challenge. Here the LLaMA2-70b-chat model stands out with a ROUGE-1 score of 0.30 and a BERTScore of 0.68, highlighting its ability to handle complex regulatory questions. This underscores the importance of model size and capability for more demanding tasks in the financial domain.
In both tables, the best performance is in bold.
B.1 Why was the datasheet created?
FinBen was created to address the lack of comprehensive benchmarks and evaluation studies of large language models in the financial domain. Although LLMs such as GPT-4 have proven capable of transforming various fields, including finance, a detailed understanding of their potential and limitations specific to finance is still lacking. This is partly due to the complex and specialized nature of financial tasks, which necessitates targeted datasets for thorough analysis.
By evaluating 42 datasets covering 24 financial tasks, we aim to provide a robust benchmark that allows researchers and practitioners to evaluate the effectiveness of LLMs in financial text analysis and prediction tasks more accurately and reliably.
Jun-1-2025, 00:28:23 GMT