Evaluating the Accuracy of Chatbots in Financial Literature

Erdem, Orhan, Hassett, Kristi, Egriboyun, Feyzullah

Nov-11-2024–arXiv.org Artificial Intelligence

We evaluate the reliability of two chatbots, ChatGPT (4o and o1-preview versions) and Gemini Advanced, in providing references on financial literature, employing novel methodologies. Alongside the conventional binary approach commonly used in the literature, we developed a nonbinary approach and a recency measure to assess how hallucination rates vary with how recent a topic is. After analyzing 150 citations, ChatGPT-4o had a hallucination rate of 20.0% (95% CI, 13.6%-26.4%), while o1-preview had a hallucination rate of 21.3% (95% CI, 14.8%-27.9%). In contrast, Gemini Advanced exhibited a higher hallucination rate of 76.7% (95% CI, 69.9%-83.4%). While hallucination rates increased for more recent topics, this trend was not statistically significant for Gemini Advanced. These findings emphasize the importance of verifying chatbot-provided references, particularly in rapidly evolving fields.
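The reported confidence intervals can be checked with a short calculation. The sketch below assumes a normal-approximation (Wald) interval and 150 citations per model; the abstract states neither. The hallucinated-citation counts (30, 32, 115) are back-calculated from the reported percentages and are assumptions, as is the helper name `wald_ci`.

```python
import math


def wald_ci(hallucinated, total, z=1.96):
    """Return (rate, lower, upper) for a normal-approximation 95% interval.

    Assumption: the abstract does not say which interval method was used;
    the Wald interval is used here purely as an illustration.
    """
    p = hallucinated / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, p - half_width, p + half_width


# Assumed counts, inferred from the reported rates with 150 citations per model.
assumed_counts = {
    "ChatGPT-4o": 30,        # 30 / 150 = 20.0%
    "o1-preview": 32,        # 32 / 150 = 21.3%
    "Gemini Advanced": 115,  # 115 / 150 = 76.7%
}

for model, count in assumed_counts.items():
    rate, lower, upper = wald_ci(count, 150)
    print(f"{model}: {100 * rate:.1f}% "
          f"(95% CI, {100 * lower:.1f}%-{100 * upper:.1f}%)")
```

Under these assumptions the computed intervals agree with the reported ones to one decimal place, which suggests, but does not confirm, that each model was evaluated on 150 citations.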

chatbot, chatgpt-4o, hallucination rate

Country:
- Europe > United Kingdom (0.04)
- North America > United States
  - Texas > Denton County > Denton (0.04)

Genre:
- Research Report > Experimental Study > Negative Result (0.54)

Industry:
- Health & Medicine (1.00)
- Law (0.68)
- Banking & Finance > Trading (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
