Evaluating ChatGPT on Nuclear Domain-Specific Data
Anwar, Muhammad, de Costa, Mischa, Hammad, Issam, Lau, Daniel
arXiv.org Artificial Intelligence
This paper examines the application of ChatGPT, a large language model (LLM), for question-and-answer (Q&A) tasks in the highly specialized field of nuclear data. The primary focus is on evaluating ChatGPT's performance on a curated test dataset, comparing the outcomes of a standalone LLM with those generated through a Retrieval Augmented Generation (RAG) approach. LLMs, despite their recent advancements, are prone to generating incorrect or 'hallucinated' information, which is a significant limitation in applications requiring high accuracy and reliability. This study explores the potential of utilizing RAG in LLMs, a method that integrates external knowledge bases and sophisticated retrieval techniques to enhance the accuracy and relevance of generated outputs. In this context, the paper evaluates ChatGPT's ability to answer domain-specific questions, employing two methodologies: A) direct response from the LLM, and B) response from the LLM within a RAG framework. The effectiveness of these methods is assessed through a dual mechanism of human and LLM evaluation, scoring the responses for correctness and other metrics. The findings underscore the improvement in performance when incorporating a RAG pipeline in an LLM, particularly in generating more accurate and contextually appropriate responses for nuclear domain-specific queries. Additionally, the paper highlights alternative approaches to further refine and improve the quality of answers in such specialized domains.
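The two methodologies compared in the abstract, A) a direct prompt and B) a prompt augmented with retrieved context, can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the function names, the token-overlap retriever, the prompt wording, and the two sample passages are hypothetical and are not taken from the paper, which does not describe its retriever or prompts in this abstract. No LLM is called; the sketch only shows how the input to the model differs between the two methods.

```python
import re
from collections import Counter

# Hypothetical mini knowledge base; a real RAG system would index domain documents.
PASSAGES = [
    "CANDU reactors use heavy water as both moderator and coolant.",
    "Natural uranium contains about 0.7 percent uranium-235.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(question, passage):
    """Count tokens shared by question and passage (a toy relevance score)."""
    q = Counter(tokenize(question))
    p = Counter(tokenize(passage))
    return sum((q & p).values())


def retrieve(question, passages, k=1):
    """Return the k passages with the highest token overlap with the question."""
    ranked = sorted(passages, key=lambda p: overlap_score(question, p), reverse=True)
    return ranked[:k]


def build_direct_prompt(question):
    """Method A: the question is sent to the LLM with no external context."""
    return f"Question: {question}\nAnswer:"


def build_rag_prompt(question, passages, k=1):
    """Method B: retrieved passages are prepended to the question as context."""
    context = "\n".join(f"- {p}" for p in retrieve(question, passages, k))
    return (
        "Use only the context below to answer.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\nAnswer:"
    )


question = "What moderator do CANDU reactors use?"
direct_prompt = build_direct_prompt(question)
rag_prompt = build_rag_prompt(question, PASSAGES)
```

In a real pipeline the overlap score would typically be replaced by an embedding-based or other stronger retriever, and both prompts would be sent to the same model so that the answers can be scored side by side.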
Aug-26-2024
- Country:
  - North America > Canada
    - Saskatchewan > Saskatoon (0.09)
    - Ontario (0.04)
    - Nova Scotia > Halifax Regional Municipality
      - Halifax (0.04)
  - Europe > Italy
    - Calabria > Catanzaro Province > Catanzaro (0.04)
- Genre:
  - Overview (1.00)
  - Research Report > New Finding (0.66)
- Industry:
  - Energy > Power Industry > Utilities > Nuclear (1.00)
- Technology: