BELL: Benchmarking the Explainability of Large Language Models

Ahmed, Syed Quiser, Ganesh, Bharathi Vokkaliga, P, Jagadish Babu, Selvaraj, Karthick, Devi, ReddySiva Naga Parvathi, Kappala, Sravya

arXiv.org Artificial Intelligence 

Large language models (LLMs) have revolutionized natural language processing and generative Artificial Intelligence (AI), as shown by numerous foundational studies [1]. These models' exceptional capabilities have attracted significant attention, enabling a wide range of applications. LLMs are used for tasks such as translation [2], content generation, content summarization, and article writing [3], as well as for enhancing search functions (e.g., Bing Chat [4]). Their impact extends to software development, where models such as Code Llama [5] assist engineers. Their applications also span the finance sector [6] and scientific research [7] [8], as well as areas such as arts [9], education [10], oceanography [11], law [12], political science [13], and medicine [14] [15], showcasing their broad and diverse influence. However, the exponential rise in the use of LLMs also brings challenges related to their explainability and interpretability.