

Knowledge-based Consistency Testing of Large Language Models

Rajan, Sai Sathiesh, Soremekun, Ezekiel, Chattopadhyay, Sudipta

arXiv.org Artificial Intelligence

In this work, we systematically expose and measure the inconsistency and knowledge gaps of Large Language Models (LLMs). Specifically, we propose an automated testing framework (called KONTEST) which leverages a knowledge graph to construct test cases. KONTEST probes and measures the inconsistencies in the LLM's knowledge of the world via a combination of semantically-equivalent queries and test oracles (metamorphic or ontological oracles). KONTEST further mitigates knowledge gaps via a weighted LLM model ensemble. Using four state-of-the-art LLMs (Falcon, Gemini, GPT3.5, and Llama2), we show that KONTEST generates 19.2% error-inducing inputs (1917 errors from 9983 test inputs). It also reveals a 16.5% knowledge gap across all tested LLMs. KONTEST's mitigation method reduces the LLM knowledge gap by 32.48%. Our ablation study further shows that GPT3.5 is not suitable for knowledge-based consistency testing because it is only 60%-68% effective in knowledge construction.
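The metamorphic-oracle idea in the abstract can be illustrated with a minimal sketch. This is a hypothetical simplification, not KONTEST's implementation: `ask_llm` stands in for a real LLM call with canned answers, and the oracle simply checks that all semantically-equivalent phrasings of one factual query yield the same answer.

```python
# Sketch of a metamorphic consistency oracle for LLM testing
# (illustrative only; `ask_llm` is a hypothetical stub, not a real API).

def ask_llm(query: str) -> str:
    """Stand-in for a real LLM call; returns canned answers."""
    canned = {
        "What is the capital of France?": "Paris",
        "Which city is France's capital?": "Paris",
        "France's capital city is called what?": "Lyon",  # injected inconsistency
    }
    return canned[query]

def metamorphic_consistency(queries: list[str]) -> bool:
    """Oracle: all semantically-equivalent queries must yield one answer."""
    answers = {ask_llm(q).strip().lower() for q in queries}
    return len(answers) == 1

equivalent = [
    "What is the capital of France?",
    "Which city is France's capital?",
    "France's capital city is called what?",
]
consistent = metamorphic_consistency(equivalent)  # False: "Lyon" disagrees
```

In this scheme the oracle needs no ground-truth label: disagreement between equivalent phrasings is itself the error signal, which is what makes such tests cheap to generate at scale from a knowledge graph.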


Structsum Generation for Faster Text Comprehension

Jain, Parag, Marzoca, Andreea, Piccinno, Francesco

arXiv.org Artificial Intelligence

We consider the task of generating structured representations of text using large language models (LLMs). We focus on tables and mind maps as representative modalities. Tables are a more organized way of representing data, while mind maps provide a visually dynamic and flexible approach, particularly suitable for sparse content. Despite the effectiveness of LLMs on different tasks, we show that current models struggle with generating structured outputs. In response, we present effective prompting strategies for both of these tasks. We introduce a taxonomy of problems around factuality and global and local structure, common to both modalities, and propose a set of critiques to tackle these issues, resulting in an absolute improvement in accuracy of +37pp (79%) for mind maps and +15pp (78%) for tables. To evaluate the semantic coverage of generated structured representations we propose Auto-QA, and we verify the adequacy of Auto-QA using the SQuAD dataset. We further evaluate the usefulness of structured representations via a text comprehension user study. The results show a significant reduction in comprehension time compared to plain text when using tables (42.9%) and mind maps (31.9%), without loss in accuracy.


Report on the Eighth Ireland Conference on AI and Cognitive Science

McKevitt, Paul

AI Magazine

It is a northern European city of 100,000, almost on the border between the Republic of Ireland and Northern Ireland. The local press (The Derry Journal & Belfast Telegraph) and radio (BBC Northern Ireland) ran a number of articles leading up to and during the conference. Delegates at the meetings enjoyed themselves and expressed their congratulations on the program and organization. Also, for the first time, AICS attracted a large number of delegates and papers from abroad, including many from the United Kingdom and Europe. All plenary invited speaker talks and the panel session went out on streaming video and audio, stored and live, with the possibility of phone-in. The surrounding region offered much to see: the north Derry coast, with beautiful beaches at Benone and Castlerock, then through Coleraine to the seaside resorts of Portstewart and Portrush; a few kilometers further out along the north Antrim coast, we arrive at the Giant's Causeway; and Gweedore, home of Clannad. Papers lay in areas such as evidential reasoning, and the organizers of AICS-97, the annual conference of the series, did a supreme job.