Machine Learning Information Retrieval and Summarisation to Support Systematic Review on Outcomes Based Contracting
Bilal, Iman Munire, Fang, Zheng, Arana-Catania, Miguel, van Lier, Felix-Anselm, Velarde, Juliana Outes, Bregazzi, Harry, Carter, Eleanor, Airoldi, Mara, Procter, Rob
As academic literature proliferates, traditional review methods are increasingly challenged by the sheer volume and diversity of available research. This article presents a study that aims to address these challenges by enhancing the efficiency and scope of systematic reviews in the social sciences through advanced machine learning (ML) and natural language processing (NLP) tools. In particular, we focus on automating stages within the systematic reviewing process that are time-intensive and repetitive for human annotators and that lend themselves to immediate scalability through tools such as information retrieval and summarisation guided by expert advice. The article concludes with a summary of lessons learnt from this integrated approach to systematic reviews and future directions for improvement, including explainability.
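To make the retrieval stage concrete, here is a minimal, hypothetical sketch (not the study's actual pipeline) of ranking candidate abstracts against an expert-written topic query by TF-IDF similarity, so that the most relevant items reach human reviewers first. scikit-learn is assumed to be installed, and the abstracts and query are invented.

    # Hypothetical sketch (not the study's pipeline): rank candidate
    # abstracts against an expert-written topic query by TF-IDF similarity.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    abstracts = [  # invented examples
        "Outcomes-based contracting in employment services: a review.",
        "A randomised trial of a school nutrition programme.",
    ]
    query = "social outcomes contracts and payment-by-results schemes"

    vectoriser = TfidfVectorizer(stop_words="english")
    doc_matrix = vectoriser.fit_transform(abstracts)
    scores = cosine_similarity(vectoriser.transform([query]), doc_matrix).ravel()

    # Highest-scoring abstracts are surfaced to reviewers first.
    for idx in scores.argsort()[::-1]:
        print(f"{scores[idx]:.3f}  {abstracts[idx]}")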
SyROCCo: Enhancing Systematic Reviews using Machine Learning
Fang, Zheng, Arana-Catania, Miguel, van Lier, Felix-Anselm, Velarde, Juliana Outes, Bregazzi, Harry, Airoldi, Mara, Carter, Eleanor, Procter, Rob
The sheer number of research outputs published every year makes systematic reviewing increasingly time- and resource-intensive. This paper explores the use of machine learning (ML) techniques to help navigate the systematic review process. ML has previously been used to reliably 'screen' articles for review - that is, to identify relevant articles based on reviewers' inclusion criteria. The application of ML techniques to subsequent stages of a review, however, such as data extraction and evidence mapping, is in its infancy. We therefore set out to develop a series of tools that would assist in the profiling and analysis of 1,952 publications on the theme of 'outcomes-based contracting'. Tools were developed for the following tasks: assigning publications to 'policy area' categories; identifying and extracting key information for evidence mapping, such as organisations, laws, and geographical information; connecting the evidence base to an existing dataset on the same topic; and identifying subgroups of articles that may share thematic content. An interactive tool built on these techniques, together with a public dataset of their outputs, has been released. Our results demonstrate the utility of ML techniques for enhancing evidence accessibility and analysis within systematic review processes. These efforts show promise in yielding substantial efficiencies for future systematic reviews and in broadening their analytical scope. Our work suggests that there may be implications for the ease with which policymakers and practitioners can access evidence. While ML techniques seem poised to play a significant role in bridging the gap between research and policy by offering innovative ways of gathering, accessing, and analysing data from systematic reviews, we also highlight their current limitations and the need to exercise caution in their application, particularly given the potential for errors and biases.
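As an illustration of the entity-extraction step described above, the following sketch uses spaCy's off-the-shelf named entity recogniser to pull organisations, laws, and locations from a passage for evidence mapping. This is not the SyROCCo code: the example sentence is invented, and spaCy with its small English model is assumed to be installed.

    # Illustrative entity extraction for evidence mapping (not the SyROCCo
    # code). Assumes spaCy and its small English model are installed:
    #   pip install spacy && python -m spacy download en_core_web_sm
    import spacy

    nlp = spacy.load("en_core_web_sm")
    text = ("The Social Outcomes Fund, created under the Public Services "
            "(Social Value) Act 2012, supported programmes in Manchester.")

    # spaCy's OntoNotes label set includes ORG, LAW and GPE (geo-political)
    # entity types, which map onto the categories listed in the abstract.
    for ent in nlp(text).ents:
        if ent.label_ in {"ORG", "LAW", "GPE"}:
            print(ent.label_, "->", ent.text)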
Multi-Layer Ranking with Large Language Models for News Source Recommendation
Zhang, Wenjia, Gui, Lin, Procter, Rob, He, Yulan
To help find reliable information sources for news events, we introduce a novel task of expert recommendation, which aims to identify trustworthy sources based on their previously quoted statements. To achieve this, we built a novel dataset, called NewsQuote, consisting of 23,571 quote-speaker pairs sourced from a collection of news articles. We formulate the recommendation task as the retrieval of experts based on their likelihood of being associated with a given query. We also propose a multi-layer ranking framework employing large language models (LLMs) to improve recommendation performance. Our results show that employing an in-context-learning-based LLM ranker and a multi-layer ranking-based filter significantly improves both the predictive quality and the behavioural quality of the recommender system.
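The retrieve-then-re-rank pattern behind such a framework can be sketched as follows. This is an illustrative skeleton only: first_stage and llm_rank are hypothetical stand-ins for, say, a cheap BM25 retriever and an in-context-learning LLM ranker, not the paper's actual components.

    # Illustrative skeleton of multi-layer ranking: a cheap, high-recall
    # first layer filters the candidate pool, and a costlier LLM-based
    # layer re-ranks the survivors.
    from typing import Callable, List

    def multi_layer_rank(query: str,
                         candidates: List[str],
                         first_stage: Callable[[str, List[str], int], List[str]],
                         llm_rank: Callable[[str, List[str]], List[str]],
                         k: int = 20) -> List[str]:
        shortlist = first_stage(query, candidates, k)  # layer 1: filter
        return llm_rank(query, shortlist)              # layer 2: re-rank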
Generating Unsupervised Abstractive Explanations for Rumour Verification
Bilal, Iman Munire, Nakov, Preslav, Procter, Rob, Liakata, Maria
The task of rumour verification in social media concerns assessing the veracity of a claim on the basis of the conversation threads that result from it. While previous work has focused on predicting a veracity label, here we reformulate the task to generate model-centric, free-text explanations of a rumour's veracity. We follow an unsupervised approach: we first utilise post-hoc explainability methods to score the most important posts within a thread, and then use these posts to generate informative explanatory summaries through template-guided summarisation. To evaluate the informativeness of the explanatory summaries, we exploit the few-shot learning capabilities of a large language model (LLM). Our experiments show that LLMs can reach levels of agreement similar to those of humans when evaluating summaries. Importantly, we show that explanatory abstractive summaries are more informative and better reflect the predicted rumour veracity than simply using the highest-ranking posts in the thread.
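A minimal sketch of the select-then-summarise idea is given below. It is illustrative only: post_importance is a hypothetical stand-in for a post-hoc explainability score over posts (e.g. importance attributed by an explainer applied to the verification classifier), and the template is a simplified stand-in for the paper's template-guided summarisation.

    # Illustrative sketch: select the posts a post-hoc explainer scores as
    # most influential and slot them into a verdict template.
    def explain_rumour(claim, thread_posts, verdict, post_importance, top_k=3):
        top_posts = sorted(thread_posts, key=post_importance, reverse=True)[:top_k]
        evidence = " ".join(top_posts)
        return (f"The claim '{claim}' is judged {verdict} because the most "
                f"influential responses state: {evidence}")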
BERTTM: Leveraging Contextualized Word Embeddings from Pre-trained Language Models for Neural Topic Modeling
Fang, Zheng, He, Yulan, Procter, Rob
With the development of neural topic models in recent years, topic modelling is playing an increasingly important role in natural language understanding. However, most existing topic models still rely on bag-of-words (BoW) information, either as training input or as training target. This limits their ability to capture word order information in documents and leaves them vulnerable to the out-of-vocabulary (OOV) issue, i.e. they cannot handle unobserved words in new documents. Contextualized word embeddings from pre-trained language models are superior at word sense disambiguation and have proved effective in dealing with OOV words. In this work, we developed a novel neural topic model that incorporates contextualized word embeddings from the pre-trained language model BERT. The model can infer the topic distribution of a document without using any BoW information. In addition, it can infer the topic distribution of each word in a document directly from that word's contextualized embedding. Experiments on several datasets show that our model outperforms existing topic models in terms of both document classification and topic coherence metrics, and can accommodate unseen words from newly arrived documents. Experiments on the NER dataset also show that our model can produce high-quality word topic representations.
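The basic mechanism the abstract describes, mapping each contextualised word embedding directly to a topic distribution, can be sketched as below. This is a simplified stand-in for the published model, not its implementation; PyTorch is assumed and the dimensions are chosen arbitrarily.

    # Illustrative sketch (not the published BERTTM model): a single learned
    # projection maps each contextualised word embedding to a distribution
    # over topics; averaging these gives a document-level topic mixture
    # without any bag-of-words input.
    import torch
    import torch.nn as nn

    class EmbeddingToTopics(nn.Module):
        def __init__(self, embed_dim: int = 768, n_topics: int = 50):
            super().__init__()
            self.proj = nn.Linear(embed_dim, n_topics)

        def forward(self, word_embeddings: torch.Tensor) -> torch.Tensor:
            # word_embeddings: (seq_len, embed_dim), e.g. BERT's final layer.
            word_topics = torch.softmax(self.proj(word_embeddings), dim=-1)
            return word_topics.mean(dim=0)  # document topic mixture

    doc_topics = EmbeddingToTopics()(torch.randn(12, 768))  # dummy embeddings
    print(doc_topics.shape)  # torch.Size([50])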
NewsQuote: A Dataset Built on Quote Extraction and Attribution for Expert Recommendation in Fact-Checking
Zhang, Wenjia, Gui, Lin, Procter, Rob, He, Yulan
To enhance the ability to find credible evidence in news articles, we propose a novel task of expert recommendation, which aims to identify trustworthy experts on a specific news topic. To this end, we describe the construction of a novel NewsQuote dataset consisting of 24,031 quote-speaker pairs that appeared in a COVID-19 news corpus. We demonstrate an automatic pipeline for speaker and quote extraction via a BERT-based question answering model. We then formulate expert recommendation in two ways: as a document retrieval task, in which relevant quotes are retrieved first as an intermediate step towards expert identification, and as an expert retrieval task, in which sources are retrieved directly based on the probability of a query conditioned on a candidate expert. Experimental results on NewsQuote show that document retrieval is more effective at identifying relevant experts for a given news topic than expert retrieval.
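The extraction step can be approximated with an off-the-shelf extractive QA model, as in the sketch below. The transformers library is assumed, the model named is a generic public SQuAD checkpoint rather than necessarily the authors' own, and the example sentence is invented.

    # Illustrative sketch: attributing a quote to its speaker with a generic
    # extractive question-answering model.
    from transformers import pipeline

    qa = pipeline("question-answering",
                  model="distilbert-base-cased-distilled-squad")

    context = ('"Vaccines remain our best defence," said Dr. Jane Smith, '
               "an epidemiologist at University College London.")
    result = qa(question="Who is the speaker of the quote?", context=context)
    print(result["answer"])  # expected: Dr. Jane Smith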
A User-Centered, Interactive, Human-in-the-Loop Topic Modelling System
Fang, Zheng, Alqazlan, Lama, Liu, Du, He, Yulan, Procter, Rob
Huge amounts of unstructured, textual data are generated daily. As more data becomes available, it becomes more difficult to search, understand and discover the knowledge within it. Because of the human effort it requires, conventional qualitative approaches, such as Grounded Theory (Glaser et al., 1968), are no longer feasible with such large volumes of data. Topic modelling is a potential solution that has received increasing attention in recent research (Heidenreich et al., 2019; Curiskis et al., 2020; Dantu et al., 2021; Goyal and Howlett, 2021) to help users organize, search, and understand large amounts of information. It is an unsupervised machine learning technique for identifying latent topics in large collections of documents.

While most of these studies did not feed the refinement operations into an iterative retraining process, Smith et al. (2018) implemented a fully interactive, user-centered HL-TM system, and examined how the user experience is affected by issues arising in interactive systems, such as unpredictability, trust and lack of control. However, there are still limitations to their work. First, their system only allows users to refine the model sequentially, meaning that once a user updates the model, the new model overrides the previous one. This prevents users from comparing the effects of applying different refinement operations to the same model, making it difficult to find the most appropriate ones.
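For readers unfamiliar with the technique, a minimal example of classic, non-interactive topic modelling is sketched below using gensim's LDA implementation; the toy corpus is invented and gensim is assumed to be installed.

    # Minimal classic topic modelling with LDA (gensim assumed installed);
    # the toy corpus is invented.
    from gensim import corpora
    from gensim.models import LdaModel

    texts = [["outcome", "contract", "social", "impact"],
             ["vaccine", "trial", "health", "outcome"],
             ["school", "education", "contract", "funding"]]
    dictionary = corpora.Dictionary(texts)
    corpus = [dictionary.doc2bow(t) for t in texts]

    lda = LdaModel(corpus=corpus, id2word=dictionary,
                   num_topics=2, random_state=0)
    for topic_id, words in lda.print_topics():
        print(topic_id, words)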
PANACEA: An Automated Misinformation Detection System on COVID-19
Zhao, Runcong, Arana-Catania, Miguel, Zhu, Lixing, Kochkina, Elena, Gui, Lin, Zubiaga, Arkaitz, Procter, Rob, Liakata, Maria, He, Yulan
In this demo, we introduce PANACEA, a web-based misinformation detection system for COVID-19-related claims, which has two modules: fact-checking and rumour detection. Our fact-checking module, which is supported by novel natural language inference methods with a self-attention network, outperforms state-of-the-art approaches. It also provides automated veracity assessments and ranked supporting evidence, together with the stance of each piece of evidence towards the claim being checked. In addition, PANACEA adapts the bi-directional graph convolutional networks model, which detects rumours based on comment networks of related tweets instead of relying on a knowledge base. This rumour detection module assists users by issuing warnings in the early stages of an event, when a knowledge base may not yet be available.
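The entailment-based checking pattern that underlies such a fact-checking module can be sketched with an off-the-shelf NLI model, as below. This is not PANACEA's novel self-attention network: the transformers library and the public roberta-large-mnli checkpoint are assumed, and the claim and evidence are invented.

    # Illustrative sketch of entailment-based claim checking with an
    # off-the-shelf NLI model (not PANACEA's own network).
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("roberta-large-mnli")
    model = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")

    evidence = "Clinical trials found no benefit of vitamin C for COVID-19 patients."
    claim = "Vitamin C cures COVID-19."

    inputs = tokenizer(evidence, claim, return_tensors="pt")
    with torch.no_grad():
        probs = model(**inputs).logits.softmax(dim=-1).squeeze()

    # roberta-large-mnli orders its labels: contradiction, neutral, entailment.
    for label, p in zip(["contradiction", "neutral", "entailment"], probs):
        print(f"{label}: {p:.3f}")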
A Pipeline for Generating, Annotating and Employing Synthetic Data for Real World Question Answering
Maufe, Matthew, Ravenscroft, James, Procter, Rob, Liakata, Maria
Question Answering (QA) is a growing area of research, often used to facilitate the extraction of information from within documents. State-of-the-art QA models are usually pre-trained on domain-general corpora like Wikipedia and thus tend to struggle on out-of-domain documents without fine-tuning. We demonstrate that synthetic domain-specific datasets can be generated easily using domain-general models, while still providing significant improvements to QA performance. We present two new tools for this task: a flexible pipeline for validating the synthetic QA data and training downstream models on it, and an online interface to facilitate human annotation of this generated data. Using this interface, crowdworkers labelled 1,117 synthetic QA pairs, which we then used to fine-tune downstream models and improve domain-specific QA performance by 8.75 F1 points.
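One common validation step for synthetic QA data is round-trip filtering: a generated pair is kept only if an off-the-shelf QA model recovers the intended answer from the context. The sketch below illustrates that idea rather than the paper's actual pipeline; the transformers library and a generic public SQuAD checkpoint are assumed.

    # Illustrative round-trip filter for synthetic QA pairs (not the paper's
    # pipeline): keep a pair only if an extractive QA model's prediction
    # overlaps with the intended answer.
    from transformers import pipeline

    qa = pipeline("question-answering",
                  model="distilbert-base-cased-distilled-squad")

    def keep_pair(context: str, question: str, answer: str) -> bool:
        predicted = qa(question=question, context=context)["answer"].lower()
        return answer.lower() in predicted or predicted in answer.lower()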
Holding AI to Account: Challenges for the Delivery of Trustworthy AI in Healthcare
Procter, Rob, Tolmie, Peter, Rouncefield, Mark
The need for AI systems to provide explanations for their behaviour is now widely recognised as key to their adoption. In this paper, we examine the problem of trustworthy AI and explore what delivering this means in practice, with a focus on healthcare applications. Work in this area typically treats trustworthy AI as a problem of Human-Computer Interaction involving the individual user and an AI system. However, we argue here that this overlooks the important part played by organisational accountability in how people reason about and trust AI in socio-technical settings. To illustrate the importance of organisational accountability, we present findings from ethnographic studies of breast cancer screening and cancer treatment planning in multidisciplinary team meetings to show how participants made themselves accountable both to each other and to the organisations of which they are members. We use these findings to enrich existing understandings of the requirements for trustworthy AI and to outline some candidate solutions to the problems of making AI accountable both to individual users and organisationally. We conclude by outlining the implications of this for future work on the development of trustworthy AI, including ways in which our proposed solutions may be re-used in different application settings.