AITopics | Richardson, Sylvia

Collaborating Authors

Richardson, Sylvia

Federated Variational Inference for Bayesian Mixture Models

Rao, Jackie, Crowe, Francesca L., Marshall, Tom, Richardson, Sylvia, Kirk, Paul D. W.

arXiv.org Machine Learning, Feb-18-2025

We present a federated learning approach for Bayesian model-based clustering of large-scale binary and categorical datasets. We introduce a principled 'divide and conquer' inference procedure using variational inference with local merge and delete moves within batches of the data in parallel, followed by 'global' merge moves across batches to find global clustering structures. We show that these merge moves require only summaries of the data in each batch, enabling federated learning across local nodes without requiring the full dataset to be shared. Empirical results on simulated and benchmark datasets demonstrate that our method performs well in comparison to existing clustering algorithms. We validate the practical utility of the method by applying it to large-scale electronic health record (EHR) data.
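The claim that merge moves need only per-batch summaries can be illustrated with a minimal sketch. It assumes binary data and a Bernoulli-type cluster whose summary is the cluster size plus the per-feature count of ones; this is an illustration of additive summaries only, not the authors' algorithm, and the function names and toy data are hypothetical.

```python
def cluster_summary(rows):
    """Summarise a cluster of binary rows as (size, per-feature counts of ones)."""
    n = len(rows)
    ones = [sum(col) for col in zip(*rows)]
    return n, ones


def merge_summaries(a, b):
    """Merge two cluster summaries without access to the underlying rows."""
    return a[0] + b[0], [x + y for x, y in zip(a[1], b[1])]


# Hypothetical data: the same candidate cluster observed on two separate nodes.
node_a = [[1, 0, 1], [1, 1, 1]]
node_b = [[1, 0, 0], [1, 0, 1], [0, 0, 1]]

# Each node shares only its summary; the merged summary matches what would
# have been computed from the pooled rows.
merged = merge_summaries(cluster_summary(node_a), cluster_summary(node_b))
pooled = cluster_summary(node_a + node_b)
```

Because the summaries add, a coordinator could evaluate a candidate merge across nodes from counts alone, which is the property the abstract relies on.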

artificial intelligence, bayesian inference, machine learning, (15 more...)

arXiv.org Machine Learning

2502.12684

Country: Europe > United Kingdom > England (0.27)

Genre: Research Report > New Finding (0.67)

Industry:

Health & Medicine > Therapeutic Area > Rheumatology (1.00)
Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (1.00)
Health & Medicine > Therapeutic Area > Oncology (1.00)
(11 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

A large-scale and PCR-referenced vocal audio dataset for COVID-19

Budd, Jobie, Baker, Kieran, Karoune, Emma, Coppock, Harry, Patel, Selina, Cañadas, Ana Tendero, Titcomb, Alexander, Payne, Richard, Hurley, David, Egglestone, Sabrina, Butler, Lorraine, Mellor, Jonathon, Nicholson, George, Kiskin, Ivan, Koutra, Vasiliki, Jersakova, Radka, McKendry, Rachel A., Diggle, Peter, Richardson, Sylvia, Schuller, Björn W., Gilmour, Steven, Pigoli, Davide, Roberts, Stephen, Packham, Josef, Thornley, Tracey, Holmes, Chris

arXiv.org Artificial Intelligence, Nov-3-2023

The UK COVID-19 Vocal Audio Dataset is designed for the training and evaluation of machine learning models that classify SARS-CoV-2 infection status or associated respiratory symptoms using vocal audio. The UK Health Security Agency recruited voluntary participants through the national Test and Trace programme and the REACT-1 survey in England from March 2021 to March 2022, during dominant transmission of the Alpha and Delta SARS-CoV-2 variants and some Omicron variant sublineages. Audio recordings of volitional coughs, exhalations, and speech were collected in the 'Speak up to help beat coronavirus' digital survey alongside demographic, self-reported symptom and respiratory condition data, and linked to SARS-CoV-2 test results. The UK COVID-19 Vocal Audio Dataset represents the largest collection of SARS-CoV-2 PCR-referenced audio recordings to date. PCR results were linked to 70,794 of 72,999 participants and 24,155 of 25,776 positive cases. Respiratory symptoms were reported by 45.62% of participants. This dataset has additional potential uses for bioacoustics research, with 11.30% of participants reporting asthma, and 27.20% with linked influenza PCR test results.

artificial intelligence, machine learning, participant, (17 more...)

arXiv.org Artificial Intelligence

2212.07738

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > United Kingdom > England > Nottinghamshire > Nottingham (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Genre:

Research Report > Experimental Study (1.00)
Questionnaire & Opinion Survey (1.00)

Industry:

Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Government > Regional Government > Europe Government > United Kingdom Government (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Quality (0.93)
Information Technology > Communications (0.93)

Audio-based AI classifiers show no evidence of improved COVID-19 screening over simple symptoms checkers

Coppock, Harry, Nicholson, George, Kiskin, Ivan, Koutra, Vasiliki, Baker, Kieran, Budd, Jobie, Payne, Richard, Karoune, Emma, Hurley, David, Titcomb, Alexander, Egglestone, Sabrina, Cañadas, Ana Tendero, Butler, Lorraine, Jersakova, Radka, Mellor, Jonathon, Patel, Selina, Thornley, Tracey, Diggle, Peter, Richardson, Sylvia, Packham, Josef, Schuller, Björn W., Pigoli, Davide, Gilmour, Steven, Roberts, Stephen, Holmes, Chris

arXiv.org Artificial Intelligence, Mar-2-2023

Recent work has reported that respiratory audio-trained AI classifiers can accurately predict SARS-CoV-2 infection status. Here, we undertake a large-scale study of audio-based AI classifiers, as part of the UK government's pandemic response. We collect a dataset of audio recordings from 67,842 individuals, with linked metadata, of whom 23,514 had positive PCR tests for SARS-CoV-2. In an unadjusted analysis, similar to that in previous works, AI classifiers predict SARS-CoV-2 infection status with high accuracy (ROC-AUC=0.846). However, after matching on measured confounders, such as self-reported symptoms, performance is much weaker (ROC-AUC=0.619). Upon quantifying the utility of audio-based classifiers in practical settings, we find them to be outperformed by predictions based on user-reported symptoms. We make best-practice recommendations for handling recruitment bias, and for assessing audio-based classifiers by their utility in relevant practical settings. Our work provides novel insights into the value of AI audio analysis and the importance of study design and treatment of confounders in AI-enabled diagnostics.

The coronavirus disease 2019 (COVID-19) pandemic has been estimated by the World Health Organization (WHO) to have caused 14.9 million excess deaths over the 2020-2021 period. Table S1 summarises nine highly cited datasets and corresponding classification performance. Here, we analyse the largest PCR-validated dataset collected to date in the field of audio-based COVID-19 screening (ABCS). We design and specify an analysis plan in advance, to investigate whether using audio-based classifiers can improve the accuracy of COVID-19 screening over using self-reported symptoms. Our contribution is as follows:
- We collect a respiratory acoustic dataset of 67,842 individuals with linked PCR test outcomes, including 23,514 who tested positive for COVID-19.
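For readers unfamiliar with the ROC-AUC figures quoted above, the following is a minimal sketch of the metric itself: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The labels and scores are made up for illustration and have no connection to the study's data; the function name is hypothetical.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Made-up classifier scores for three positive and three negative cases.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc = roc_auc(labels, scores)  # 8 of the 9 pairs are ranked correctly
```

A value of 0.5 corresponds to chance-level ranking, which is why the drop from 0.846 to 0.619 after matching on confounders is substantial.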

classifier, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2212.0857

Country: Europe > United Kingdom > England (0.46)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study > Negative Result (0.40)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Government > Regional Government > Europe Government > United Kingdom Government (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language (0.92)
(2 more...)

Bayesian outcome-guided multi-view mixture models with applications in molecular precision medicine

Kirk, Paul D. W., Pagani, Filippo, Richardson, Sylvia

arXiv.org Machine Learning, Mar-1-2023

Clustering is commonly performed as an initial analysis step for uncovering structure in 'omics datasets, e.g. to discover molecular subtypes of disease. The high-throughput, high-dimensional nature of these datasets means that they provide information on a diverse array of different biomolecular processes and pathways. Different groups of variables (e.g. genes or proteins) will be implicated in different biomolecular processes, and hence analyses that are limited to identifying a single clustering partition of the whole dataset are liable to conflate the multiple clustering structures that may arise from these distinct processes. To address this, we propose a multi-view Bayesian mixture model that identifies groups of variables ("views"), each of which defines a distinct clustering structure. We consider applications in stratified medicine, for which our principal goal is to identify clusters of patients that define distinct, clinically actionable disease subtypes. We adopt the semi-supervised, outcome-guided mixture modelling approach of Bayesian profile regression that makes use of a response variable in order to guide inference toward the clusterings that are most relevant in a stratified medicine context. We present the model, together with illustrative simulation examples, and examples from pan-cancer proteomics. We demonstrate how the approach can be used to perform integrative clustering, and consider an example in which different 'omics datasets are integrated in the context of breast cancer subtyping.

bioinformatics, machine learning, mixture model, (17 more...)

arXiv.org Machine Learning

2303.00318

Country: North America > United States (0.67)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
Information Technology > Biomedical Informatics (0.66)

Kernel learning approaches for summarising and combining posterior similarity matrices

Cabassi, Alessandra, Richardson, Sylvia, Kirk, Paul D. W.

arXiv.org Machine Learning, Sep-27-2020

Summary: When using Markov chain Monte Carlo (MCMC) algorithms to perform inference for Bayesian clustering models, such as mixture models, the output is typically a sample of clusterings (partitions) drawn from the posterior distribution. In practice, a key challenge is how to summarise this output. Here we build upon the notion of the posterior similarity matrix (PSM) in order to suggest new approaches for summarising the output of MCMC algorithms for Bayesian clustering models. A key contribution of our work is the observation that PSMs are positive semi-definite, and hence can be used to define probabilistically-motivated kernel matrices that capture the clustering structure present in the data. This observation enables us to employ a range of kernel methods to obtain summary clusterings, and otherwise exploit the information summarised by PSMs. For example, if we have multiple PSMs, each corresponding to a different dataset on a common set of statistical units, we may use standard methods for combining kernels in order to perform integrative clustering. We may moreover embed PSMs within predictive kernel models in order to perform outcome-guided data integration. We demonstrate the performances of the proposed methods through a range of simulation studies as well as two real data applications.
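The posterior similarity matrix described above can be sketched in a few lines. This assumes the MCMC output is available as a list of label vectors, one per sampled partition; the toy samples and function names are hypothetical. Each sampled partition contributes a block co-clustering matrix, and an average of such matrices has a non-negative quadratic form, which the sketch probes numerically on random vectors rather than proving.

```python
import random


def posterior_similarity_matrix(samples):
    """Entry (i, j) is the fraction of sampled partitions placing i and j together."""
    n = len(samples[0])
    psm = [[0.0] * n for _ in range(n)]
    for labels in samples:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    psm[i][j] += 1.0
    count = len(samples)
    return [[value / count for value in row] for row in psm]


def quadratic_form(matrix, x):
    """Compute x^T M x for a square matrix stored as nested lists."""
    n = len(x)
    return sum(x[i] * matrix[i][j] * x[j] for i in range(n) for j in range(n))


# Hypothetical MCMC output: three sampled partitions of four items.
samples = [[0, 0, 1, 1], [0, 0, 0, 1], [0, 1, 1, 1]]
psm = posterior_similarity_matrix(samples)

# Numerical spot check of positive semi-definiteness on random vectors.
rng = random.Random(0)
min_form = min(
    quadratic_form(psm, [rng.uniform(-1.0, 1.0) for _ in range(4)])
    for _ in range(200)
)
```

Because the matrix behaves like a kernel, it could in principle be passed to any method that accepts a precomputed kernel, which is the route the abstract describes.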

health & medicine, mass parameter number, oncology, (17 more...)

arXiv.org Machine Learning

2009.12852

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Distributed Bayesian Computation for Model Choice

Buchholz, Alexander, Ahfock, Daniel, Richardson, Sylvia

arXiv.org Machine Learning, Oct-10-2019

We propose a general method for distributed Bayesian model choice, where each worker has access only to non-overlapping subsets of the data. Our approach approximates the model evidence for the full data set through Monte Carlo sampling from the posterior on every subset, generating a model evidence per subset. The model evidences per worker are then consistently combined using a novel approach which corrects for the splitting using summary statistics of the generated samples. This divide-and-conquer approach allows Bayesian model choice in the large data setting, exploiting all available information but limiting communication between workers. Our work thereby complements the work on consensus Monte Carlo (Scott et al., 2016) by explicitly enabling model choice. In addition, we show how the suggested approach can be extended to model choice within a reversible jump setting that explores multiple models within one run.

approximation, bayesian inference, survey article, (21 more...)

arXiv.org Machine Learning

1910.04672

Country: Europe > United Kingdom (0.14)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (0.67)
Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

High-dimensional regression in practice: an empirical study of finite-sample prediction, variable selection and ranking

Wang, Fan, Mukherjee, Sach, Richardson, Sylvia, Hill, Steven M.

arXiv.org Machine Learning, Aug-2-2018

Penalized likelihood methods are widely used for high-dimensional regression. Although many methods have been proposed and the associated theory is now well-developed, the relative efficacy of different methods in finite-sample settings, as encountered in practice, remains incompletely understood. There is therefore a need for empirical investigations in this area that can offer practical insight and guidance to users of these methods. In this paper we present a large-scale comparison of penalized regression methods. We distinguish between three related goals: prediction, variable selection and variable ranking. Our results span more than 1,800 data-generating scenarios, allowing us to systematically consider the influence of various factors (sample size, dimensionality, sparsity, signal strength and multicollinearity). We consider several widely-used methods (Lasso, Elastic Net, Ridge Regression, SCAD, the Dantzig Selector as well as Stability Selection). We find considerable variation in performance between methods, with results dependent on details of the data-generating scenario and the specific goal. Our results support a 'no panacea' view, with no unambiguous winner across all scenarios, even in this restricted setting where all data align well with the assumptions underlying the methods. Lasso is well-behaved, performing competitively in many scenarios, while SCAD is highly variable. Substantial benefits from a Ridge penalty are only seen in the most challenging scenarios with strong multicollinearity. The results are supported by semi-synthetic analyses using gene expression data from cancer samples. Our empirical results complement existing theory and provide a resource to compare methods across a range of scenarios and metrics.
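The distinction between variable selection and shrinkage that underlies the comparison can be shown with a minimal sketch. It assumes the textbook special case of an orthonormal design, where the Lasso estimate is the soft-thresholded least-squares coefficient and the Ridge estimate is a rescaled one; the coefficient values and penalty level are made up, and this is not the study's simulation setup.

```python
def soft_threshold(z, lam):
    """Lasso solution for one coefficient under an orthonormal design."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0  # small coefficients are set exactly to zero


def ridge_shrink(z, lam):
    """Ridge solution for one coefficient under an orthonormal design."""
    return z / (1.0 + lam)  # every coefficient shrinks, none becomes zero


# Hypothetical least-squares coefficients and penalty level.
ols = [3.0, -0.4, 0.1, -2.0]
lam = 0.5

lasso = [soft_threshold(z, lam) for z in ols]
ridge = [ridge_shrink(z, lam) for z in ols]
```

Here the Lasso zeroes the two small coefficients while Ridge keeps all four, which is why only the former performs variable selection directly.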

oncology, scenario, survey article, (20 more...)

arXiv.org Machine Learning

1808.00723

Country: Europe > Austria (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)
