Collaborating Authors

 patterson


Fast Conformal Prediction using Conditional Interquantile Intervals

Guo, Naixin, Luo, Rui, Zhou, Zhixin

arXiv.org Machine Learning

We introduce Conformal Interquantile Regression (CIR), a conformal regression method that efficiently constructs near-minimal prediction intervals with guaranteed coverage. CIR leverages black-box machine learning models to estimate outcome distributions through interquantile ranges, transforming these estimates into compact prediction intervals while achieving approximate conditional coverage. We further propose CIR+ (Conditional Interquantile Regression with More Comparison), which enhances CIR by incorporating a width-based selection rule for interquantile intervals. This refinement yields narrower prediction intervals while maintaining comparable coverage, though at the cost of slightly increased computational time. Both methods address key limitations of existing distributional conformal prediction approaches: they handle skewed distributions more effectively than Conformalized Quantile Regression, and they achieve substantially higher computational efficiency than Conformal Histogram Regression by eliminating the need for histogram construction. Extensive experiments on synthetic and real-world datasets demonstrate that our methods optimally balance predictive accuracy and computational efficiency compared to existing approaches.
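For context, the calibration step of the Conformalized Quantile Regression baseline named above can be sketched as follows. This is the standard split-conformal recipe for that baseline, not the CIR or CIR+ procedure, whose details the abstract does not give. The function names `cqr_margin` and `cqr_interval` are hypothetical, and the lower and upper quantile predictions are assumed to come from any fitted quantile model.

```python
import math


def cqr_margin(q_lo, q_hi, y, alpha):
    """Calibration margin for Conformalized Quantile Regression (baseline sketch).

    q_lo, q_hi: predicted lower/upper quantiles on a held-out calibration set.
    y: the matching true outcomes. alpha: target miscoverage level.
    """
    # Conformity score: how far each outcome falls outside its predicted band
    # (negative when the outcome lies strictly inside the band).
    scores = sorted(max(lo - yi, yi - hi) for lo, hi, yi in zip(q_lo, q_hi, y))
    n = len(scores)
    # Finite-sample corrected rank of the empirical quantile.
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return scores[rank - 1]


def cqr_interval(lo, hi, margin):
    """Widen (or shrink, if margin < 0) a predicted band by the margin."""
    return lo - margin, hi + margin
```

The margin is added symmetrically to both ends of the band, which is one reason this baseline can be loose on skewed outcome distributions.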


Fujitsu 'not a parasite' over Horizon scandal

BBC News

Fujitsu is not a parasite for continuing to profit from government contracts in the wake of the Post Office Horizon scandal, its boss told MPs. European chief executive Paul Patterson said Fujitsu had been given £500m of contract extensions despite its faulty software being at the centre of the huge miscarriage of justice. "We are not a parasite, the government has got an option as to whether they wish to extend those contracts or not," he said, adding that Fujitsu would not bid for new business. Patterson also repeatedly refused to say how much Fujitsu would contribute to the £1.8bn redress scheme for victims of the scandal, currently funded by taxpayers. More than 900 sub-postmasters were prosecuted after the faulty Horizon computer system made it look like money was missing from their branch accounts.


Joint Activity Design Heuristics for Enhancing Human-Machine Collaboration

Jalaeian, Mohammadreza, Morey, Dane A., Rayo, Michael F.

arXiv.org Artificial Intelligence

Joint activity describes when more than one agent (human or machine) contributes to the completion of a task or activity. Designing for joint activity focuses on explicitly supporting the interdependencies between agents necessary for effective coordination among agents engaged in the joint activity. This builds and expands upon designing for usability to further address how technologies can be designed to act as effective team players. Effective joint activity requires supporting, at minimum, five primary macrocognitive functions within teams: Event Detection, Sensemaking, Adaptability, Perspective-Shifting, and Coordination. Supporting these functions is as important as making technologies usable. We synthesized fourteen heuristics from relevant literature including display design, human factors, cognitive systems engineering, cognitive psychology, and computer science to aid the design, development, and evaluation of technologies that support joint human-machine activity. Recent advances in Artificial Intelligence (AI) and Machine Learning (ML) technologies have accelerated human-machine interactions progressing from simple tool-based engagements to complex cognitive collaborations [1]. Machines are being designed to perform an increasing set of functions and are being expected to engage more deeply in the collaborative joint activities related to these functions. This shift in machine capabilities and expectations demands a corresponding re-evaluation and broadening of design and evaluation principles to support joint human-machine activity in ways that lie outside the boundaries of traditional usability methods and models [2]. Traditional usability heuristics, such as those proposed by [3], provide a strong foundation focusing primarily on surface-level interactions such as enhancing the ease of use, efficiency, and satisfaction in human-machine interaction.
These heuristics are primarily oriented towards actions and responses but offer limited support for the essential macrocognitive functions associated with effective teamwork, including event detection, sensemaking, adaptability, perspective shifting, and coordination [2], [4], [5], [6]. All of these macrocognitive functions are vital in the close collaboration of humans and machines with joint activities in high-stakes and dynamic environments with little room for error [2], [5]. This reliance on macrocognitive functions is evident in domains where the ability to process complex information and adapt to changing conditions is crucial.


Breaking the Euclidean Barrier: Hyperboloid-Based Biological Sequence Analysis

Ali, Sarwan, Mansoor, Haris, Patterson, Murray

arXiv.org Artificial Intelligence

Genomic sequence analysis plays a crucial role in various scientific and medical domains. Traditional machine-learning approaches often struggle to capture the complex relationships and hierarchical structures of sequence data when working in high-dimensional Euclidean spaces. This limitation hinders accurate sequence classification and similarity measurement. To address these challenges, this research proposes a method to transform the feature representation of biological sequences into the hyperboloid space. By applying a transformation, the sequences are mapped onto the hyperboloid, preserving their inherent structural information. Once the sequences are represented in the hyperboloid space, a kernel matrix is computed based on the hyperboloid features. The kernel matrix captures the pairwise similarities between sequences, enabling more effective analysis of biological sequence relationships. This approach leverages the inner product of the hyperboloid feature vectors to measure the similarity between pairs of sequences. The experimental evaluation of the proposed approach demonstrates its efficacy in capturing important sequence correlations and improving classification accuracy.
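The pipeline described above (map features onto the hyperboloid, then build a pairwise matrix from inner products) can be illustrated with a minimal sketch. The abstract does not specify the transformation or the inner product, so two assumptions are made here: the standard lift of a Euclidean vector x to (sqrt(1 + |x|^2), x), and the Minkowski inner product. All function names are hypothetical.

```python
import math


def to_hyperboloid(x):
    """Lift a Euclidean feature vector onto the unit hyperboloid.

    Assumed lift: x -> (sqrt(1 + |x|^2), x_1, ..., x_d).
    """
    x0 = math.sqrt(1.0 + sum(v * v for v in x))
    return [x0] + list(x)


def minkowski_inner(u, v):
    """Minkowski inner product: -u_0 * v_0 + sum_i u_i * v_i."""
    return -u[0] * v[0] + sum(a * b for a, b in zip(u[1:], v[1:]))


def similarity_matrix(features):
    """Pairwise Minkowski inner products of the lifted feature vectors."""
    points = [to_hyperboloid(x) for x in features]
    return [[minkowski_inner(p, q) for q in points] for p in points]
```

Under this lift every point has inner product -1 with itself, and values further below -1 correspond to points that are further apart on the hyperboloid.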


EPIC: Enhancing Privacy through Iterative Collaboration

Chourasia, Prakash, Lonkar, Heramb, Ali, Sarwan, Patterson, Murray

arXiv.org Artificial Intelligence

Advancements in genomics technology lead to a rising volume of viral (e.g., SARS-CoV-2) sequence data, resulting in increased usage of machine learning (ML) in bioinformatics. Traditional ML techniques require centralized data collection and processing, posing challenges in realistic healthcare scenarios. Additionally, privacy, ownership, and stringent regulation issues exist when pooling medical data into centralized storage to train a powerful deep learning (DL) model. The federated learning (FL) approach overcomes such issues by setting up a central aggregator server and a shared global model. It also facilitates data privacy by extracting knowledge while keeping the actual data private. This work proposes a cutting-edge Privacy enhancement through Iterative Collaboration (EPIC) architecture. The network is divided and distributed between local and centralized servers. We demonstrate the EPIC approach to resolve a supervised classification problem to estimate SARS-CoV-2 genomic sequence data lineage without explicitly transferring raw sequence data. We aim to create a universal decentralized optimization framework that allows various data holders to work together and converge to a single predictive model. The findings demonstrate that privacy-preserving strategies can be successfully used with aggregation approaches without materially altering the degree of learning convergence. Finally, we highlight a few potential issues and prospects for study in FL-based approaches to healthcare applications.


DWFL: Enhancing Federated Learning through Dynamic Weighted Averaging

Chourasia, Prakash, Ali, Tamkanat E, Ali, Sarwan, Patterson, Murray

arXiv.org Artificial Intelligence

Federated learning (FL) is a distributed learning technique that maintains data privacy by providing a decentralized training method for machine learning models using distributed big data. This promising approach has also gained popularity in bioinformatics, where the privacy of biomedical data holds immense importance, especially when patient data is involved. Despite the successful implementation of federated learning in biological sequence analysis, rigorous consideration is still required to improve accuracy without compromising data privacy. Additionally, the optimal integration of federated learning, especially in protein sequence analysis, has not been fully explored. We propose a deep feed-forward neural network-based enhanced federated learning method for protein sequence classification to overcome these challenges. Our method introduces novel enhancements to improve classification accuracy. We introduce dynamic weighted federated learning (DWFL), a federated learning-based approach in which local model weights are adjusted using weighted averaging based on their performance metrics. By assigning higher weights to well-performing models, we aim to create a more potent initial global model for the federated learning process, leading to improved accuracy. We conduct experiments using real-world protein sequence datasets to assess the effectiveness of DWFL. The results obtained using our proposed approach demonstrate significant improvements in model accuracy, making federated learning a preferred, more robust, and privacy-preserving approach for collaborative machine-learning tasks.
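The idea of averaging local model weights in proportion to a performance metric can be sketched as below. This is only an illustration of performance-weighted averaging in general: the abstract does not say which metric DWFL uses or how the scores are normalized, so the name `performance_weighted_average`, the flat parameter lists, and the sum-to-one normalization are all assumptions.

```python
def performance_weighted_average(local_weights, scores):
    """Combine local model parameters, weighting each model by its score.

    local_weights: one flat list of parameters per local model (equal lengths).
    scores: one non-negative performance score per model, e.g. validation
            accuracy (which metric to use is an assumption here).
    """
    if len(local_weights) != len(scores):
        raise ValueError("need exactly one score per local model")
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must sum to a positive value")
    # Normalize so the coefficients sum to 1; better models get more weight.
    coeffs = [s / total for s in scores]
    n_params = len(local_weights[0])
    return [
        sum(c * w[i] for c, w in zip(coeffs, local_weights))
        for i in range(n_params)
    ]
```

With equal scores this reduces to a plain unweighted average of the local models.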


Efficient Classification of SARS-CoV-2 Spike Sequences Using Federated Learning

Chourasia, Prakash, Murad, Taslim, Tayebi, Zahra, Ali, Sarwan, Khan, Imdad Ullah, Patterson, Murray

arXiv.org Artificial Intelligence

This paper presents a federated learning (FL) approach to train an AI model for SARS-CoV-2 variant classification. We analyze the SARS-CoV-2 spike sequences in a distributed way, without data sharing, to detect different variants of this rapidly mutating coronavirus. Our method maintains the confidentiality of local data (that could be stored in different locations) yet allows us to reliably detect and identify different known and unknown variants of the novel coronavirus SARS-CoV-2. Using the proposed approach, we achieve an overall accuracy of 93% on the coronavirus variant identification task. We also provide details regarding how the proposed model follows the main laws of federated learning, such as laws of data ownership, data privacy, model aggregation, and model heterogeneity. Since the proposed model is distributed, it could scale on "Big Data" easily. We plan to use this proof-of-concept to implement a privacy-preserving pandemic response strategy.


Towards Automatic Design of Factorio Blueprints

Patterson, Sean, Espasa, Joan, Chang, Mun See, Hoffmann, Ruth

arXiv.org Artificial Intelligence

Factorio is a 2D construction and management simulation video game about building automated factories to produce items of increasing complexity. A core feature of the game is its blueprint system, which allows players to easily save and replicate parts of their designs. Blueprints can reproduce any layout of objects in the game, but are typically used to encapsulate a complex behaviour, such as the production of a non-basic object. Once created, these blueprints are then used as basic building blocks, allowing the player to create a layer of abstraction. The usage of blueprints not only eases the expansion of the factory but also allows the sharing of designs with the game's community. The layout in a blueprint can be optimised using various criteria, such as the total space used or the final production throughput. The design of an optimal blueprint is a hard combinatorial problem, interleaving elements of many well-studied problems such as bin-packing, routing or network design. This work presents a new challenging problem and explores the feasibility of a constraint model to optimise Factorio blueprints, balancing correctness, optimality, and performance.


Virus2Vec: Viral Sequence Classification Using Machine Learning

Ali, Sarwan, Bello, Babatunde, Chourasia, Prakash, Punathil, Ria Thazhe, Chen, Pin-Yu, Khan, Imdad Ullah, Patterson, Murray

arXiv.org Artificial Intelligence

Understanding the host-specificity of different families of viruses sheds light on the origin of, e.g., SARS-CoV-2, rabies, and other such zoonotic pathogens in humans. It enables epidemiologists, medical professionals, and policymakers to curb existing epidemics and prevent future ones promptly. In the family Coronaviridae (of which SARS-CoV-2 is a member), it is well-known that the spike protein is the point of contact between the virus and the host cell membrane. On the other hand, the two traditional mammalian orders, Carnivora (carnivores) and Chiroptera (bats), are recognized to be responsible for maintaining and spreading the Rabies Lyssavirus (RABV). We propose Virus2Vec, a feature-vector representation for viral (nucleotide or amino acid) sequences that enables vector-space-based machine learning models to identify viral hosts. Virus2Vec generates numerical feature vectors for unaligned sequences, allowing us to forego the computationally expensive sequence alignment step from the pipeline. Virus2Vec leverages the power of both the minimizer and position weight matrix (PWM) to generate compact feature vectors. Using several classifiers, we empirically evaluate Virus2Vec on real-world spike sequences of Coronaviridae and rabies virus sequence data to predict the host (identifying the reservoirs of infection). Our results demonstrate that Virus2Vec outperforms the predictive accuracies of baseline and state-of-the-art methods.


ViralVectors: Compact and Scalable Alignment-free Virome Feature Generation

Ali, Sarwan, Chourasia, Prakash, Tayebi, Zahra, Bello, Babatunde, Patterson, Murray

arXiv.org Artificial Intelligence

The amount of sequencing data for SARS-CoV-2 is several orders of magnitude larger than for any other virus. This will continue to grow geometrically for SARS-CoV-2, and other viruses, as many countries heavily finance genomic surveillance efforts. Hence, we need methods for processing large amounts of sequence data to allow for effective yet timely decision-making. Such data will come from heterogeneous sources: aligned, unaligned, or even unassembled raw nucleotide or amino acid sequencing reads pertaining to the whole genome or regions (e.g., spike) of interest. In this work, we propose ViralVectors, a compact feature vector generation from virome sequencing data that allows effective downstream analysis. Such generation is based on minimizers, a type of lightweight "signature" of a sequence, used traditionally in assembly and read mapping; to our knowledge, this is the first use of minimizers in this way. We validate our approach on different types of sequencing data: (a) 2.5M SARS-CoV-2 spike sequences (to show scalability); (b) 3K Coronaviridae spike sequences (to show robustness to more genomic variability); and (c) 4K raw WGS read sets taken from nasal-swab PCR tests (to show the ability to process unassembled reads). Our results show that ViralVectors outperforms current benchmarks in most classification and clustering tasks.
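To make the notion of a minimizer concrete, here is a minimal sketch of the classic window-based definition used in assembly and read mapping: within each window of w consecutive k-mers, keep the lexicographically smallest one. The exact variant and parameters used by ViralVectors are not given in the abstract, so the function name `minimizers`, the tie-breaking rule (leftmost), and the parameters `k` and `w` are assumptions.

```python
def minimizers(seq, k, w):
    """Return (position, k-mer) minimizers of seq, in order of position.

    For every window of w consecutive k-mers, the lexicographically smallest
    k-mer is selected (leftmost on ties). Consecutive windows that select the
    same position contribute it only once.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = []
    for start in range(len(kmers) - w + 1):
        # Index of the smallest k-mer in this window; min() keeps the
        # leftmost candidate when several k-mers are equal.
        pos = min(range(start, start + w), key=lambda j: kmers[j])
        if not picked or picked[-1][0] != pos:
            picked.append((pos, kmers[pos]))
    return picked
```

Because neighbouring windows often share their smallest k-mer, the selected set is much smaller than the full k-mer list, which is what makes minimizers a lightweight signature of a sequence.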