patterson
Fast Conformal Prediction using Conditional Interquantile Intervals
Guo, Naixin, Luo, Rui, Zhou, Zhixin
We introduce Conformal Interquantile Regression (CIR), a conformal regression method that efficiently constructs near-minimal prediction intervals with guaranteed coverage. CIR leverages black-box machine learning models to estimate outcome distributions through interquantile ranges, transforming these estimates into compact prediction intervals while achieving approximate conditional coverage. We further propose CIR+ (Conditional Interquantile Regression with More Comparison), which enhances CIR by incorporating a width-based selection rule for interquantile intervals. This refinement yields narrower prediction intervals while maintaining comparable coverage, though at the cost of slightly increased computational time. Both methods address key limitations of existing distributional conformal prediction approaches: they handle skewed distributions more effectively than Conformalized Quantile Regression, and they achieve substantially higher computational efficiency than Conformal Histogram Regression by eliminating the need for histogram construction. Extensive experiments on synthetic and real-world datasets demonstrate that our methods optimally balance predictive accuracy and computational efficiency compared to existing approaches.
- Asia > China > Hong Kong (0.04)
- North America > United States > California > Los Angeles County > Los Angeles (0.04)
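The calibration step shared by the conformal methods named in the abstract above can be sketched as follows. This is a minimal illustration of split-conformal calibration using the Conformalized Quantile Regression score (the baseline the abstract compares against), not the CIR or CIR+ procedure itself. The function name `conformal_margin` and its arguments are hypothetical.

```python
import math

def conformal_margin(lower, upper, y, alpha):
    """Split-conformal margin for quantile intervals (CQR-style score).

    lower, upper: predicted interval endpoints on a held-out calibration set
    y:            observed outcomes on the same calibration set
    alpha:        target miscoverage level, e.g. 0.1 for 90% coverage
    """
    # Score is positive when y falls outside [lower, upper], negative inside.
    scores = sorted(max(lo - yi, yi - hi) for lo, hi, yi in zip(lower, upper, y))
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: the interval is unbounded.
        return math.inf
    return scores[k - 1]
```

A test interval is then widened to `[lower - margin, upper + margin]`; a negative margin shrinks it instead.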
Fujitsu 'not a parasite' over Horizon scandal
Fujitsu is not a parasite for continuing to profit from government contracts in the wake of the Post Office Horizon scandal, its boss told MPs. European chief executive Paul Patterson said Fujitsu had been given £500m of contract extensions despite its faulty software being at the centre of the huge miscarriage of justice. "We are not a parasite, the government has got an option as to whether they wish to extend those contracts or not," he said, adding it would not bid for new business. Patterson also repeatedly refused to say how much Fujitsu would contribute to the £1.8bn redress scheme for victims of the scandal, currently funded by taxpayers. More than 900 sub-postmasters were prosecuted after the faulty Horizon computer system made it look like money was missing from their branch accounts.
- North America > United States (0.16)
- North America > Central America (0.15)
- Oceania > Australia (0.06)
- (15 more...)
- Government > Regional Government > Europe Government > United Kingdom Government (0.48)
- Leisure & Entertainment > Sports (0.43)
- Government > Post Office (0.40)
Joint Activity Design Heuristics for Enhancing Human-Machine Collaboration
Jalaeian, Mohammadreza, Morey, Dane A., Rayo, Michael F.
Joint activity describes when more than one agent (human or machine) contributes to the completion of a task or activity. Designing for joint activity focuses on explicitly supporting the interdependencies between agents necessary for effective coordination among agents engaged in the joint activity. This builds and expands upon designing for usability to further address how technologies can be designed to act as effective team players. Effective joint activity requires supporting, at minimum, five primary macrocognitive functions within teams: Event Detection, Sensemaking, Adaptability, Perspective-Shifting, and Coordination. Supporting these functions is equally as important as making technologies usable. We synthesized fourteen heuristics from relevant literature including display design, human factors, cognitive systems engineering, cognitive psychology, and computer science to aid the design, development, and evaluation of technologies that support joint human-machine activity. Recent advances in Artificial Intelligence (AI) and Machine Learning (ML) technologies have accelerated human-machine interactions progressing from simple tool-based engagements to complex cognitive collaborations [1]. Machines are being designed to perform an increasing set of functions and are being expected to engage more deeply in the collaborative joint activities related to these functions. This shift in machine capabilities and expectations demands a corresponding re-evaluation and broadening of design and evaluation principles to support joint human-machine activity in ways that lie outside the boundaries of traditional usability methods and models [2]. Traditional usability heuristics, such as those proposed by [3], provide a strong foundation focusing primarily on surface-level interactions such as enhancing the ease of use, efficiency, and satisfaction in human-machine interaction.
These heuristics are primarily oriented towards actions and responses but offer limited support for the essential macrocognitive functions associated with effective teamwork, including event detection, sensemaking, adaptability, perspective shifting, and coordination [2], [4], [5], [6]. All of these macrocognitive functions are vital in the close collaboration of humans and machines with joint activities in high-stakes and dynamic environments with little room for error [2], [5]. This reliance on macrocognitive functions is evident in domains where the ability to process complex information and adapt to changing conditions is crucial.
- North America > United States > Ohio (0.04)
- North America > United States > New York (0.04)
- North America > United States > New Jersey > Hudson County > Hoboken (0.04)
- (2 more...)
- Transportation > Air (1.00)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.69)
- Health & Medicine > Therapeutic Area > Immunology (0.69)
- (2 more...)
Breaking the Euclidean Barrier: Hyperboloid-Based Biological Sequence Analysis
Ali, Sarwan, Mansoor, Haris, Patterson, Murray
Genomic sequence analysis plays a crucial role in various scientific and medical domains. Traditional machine-learning approaches often struggle to capture the complex relationships and hierarchical structures of sequence data when working in high-dimensional Euclidean spaces. This limitation hinders accurate sequence classification and similarity measurement. To address these challenges, this research proposes a method to transform the feature representation of biological sequences into the hyperboloid space. By applying a transformation, the sequences are mapped onto the hyperboloid, preserving their inherent structural information. Once the sequences are represented in the hyperboloid space, a kernel matrix is computed based on the hyperboloid features. The kernel matrix captures the pairwise similarities between sequences, enabling more effective analysis of biological sequence relationships. This approach leverages the inner product of the hyperboloid feature vectors to measure the similarity between pairs of sequences. The experimental evaluation of the proposed approach demonstrates its efficacy in capturing important sequence correlations and improving classification accuracy.
- Asia > Pakistan > Punjab > Lahore Division > Lahore (0.04)
- North America > United States > Colorado (0.04)
- Asia > Taiwan > Taiwan Province > Taipei (0.04)
- Overview (1.00)
- Research Report > New Finding (0.46)
- Research Report > Experimental Study (0.46)
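The hyperboloid mapping and kernel computation described in the abstract above can be illustrated with a short sketch. The abstract does not state which transformation or inner product the authors use, so this assumes the standard lift onto the unit hyperboloid, x -> (sqrt(1 + ||x||^2), x), and the Minkowski (Lorentzian) inner product. The function names are hypothetical.

```python
import math

def lift(x):
    """Map a Euclidean feature vector onto the unit hyperboloid (assumed lift)."""
    # The leading coordinate is chosen so that <p, p> = -1 under minkowski().
    return [math.sqrt(1.0 + sum(v * v for v in x))] + list(x)

def minkowski(u, v):
    """Minkowski inner product: negative on the first coordinate, positive on the rest."""
    return -u[0] * v[0] + sum(a * b for a, b in zip(u[1:], v[1:]))

def kernel_matrix(features):
    """Pairwise similarity matrix of the lifted feature vectors."""
    points = [lift(x) for x in features]
    return [[minkowski(p, q) for q in points] for p in points]
```

Every lifted point has a self-product of -1, and values further below -1 indicate points that are further apart on the hyperboloid. A matrix built this way is symmetric but not guaranteed to be positive semidefinite, so a downstream kernel method may need an additional transformation.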
EPIC: Enhancing Privacy through Iterative Collaboration
Chourasia, Prakash, Lonkar, Heramb, Ali, Sarwan, Patterson, Murray
Advancements in genomics technology lead to a rising volume of viral (e.g., SARS-CoV-2) sequence data, resulting in increased usage of machine learning (ML) in bioinformatics. Traditional ML techniques require centralized data collection and processing, posing challenges in realistic healthcare scenarios. Additionally, privacy, ownership, and stringent regulation issues exist when pooling medical data into centralized storage to train a powerful deep learning (DL) model. The Federated learning (FL) approach overcomes such issues by setting up a central aggregator server and a shared global model. It also facilitates data privacy by extracting knowledge while keeping the actual data private. This work proposes a cutting-edge Privacy enhancement through Iterative Collaboration (EPIC) architecture. The network is divided and distributed between local and centralized servers. We demonstrate the EPIC approach to resolve a supervised classification problem to estimate SARS-CoV-2 genomic sequence data lineage without explicitly transferring raw sequence data. We aim to create a universal decentralized optimization framework that allows various data holders to work together and converge to a single predictive model. The findings demonstrate that privacy-preserving strategies can be successfully used with aggregation approaches without materially altering the degree of learning convergence. Finally, we highlight a few potential issues and prospects for study in FL-based approaches to healthcare applications.
- North America > United States > California (0.14)
- Europe > United Kingdom > Scotland (0.05)
- Europe > Sweden (0.05)
- (10 more...)
DWFL: Enhancing Federated Learning through Dynamic Weighted Averaging
Chourasia, Prakash, Ali, Tamkanat E, Ali, Sarwan, Patterson, Murray
Federated Learning (FL) is a distributed learning technique that maintains data privacy by providing a decentralized training method for machine learning models using distributed big data. This promising Federated Learning approach has also gained popularity in bioinformatics, where the privacy of biomedical data holds immense importance, especially when patient data is involved. Despite the successful implementation of Federated learning in biological sequence analysis, rigorous consideration is still required to improve accuracy in a way that data privacy should not be compromised. Additionally, the optimal integration of federated learning, especially in protein sequence analysis, has not been fully explored. We propose a deep feed-forward neural network-based enhanced federated learning method for protein sequence classification to overcome these challenges. Our method introduces novel enhancements to improve classification accuracy. We introduce dynamic weighted federated learning (DWFL) which is a federated learning-based approach, where local model weights are adjusted using weighted averaging based on their performance metrics. By assigning higher weights to well-performing models, we aim to create a more potent initial global model for the federated learning process, leading to improved accuracy. We conduct experiments using real-world protein sequence datasets to assess the effectiveness of DWFL. The results obtained using our proposed approach demonstrate significant improvements in model accuracy, making federated learning a preferred, more robust, and privacy-preserving approach for collaborative machine-learning tasks.
- Asia > Pakistan > Punjab > Lahore Division > Lahore (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
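The performance-weighted averaging described in the DWFL abstract above can be sketched as follows. The abstract does not give the exact weighting formula, so this assumes each local model's coefficient is its performance score divided by the sum of all scores. The function name and the flat-list parameter layout are hypothetical.

```python
def weighted_average(local_weights, scores):
    """Aggregate local model parameters, weighting each model by its score.

    local_weights: one flat list of parameters per local model (equal lengths)
    scores:        one non-negative performance metric per local model
    """
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must sum to a positive value")
    # Assumed normalization: coefficients are proportional to the scores.
    coeffs = [s / total for s in scores]
    n_params = len(local_weights[0])
    return [
        sum(c * w[i] for c, w in zip(coeffs, local_weights))
        for i in range(n_params)
    ]
```

With equal scores this reduces to a plain unweighted mean of the local parameters; better-scoring models pull the global parameters toward their own.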
Efficient Classification of SARS-CoV-2 Spike Sequences Using Federated Learning
Chourasia, Prakash, Murad, Taslim, Tayebi, Zahra, Ali, Sarwan, Khan, Imdad Ullah, Patterson, Murray
This paper presents a federated learning (FL) approach to train an AI model for SARS-CoV-2 variant classification. We analyze the SARS-CoV-2 spike sequences in a distributed way, without data sharing, to detect different variants of this rapidly mutating coronavirus. Our method maintains the confidentiality of local data (that could be stored in different locations) yet allows us to reliably detect and identify different known and unknown variants of the novel coronavirus SARS-CoV-2. Using the proposed approach, we achieve an overall accuracy of 93% on the coronavirus variant identification task. We also provide details regarding how the proposed model follows the main laws of federated learning, such as laws of data ownership, data privacy, model aggregation, and model heterogeneity. Since the proposed model is distributed, it could scale on "Big Data" easily. We plan to use this proof-of-concept to implement a privacy-preserving pandemic response strategy.
- North America > United States > California (0.14)
- North America > United States > New York (0.04)
- Asia > Pakistan > Punjab > Lahore Division > Lahore (0.04)
- (9 more...)
Towards Automatic Design of Factorio Blueprints
Patterson, Sean, Espasa, Joan, Chang, Mun See, Hoffmann, Ruth
Factorio is a 2D construction and management simulation video game about building automated factories to produce items of increasing complexity. A core feature of the game is its blueprint system, which allows players to easily save and replicate parts of their designs. Blueprints can reproduce any layout of objects in the game, but are typically used to encapsulate a complex behaviour, such as the production of a non-basic object. Once created, these blueprints are then used as basic building blocks, allowing the player to create a layer of abstraction. The usage of blueprints not only eases the expansion of the factory but also allows the sharing of designs with the game's community. The layout in a blueprint can be optimised using various criteria, such as the total space used or the final production throughput. The design of an optimal blueprint is a hard combinatorial problem, interleaving elements of many well-studied problems such as bin-packing, routing or network design. This work presents a new challenging problem and explores the feasibility of a constraint model to optimise Factorio blueprints, balancing correctness, optimality, and performance.
- Transportation (0.68)
- Leisure & Entertainment > Games > Computer Games (0.34)
Virus2Vec: Viral Sequence Classification Using Machine Learning
Ali, Sarwan, Bello, Babatunde, Chourasia, Prakash, Punathil, Ria Thazhe, Chen, Pin-Yu, Khan, Imdad Ullah, Patterson, Murray
Understanding the host-specificity of different families of viruses sheds light on the origin of, e.g., SARS-CoV-2, rabies, and other such zoonotic pathogens in humans. It enables epidemiologists, medical professionals, and policymakers to curb existing epidemics and prevent future ones promptly. In the family Coronaviridae (of which SARS-CoV-2 is a member), it is well-known that the spike protein is the point of contact between the virus and the host cell membrane. On the other hand, the two traditional mammalian orders, Carnivora (carnivores) and Chiroptera (bats), are recognized to be responsible for maintaining and spreading the Rabies Lyssavirus (RABV). We propose Virus2Vec, a feature-vector representation for viral (nucleotide or amino acid) sequences that enables vector-space-based machine learning models to identify viral hosts. Virus2Vec generates numerical feature vectors for unaligned sequences, allowing us to forego the computationally expensive sequence alignment step from the pipeline. Virus2Vec leverages the power of both the minimizer and position weight matrix (PWM) to generate compact feature vectors. Using several classifiers, we empirically evaluate Virus2Vec on real-world spike sequences of Coronaviridae and rabies virus sequence data to predict the host (identifying the reservoirs of infection). Our results demonstrate that Virus2Vec outperforms the predictive accuracies of baseline and state-of-the-art methods.
- North America > United States (0.05)
- Asia > Pakistan > Punjab > Lahore Division > Lahore (0.04)
- South America > Brazil (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.95)
ViralVectors: Compact and Scalable Alignment-free Virome Feature Generation
Ali, Sarwan, Chourasia, Prakash, Tayebi, Zahra, Bello, Babatunde, Patterson, Murray
The amount of sequencing data for SARS-CoV-2 is several orders of magnitude larger than for any other virus. This will continue to grow geometrically for SARS-CoV-2, and other viruses, as many countries heavily finance genomic surveillance efforts. Hence, we need methods for processing large amounts of sequence data to allow for effective yet timely decision-making. Such data will come from heterogeneous sources: aligned, unaligned, or even unassembled raw nucleotide or amino acid sequencing reads pertaining to the whole genome or regions (e.g., spike) of interest. In this work, we propose ViralVectors, a compact feature vector generation from virome sequencing data that allows effective downstream analysis. Such generation is based on minimizers, a type of lightweight "signature" of a sequence, used traditionally in assembly and read mapping; to our knowledge, this is the first use of minimizers in this way. We validate our approach on different types of sequencing data: (a) 2.5M SARS-CoV-2 spike sequences (to show scalability); (b) 3K Coronaviridae spike sequences (to show robustness to more genomic variability); and (c) 4K raw WGS read sets taken from nasal-swab PCR tests (to show the ability to process unassembled reads). Our results show that ViralVectors outperforms current benchmarks in most classification and clustering tasks.
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- Asia > China > Hubei Province > Wuhan (0.04)
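The minimizer "signature" mentioned in the ViralVectors abstract above can be illustrated with a small sketch. It assumes the simplest definition: the minimizer of a k-mer is its lexicographically smallest substring of length m, taken from the forward strand only. Whether the authors also consider the reversed k-mer, and how they turn counts into a fixed-length vector, is not stated in the abstract. The function names are hypothetical.

```python
from collections import Counter

def minimizer(kmer, m):
    """Return the lexicographically smallest length-m substring of a k-mer."""
    return min(kmer[i:i + m] for i in range(len(kmer) - m + 1))

def minimizer_counts(seq, k, m):
    """Count the minimizer of every k-mer obtained by sliding a window over seq."""
    return Counter(
        minimizer(seq[i:i + k], m) for i in range(len(seq) - k + 1)
    )
```

Because many neighbouring k-mers share a minimizer, the resulting count table has at most one entry per distinct m-mer, which is far fewer than the number of distinct k-mers when m is smaller than k.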