 Accuracy


Detection of sepsis during emergency department triage using machine learning

arXiv.org Artificial Intelligence

Sepsis is a life-threatening condition with organ dysfunction and is a leading cause of death and critical illness worldwide. Even a few hours of delay in the treatment of sepsis results in increased mortality. Early detection of sepsis during emergency department triage would allow early initiation of lab analysis, antibiotic administration, and other sepsis treatment protocols. The purpose of this study was to compare sepsis detection performance at ED triage (prior to the use of laboratory diagnostics) of the standard sepsis screening algorithm (SIRS with source of infection) and a machine learning algorithm trained on EHR triage data. A machine learning model (KATE Sepsis) was developed using patient encounters with triage data from 16 participating hospitals. KATE Sepsis and standard screening were retrospectively evaluated on the adult population of 512,949 medical records. KATE Sepsis demonstrates an AUC of 0.9423 (0.9401 - 0.9441) with sensitivity of 71.09% (70.12% - 71.98%) and specificity of 94.81% (94.75% - 94.87%). Standard screening demonstrates an AUC of 0.6826 (0.6774 - 0.6878) with sensitivity of 40.8% (39.71% - 41.86%) and specificity of 95.72% (95.68% - 95.78%). The KATE Sepsis model trained to detect sepsis demonstrates 77.67% (75.78% - 79.42%) sensitivity in detecting severe sepsis and 86.95% (84.2% - 88.81%) sensitivity in detecting septic shock. The standard screening protocol demonstrates 43.06% (41% - 45.87%) sensitivity in detecting severe sepsis and 40% (36.55% - 43.26%) sensitivity in detecting septic shock. Future research should focus on the prospective impact of KATE Sepsis on administration of antibiotics, readmission rate, morbidity and mortality.
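
The sensitivity and specificity figures reported in this abstract are ratios taken from a confusion matrix. As a minimal sketch (not the study's evaluation code), the function below computes both from binary labels; the function name and the toy data are hypothetical and chosen only for illustration.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = sepsis."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # septic, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # septic, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # not septic, not flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # not septic, flagged
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical toy data: 4 septic encounters, 6 non-septic encounters.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.4f} specificity={spec:.4f}")
```

On this toy data, three of four septic encounters are flagged and five of six non-septic encounters are correctly left unflagged.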


Distributionally Robust Data Join

arXiv.org Artificial Intelligence

Suppose we are given two datasets: a labeled dataset and unlabeled dataset which also has additional auxiliary features not present in the first dataset. What is the most principled way to use these datasets together to construct a predictor? The answer should depend upon whether these datasets are generated by the same or different distributions over their mutual feature sets, and how similar the test distribution will be to either of those distributions. In many applications, the two datasets will likely follow different distributions, but both may be close to the test distribution. We introduce the problem of building a predictor which minimizes the maximum loss over all probability distributions over the original features, auxiliary features, and binary labels, whose Wasserstein distance is $r_1$ away from the empirical distribution over the labeled dataset and $r_2$ away from that of the unlabeled dataset. This can be thought of as a generalization of distributionally robust optimization (DRO), which allows for two data sources, one of which is unlabeled and may contain auxiliary features.
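
One way to write the objective described in this abstract is sketched below. The notation is an assumption introduced here, not taken from the paper: $\hat{P}_1$ is the empirical distribution of the labeled dataset over (features, label), $\hat{P}_2$ is the empirical distribution of the unlabeled dataset over (features, auxiliary features), $Q$ is a joint distribution over $(x, a, y)$ with marginals $Q_{X,Y}$ and $Q_{X,A}$, $W$ is the Wasserstein distance, $\ell$ is a loss, and $\theta$ parameterizes the predictor. The constraints are written with $\le$, as is usual in DRO, which is also an assumption.

```latex
\min_{\theta} \;
\max_{\substack{Q \,:\; W(Q_{X,Y},\, \hat{P}_1) \le r_1 \\ W(Q_{X,A},\, \hat{P}_2) \le r_2}}
\; \mathbb{E}_{(x, a, y) \sim Q} \big[ \ell(\theta;\, x, a, y) \big]
```

Dropping the second constraint and the auxiliary features $a$ recovers the standard single-source Wasserstein DRO problem, which is the sense in which the formulation generalizes it.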


Investigating Membership Inference Attacks under Data Dependencies

arXiv.org Artificial Intelligence

Training machine learning models on privacy-sensitive data has become a popular practice, driving innovation in ever-expanding fields. This has opened the door to new attacks that can have serious privacy implications. One such attack, the Membership Inference Attack (MIA), exposes whether or not a particular data point was used to train a model. A growing body of literature uses Differentially Private (DP) training algorithms as a defence against such attacks. However, these works evaluate the defence under the restrictive assumption that all members of the training set, as well as non-members, are independent and identically distributed. This assumption does not hold for many real-world use cases in the literature. Motivated by this, we evaluate membership inference with statistical dependencies among samples and explain why DP does not provide meaningful protection (the privacy parameter $\epsilon$ scales with the training set size $n$) in this more general case. We conduct a series of empirical evaluations with off-the-shelf MIAs using training sets built from real-world data showing different types of dependencies among samples. Our results reveal that training set dependencies can severely increase the performance of MIAs, and therefore assuming that data samples are statistically independent can significantly underestimate the performance of MIAs.
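
The parenthetical about $\epsilon$ scaling with $n$ can be read through the standard group privacy property of pure $\epsilon$-DP, stated below. The symbols $\mathcal{M}$ (mechanism), $D, D'$ (datasets), $S$ (output set), and $k$ (number of differing records) are notation introduced here for illustration.

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{k\epsilon} \, \Pr[\mathcal{M}(D') \in S]
\quad \text{for all } S \text{ and all } D, D' \text{ differing in at most } k \text{ records.}
```

Under this reading, if dependencies mean that one sample's membership can influence up to $k = n$ records, the effective guarantee degrades to $e^{n\epsilon}$, which is vacuous for realistic training set sizes.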


Automatic and Accurate Classification of Hotel Bathrooms from Images with Deep Learning

arXiv.org Artificial Intelligence

Hotel bathrooms are one of the most important places in terms of customer satisfaction, and where the most complaints are reported. To share their experiences, guests rate hotels, comment, and share images of their positive or negative ratings. An important part of the room images shared by guests is related to bathrooms. Guests tend to prove their satisfaction or dissatisfaction with the bathrooms with images in their comments. These positive or negative comments and visuals potentially affect prospective guests. In this study, two different versions of a deep learning algorithm were designed to classify hotel bathrooms as satisfactory (good) or unsatisfactory (bad, when any defects such as dirtiness, deficiencies, or malfunctions were present) by analyzing images. The better performer of the two models was determined through a series of extensive experiments. The models were trained for each of 144 combinations of 5 hyper-parameter sets with a data set containing more than 11 thousand bathroom images, specially created for this study. The "HotelBath" data set is also shared with the community as part of this study. Four different image sizes were taken into consideration: 128, 256, 512 and 1024 pixels in both directions. The classification performance of the models was measured with several metrics. Both algorithms performed well across many combinations of hyper-parameters and can classify bathroom images with very high accuracy. The top algorithm achieved an accuracy of 92.4% and an AUC (area under the curve) score of 0.967. In addition, other metrics also proved the success...


Beyond Black Box AI-Generated Plagiarism Detection: From Sentence to Document Level

arXiv.org Artificial Intelligence

The increasing reliance on large language models (LLMs) in academic writing has led to a rise in plagiarism. Existing AI-generated text classifiers have limited accuracy and often produce false positives. We propose a novel approach using natural language processing (NLP) techniques, offering quantifiable metrics at both sentence and document levels for easier interpretation by human evaluators. Our method employs a multi-faceted approach, generating multiple paraphrased versions of a given question and inputting them into the LLM to generate answers. By using a contrastive loss function based on cosine similarity, we match generated sentences with those from the student's response. Our approach achieves up to 94% accuracy in classifying human and AI text, providing a robust and adaptable solution for plagiarism detection in academic settings. This method improves with LLM advancements, reducing the need for new model training or reconfiguration, and offers a more transparent way of evaluating and detecting AI-generated text.
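
The sentence-matching step described in this abstract can be illustrated with plain cosine similarity. The sketch below is a stand-in under stated assumptions, not the authors' method: it uses bag-of-words counts instead of learned embeddings, it involves no contrastive training, and the function names (`bow`, `cosine`, `best_matches`) and sample sentences are hypothetical.

```python
import math
from collections import Counter


def bow(sentence):
    """Bag-of-words vector; an assumed stand-in for a learned sentence embedding."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_matches(student_sentences, generated_sentences):
    """For each student sentence, return (sentence, closest generated sentence, score)."""
    results = []
    for s in student_sentences:
        sv = bow(s)
        score, match = max((cosine(sv, bow(g)), g) for g in generated_sentences)
        results.append((s, match, score))
    return results


# Hypothetical example: one student sentence against two LLM-generated sentences.
student = ["The mitochondria is the powerhouse of the cell"]
generated = [
    "the mitochondria is the powerhouse of the cell",
    "Photosynthesis occurs in chloroplasts",
]
results = best_matches(student, generated)
# A document-level score could be, for example, the mean of the sentence scores.
doc_score = sum(score for _, _, score in results) / len(results)
print(results[0][1], round(doc_score, 3))
```

Reporting the per-sentence matches alongside the document-level score is what gives a human evaluator something concrete to inspect.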


Probing Out-of-Distribution Robustness of Language Models with Parameter-Efficient Transfer Learning

arXiv.org Artificial Intelligence

As the size of the pre-trained language model (PLM) continues to increase, numerous parameter-efficient transfer learning methods have been proposed recently to compensate for the tremendous cost of fine-tuning. Despite the impressive results achieved by large pre-trained language models (PLMs) and various parameter-efficient transfer learning (PETL) methods on sundry benchmarks, it remains unclear if they can handle inputs that have been distributionally shifted effectively. In this study, we systematically explore how the ability to detect out-of-distribution (OOD) changes as the size of the PLM grows or the transfer methods are altered. Specifically, we evaluated various PETL techniques, including fine-tuning, Adapter, LoRA, and prefix-tuning, on three different intention classification tasks, each utilizing various language models with different scales.


Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks

arXiv.org Artificial Intelligence

Large language models (LLMs) are remarkable data annotators. They can be used to generate high-fidelity supervised training data, as well as survey and experimental data. With the widespread adoption of LLMs, human gold-standard annotations are key to understanding the capabilities of LLMs and the validity of their results. However, crowdsourcing, an important, inexpensive way to obtain human annotations, may itself be impacted by LLMs, as crowd workers have financial incentives to use LLMs to increase their productivity and income. To investigate this concern, we conducted a case study on the prevalence of LLM usage by crowd workers. We reran an abstract summarization task from the literature on Amazon Mechanical Turk and, through a combination of keystroke detection and synthetic text classification, estimate that 33-46% of crowd workers used LLMs when completing the task. Although generalization to other, less LLM-friendly tasks is unclear, our results call for platforms, researchers, and crowd workers to find new ways to ensure that human data remain human, perhaps using the methodology proposed here as a stepping stone. Code/data: https://github.com/epfl-dlab/GPTurk


Differential Privacy with Random Projections and Sign Random Projections

arXiv.org Artificial Intelligence

In this paper, we develop a series of differential privacy (DP) algorithms from a family of random projections (RP) for general applications in machine learning, data mining, and information retrieval. Among the presented algorithms, iDP-SignRP is remarkably effective under the setting of "individual differential privacy" (iDP), based on sign random projections (SignRP). Also, DP-SignOPORP considerably improves existing algorithms in the literature under the standard DP setting, using "one permutation + one random projection" (OPORP), where OPORP is a variant of the celebrated count-sketch method with fixed-length binning and normalization. Without taking signs, among the DP-RP family, DP-OPORP achieves the best performance. Our key idea for improving DP-RP is to take only the signs, i.e., $sign(x_j) = sign\left(\sum_{i=1}^p u_i w_{ij}\right)$, of the projected data. The intuition is that the signs often remain unchanged when the original data ($u$) exhibit small changes (according to the "neighbor" definition in DP). In other words, the aggregation and quantization operations themselves provide good privacy protections. We develop a technique called "smooth flipping probability" that incorporates this intuitive privacy benefit of SignRPs and improves the standard DP bit flipping strategy. Based on this technique, we propose DP-SignOPORP which satisfies strict DP and outperforms other DP variants based on SignRP (and RP), especially when $\epsilon$ is not very large (e.g., $\epsilon = 5\sim10$). Moreover, if an application scenario accepts individual DP, then we immediately obtain an algorithm named iDP-SignRP which achieves excellent utilities even at small $\epsilon$ (e.g., $\epsilon<0.5$).
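
The sign projection $sign(x_j) = sign\left(\sum_{i=1}^p u_i w_{ij}\right)$ and the baseline bit-flipping idea can be sketched as follows. This is a minimal illustration under stated assumptions: the weights $w_{ij}$ are drawn as standard Gaussians, the flipping step is ordinary randomized response (keep each bit with probability $e^{\epsilon}/(1+e^{\epsilon})$) applied per bit with no accounting across the $k$ bits, and it is not the paper's "smooth flipping probability" or OPORP construction. All function names are hypothetical.

```python
import math
import random


def make_projection(p, k, seed):
    """Draw a p x k matrix of Gaussian weights w_ij (assumed distribution)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(p)]


def sign_rp(u, w):
    """Return the k signs sign(sum_i u_i * w_ij), encoded as +1 / -1."""
    k = len(w[0])
    return [
        1 if sum(u[i] * w[i][j] for i in range(len(u))) >= 0 else -1
        for j in range(k)
    ]


def randomized_response(signs, epsilon, rng):
    """Standard randomized response: keep each bit w.p. e^eps / (1 + e^eps)."""
    keep = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return [s if rng.random() < keep else -s for s in signs]


# Hypothetical data vector with p = 4 features, projected to k = 8 signs.
w = make_projection(p=4, k=8, seed=0)
u = [0.5, -1.2, 3.0, 0.1]
signs = sign_rp(u, w)
noisy = randomized_response(signs, epsilon=1.0, rng=random.Random(1))
print(signs)
print(noisy)
```

Because only the sign of each projection is kept, rescaling `u` by a positive constant leaves `signs` unchanged, which hints at why small perturbations of the data often do not change the output.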


A Hypergraph-Based Machine Learning Ensemble Network Intrusion Detection System

arXiv.org Artificial Intelligence

Network intrusion detection systems (NIDS) continue to face challenges in detecting malicious attacks. NIDS are often developed offline while they face auto-generated port scan infiltration attempts, resulting in a significant time lag from adversarial adaptation to NIDS response. To address these challenges, we use hypergraphs focused on internet protocol addresses and destination ports to capture evolving patterns of port scan attacks. The derived set of hypergraph-based metrics is then used to train an ensemble machine learning (ML) based NIDS that allows for real-time adaptation in monitoring and detecting port scanning activities, other types of attacks, and adversarial intrusions with high accuracy, precision, and recall. This adaptive ML NIDS was developed through the combination of (1) intrusion examples, (2) NIDS update rules, (3) attack threshold choices to trigger NIDS retraining requests, and (4) a production environment with no prior knowledge of the nature of network traffic. Forty scenarios were auto-generated to evaluate the ML ensemble NIDS comprising three tree-based models. The resulting ML ensemble NIDS was extended and evaluated with the CIC-IDS2017 dataset. Results show that under an Update-ALL-NIDS rule (that is, retraining and updating all three models upon the same NIDS retraining request), the proposed ML ensemble NIDS evolved intelligently and produced the best results, with nearly 100% detection performance throughout the simulation.


Survey of Trustworthy AI: A Meta Decision of AI

arXiv.org Artificial Intelligence

When making strategic decisions, we are often confronted with overwhelming information to process. The situation can be further complicated when some pieces of evidence contradict each other or are paradoxical. The challenge then becomes how to determine which information is useful and which should be eliminated. This process is known as meta-decision. Likewise, when it comes to using Artificial Intelligence (AI) systems for strategic decision-making, placing trust in the AI itself becomes a meta-decision, given that many AI systems are viewed as opaque "black boxes" that process large amounts of data. Trusting an opaque system involves deciding on the level of Trustworthy AI (TAI). We propose a new approach to address this issue by introducing a novel taxonomy or framework of TAI, which encompasses three crucial domains: articulate, authentic, and basic for different levels of trust. To underpin these domains, we create ten dimensions to measure trust: explainability/transparency, fairness/diversity, generalizability, privacy, data governance, safety/robustness, accountability, reproducibility, reliability, and sustainability. We aim to use this taxonomy to conduct a comprehensive survey and explore different TAI approaches from a strategic decision-making perspective.