
Detecting sudden and gradual drifts in business processes from execution traces

arXiv.org Artificial Intelligence

Business processes are prone to unexpected changes, as process workers may suddenly or gradually start executing a process differently in order to adjust to changes in workload, season, or other external factors. Early detection of business process changes enables managers to identify and act upon changes that may otherwise affect process performance. Business process drift detection refers to a family of methods to detect changes in a business process by analyzing event logs extracted from the systems that support the execution of the process. Existing methods for business process drift detection are based on an explorative analysis of a potentially large feature space and in some cases they require users to manually identify specific features that characterize the drift. Depending on the explored feature space, these methods miss various types of changes. Moreover, they are either designed to detect sudden drifts or gradual drifts but not both. This paper proposes an automated and statistically grounded method for detecting sudden and gradual business process drifts under a unified framework. An empirical evaluation shows that the method detects typical change patterns with significantly higher accuracy and lower detection delay than existing methods, while accurately distinguishing between sudden and gradual drifts.


Artificial Intelligence in Cardiology: Present and Future

#artificialintelligence

For the purpose of this narrative review, we searched PubMed and MEDLINE databases with no date restriction using search terms related to AI and medicine and cardiology subspecialties. Articles were reviewed and selected for inclusion on the basis of relevance. This article highlights that the role of ML in cardiovascular medicine is rapidly emerging, and mounting evidence indicates it will power the new tools that drive the field. Among other uses, AI has been deployed to interpret echocardiograms, to automatically identify heart rhythms from an ECG, to uniquely identify an individual using the ECG as a biometric signal, and to detect the presence of heart disease such as left ventricular dysfunction from the surface ECG (Attia, Z.I., Kapa, S., Lopez-Jimenez, F. et al.).


Group Heterogeneity Assessment for Multilevel Models

arXiv.org Machine Learning

Many data sets contain an inherent multilevel structure, for example, because of repeated measurements of the same observational units. Taking this structure into account is critical for the accuracy and calibration of any statistical analysis performed on such data. However, the large number of possible model configurations hinders the use of multilevel models in practice. In this work, we propose a flexible framework for efficiently assessing differences between the levels of given grouping variables in the data. The assessed group heterogeneity is valuable in choosing the relevant group coefficients to consider in a multilevel model. Our empirical evaluations demonstrate that the framework can reliably identify relevant multilevel components in both simulated and real data sets.


Fractional ridge regression: a fast, interpretable reparameterization of ridge regression

arXiv.org Machine Learning

Ridge regression (RR) is a regularization technique that penalizes the L2-norm of the coefficients in linear regression. One of the challenges of using RR is the need to set a hyperparameter ($\alpha$) that controls the amount of regularization. Cross-validation is typically used to select the best $\alpha$ from a set of candidates. However, efficient and appropriate selection of $\alpha$ can be challenging, particularly where large amounts of data are analyzed. Because the selected $\alpha$ depends on the scale of the data and predictors, it is not straightforwardly interpretable. Here, we propose to reparameterize RR in terms of the ratio $\gamma$ between the L2-norms of the regularized and unregularized coefficients. This approach, called fractional RR (FRR), has several benefits: the solutions obtained for different $\gamma$ are guaranteed to vary, guarding against wasted calculations, and automatically span the relevant range of regularization, avoiding the need for arduous manual exploration. We provide an algorithm to solve FRR, as well as open-source software implementations in Python and MATLAB (https://github.com/nrdg/fracridge). We show that the proposed method is fast and scalable for large-scale data problems, and delivers results that are straightforward to interpret and compare across models and datasets.
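The reparameterization described above can be illustrated in the simplest possible setting. The following is a minimal sketch, assuming a single predictor and no intercept, where the ratio $\gamma$ between the regularized and unregularized coefficient has a closed form in $\alpha$. It is not the algorithm from the paper or the fracridge package; the function names `ridge_coef_1d` and `alpha_for_fraction` and the toy data are hypothetical.

```python
def ridge_coef_1d(x, y, alpha):
    """Ridge coefficient for one predictor, no intercept: sum(x*y) / (sum(x*x) + alpha)."""
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    return sxy / (sxx + alpha)


def alpha_for_fraction(x, gamma):
    """Return the alpha whose ridge coefficient is gamma times the unregularized one.

    In the one-predictor case gamma = sxx / (sxx + alpha), so
    alpha = sxx * (1 - gamma) / gamma. This closed form is an assumption of
    the toy setting; it does not carry over to several correlated predictors.
    """
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must lie in (0, 1]")
    sxx = sum(xi * xi for xi in x)
    return sxx * (1.0 - gamma) / gamma


# Toy data (made up for illustration).
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]

beta_ols = ridge_coef_1d(x, y, alpha=0.0)    # unregularized coefficient
alpha_half = alpha_for_fraction(x, gamma=0.5)
beta_half = ridge_coef_1d(x, y, alpha_half)  # coefficient shrunk to half its size

print(beta_ols, alpha_half, beta_half)
```

Asking for $\gamma = 0.5$ is meaningful regardless of how the data are scaled, whereas the matching $\alpha$ (here equal to the sum of squared predictor values) changes whenever the predictor is rescaled.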


Joint Multi-Dimensional Model for Global and Time-Series Annotations

arXiv.org Machine Learning

Crowdsourcing is a popular approach to collecting annotations for unlabeled data instances. It involves collecting a large number of annotations from several, often naive and untrained, annotators for each data instance, which are then combined to estimate the ground truth. Further, annotations for constructs such as affect are often multi-dimensional, with annotators rating multiple dimensions, such as valence and arousal, for each instance. Most annotation fusion schemes, however, ignore this aspect and model each dimension separately. In this work we address this by proposing a generative model for multi-dimensional annotation fusion, which models the dimensions jointly, leading to more accurate ground truth estimates. The model we propose is applicable to both global and time series annotation fusion problems and treats the ground truth as a latent variable distorted by the annotators. The model parameters are estimated using the Expectation-Maximization algorithm, and we evaluate its performance using synthetic data and real emotion corpora, as well as on an artificial task with human annotations.


Ensuring Fairness under Prior Probability Shifts

arXiv.org Artificial Intelligence

In this paper, we study the problem of fair classification in the presence of prior probability shifts, where the training set distribution differs from the test set. This phenomenon can be observed in the yearly records of several real-world datasets, such as recidivism records and medical expenditure surveys. If unaccounted for, such shifts can cause the predictions of a classifier to become unfair towards specific population subgroups. While the fairness notion called Proportional Equality (PE) accounts for such shifts, a procedure to ensure PE-fairness was unknown. In this work, we propose a method, called CAPE, which provides a comprehensive solution to the aforementioned problem. CAPE makes novel use of prevalence estimation techniques, sampling and an ensemble of classifiers to ensure fair predictions under prior probability shifts. We introduce a metric, called prevalence difference (PD), which CAPE attempts to minimize in order to ensure PE-fairness. We theoretically establish that this metric exhibits several desirable properties. We evaluate the efficacy of CAPE via a thorough empirical evaluation on synthetic datasets. We also compare the performance of CAPE with several popular fair classifiers on real-world datasets like COMPAS (criminal risk assessment) and MEPS (medical expenditure panel survey). The results indicate that CAPE ensures PE-fair predictions, while performing well on other performance metrics.
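To make the notion of a prior probability shift concrete, the sketch below shows the standard re-weighting of a binary classifier's posterior when the class prevalence changes between training and test data while the class-conditional feature distributions are assumed to stay fixed. This is a textbook correction, not the CAPE procedure or the PD metric from the paper; the function name `adjust_posterior` and its parameters are hypothetical.

```python
def adjust_posterior(p_pos, train_prior, test_prior):
    """Re-weight P(y=1 | x) learned under train_prior for a new test_prior.

    Assumes only the class prevalence P(y) changed and P(x | y) did not.
    Each class posterior is scaled by the ratio of new to old prior and
    the two scaled values are renormalized to sum to one.
    """
    for name, value in (("train_prior", train_prior), ("test_prior", test_prior)):
        if not 0.0 < value < 1.0:
            raise ValueError(f"{name} must lie strictly between 0 and 1")
    pos = p_pos * test_prior / train_prior
    neg = (1.0 - p_pos) * (1.0 - test_prior) / (1.0 - train_prior)
    return pos / (pos + neg)


# A score of 0.5 from a classifier trained on balanced classes, applied to a
# population in which only 20% of cases are positive.
shifted = adjust_posterior(0.5, train_prior=0.5, test_prior=0.2)
unchanged = adjust_posterior(0.7, train_prior=0.3, test_prior=0.3)
print(shifted, unchanged)
```

If the prevalence shifts differently in different population subgroups and the classifier is not adjusted, the same score carries a different meaning for each subgroup. This is the kind of unfairness the abstract refers to.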


The FDA Tightens the Rules for Covid-19 Antibody Blood Tests

WIRED

The federal government has received plenty of well-deserved flak for slow-rolling the national launch of diagnostic tests for Covid-19. First came the flawed swab-based tests from the Centers for Disease Control and Prevention, followed by a chaotic, lost month of regulatory tango that prevented independent tests from getting scaled and out the door. So when interest arose in a different kind of testing--antibody blood tests, which are used to find evidence of past infection, not a current diagnosis--the US Food and Drug Administration was under pressure to hurry things along. In mid-March, the agency loosened its rules, declaring via an update to its emergency use guidance that antibody tests could be sold without seeking the agency's approval, provided that manufacturers did their own validation. Now FDA officials are walking back that decision.


When Machine Unlearning Jeopardizes Privacy

arXiv.org Machine Learning

The right to be forgotten states that a data owner has the right to erase her data from an entity storing it. In the context of machine learning (ML), the right to be forgotten requires an ML model owner to remove the data owner's data from the training set used to build the ML model, a process known as machine unlearning. While originally designed to protect the privacy of the data owner, we argue that machine unlearning may leave some imprint of the data in the ML model and thus create unintended privacy risks. In this paper, we perform the first study on investigating the unintended information leakage caused by machine unlearning. We propose a novel membership inference attack which leverages the different outputs of an ML model's two versions to infer whether the deleted sample is part of the training set. Our experiments over five different datasets demonstrate that the proposed membership inference attack achieves strong performance. More importantly, we show that our attack in multiple cases outperforms the classical membership inference attack on the original ML model, which indicates that machine unlearning can have counterproductive effects on privacy. We notice that the privacy degradation is especially significant for well-generalized ML models where classical membership inference does not perform well. We further investigate two mechanisms to mitigate the newly discovered privacy risks and show that the only effective mechanism is to release the predicted label only. We believe that our results can help improve privacy in practical implementation of machine unlearning.


How to find a unicorn: a novel model-free, unsupervised anomaly detection method for time series

arXiv.org Machine Learning

Recognition of anomalous events is a challenging but critical task in many scientific and industrial fields, especially when the properties of anomalies are unknown. In this paper, we present a new anomaly concept called "unicorn" or unique event, together with a new, model-independent, unsupervised algorithm to detect unicorns. The Temporal Outlier Factor (TOF) is introduced to measure the uniqueness of events in continuous data sets from dynamic systems. The concept of unique events differs significantly from traditional outliers in many aspects: while repetitive outliers are no longer unique events, a unique event is not necessarily an outlier in either the pointwise or the collective sense; it does not necessarily fall outside the distribution of normal activity. The performance of our algorithm in recognizing unique events was examined on different types of simulated data sets with anomalies and compared with the standard Local Outlier Factor (LOF). TOF had superior performance compared to LOF even in recognizing traditional outliers, and it also recognized unique events that LOF did not. Benefits of the unicorn concept and the new detection method were illustrated by example data sets from very different scientific fields. Our algorithm successfully recognized unique events in those cases where they were already known, such as the gravitational waves of a black hole merger in LIGO detector data and the signs of respiratory failure in ECG data series. Furthermore, unique events were found in the LIBOR data set of the last 30 years.


Can antibody tests tell if you're immune to COVID-19?

FOX News

As the new coronavirus burns its way across the world, scientists are rushing to find ways to identify those who have been infected -- including those who have recovered from COVID-19. Those people, the thinking goes, may be immune to the deadly virus and could theoretically help restart the economy without fear of reinfection. One key piece of this puzzle is rolling out what are known as serological tests that look for specific antibodies in a person's blood. So far, they have been used to estimate how much of the population has been exposed in different areas, such as New York City and Los Angeles. But what are these tests, and can they really help to identify who is immune to SARS-CoV-2? From how they work to what they tell us, here's everything you need to know about coronavirus antibody testing.