van der Aalst, Wil
Process Variant Analysis Across Continuous Features: A Novel Framework
Norouzifar, Ali, Rafiei, Majid, Dees, Marcus, van der Aalst, Wil
Event data extracted from information systems often contain a variety of process executions, making the data complex and difficult to comprehend. Unlike current research, which only identifies variability over time, we focus on other dimensions that may play a role in the performance of the process. This research addresses the challenge of effectively segmenting cases within operational processes based on continuous features, such as case duration and evaluated risk score, which are often overlooked in traditional process analysis. We present a novel approach that combines a sliding window technique with the earth mover's distance to detect changes in control-flow behavior over continuous dimensions. This approach enables case segmentation, hierarchical merging of similar segments, and pairwise comparison of segments, providing a comprehensive perspective on process behavior. We validate our methodology through a real-life case study in collaboration with UWV, the Dutch employee insurance agency, demonstrating its practical applicability. This research contributes to the field by helping organizations improve process efficiency, pinpoint abnormal behaviors, and obtain valuable inputs for process comparison and outcome prediction.
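The sliding-window comparison described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the window size, and the toy cases are hypothetical, and the ground distance between trace variants is assumed to be 0/1 (identical or different). Under that assumption the earth mover's distance reduces to the total variation distance, which avoids solving a transport problem; a finer ground distance, such as an edit distance between traces, would need a proper optimal transport solver.

```python
from collections import Counter


def variant_distribution(traces):
    """Relative frequency of each trace variant (a tuple of activity names)."""
    counts = Counter(traces)
    total = sum(counts.values())
    return {variant: count / total for variant, count in counts.items()}


def emd_unit_ground(p, q):
    """Earth mover's distance under an ASSUMED 0/1 ground distance.

    With a 0/1 ground distance the EMD equals the total variation distance.
    """
    variants = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in variants)


def sliding_window_distances(cases, window):
    """Compare adjacent windows of cases ordered by a continuous feature.

    cases: list of (feature_value, trace) pairs.
    Returns (feature_value_at_split, distance) pairs; a peak suggests a
    change in control-flow behavior around that feature value.
    """
    ordered = sorted(cases, key=lambda case: case[0])
    distances = []
    for i in range(window, len(ordered) - window + 1):
        left = [trace for _, trace in ordered[i - window:i]]
        right = [trace for _, trace in ordered[i:i + window]]
        d = emd_unit_ground(variant_distribution(left),
                            variant_distribution(right))
        distances.append((ordered[i][0], d))
    return distances


# Hypothetical toy log: behavior switches once the feature reaches 5.
toy_cases = [(x, ("a", "b")) for x in range(1, 5)] + \
            [(x, ("a", "c")) for x in range(5, 9)]
toy_distances = sliding_window_distances(toy_cases, window=2)
```

In this toy log the distance is largest at the split where the feature value reaches 5, which is where a segment boundary would be placed.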
People Make Better Edits: Measuring the Efficacy of LLM-Generated Counterfactually Augmented Data for Harmful Language Detection
Sen, Indira, Assenmacher, Dennis, Samory, Mattia, Augenstein, Isabelle, van der Aalst, Wil, Wagner, Claudia
NLP models are used in a variety of critical social computing tasks, such as detecting sexist, racist, or otherwise hateful content. Therefore, it is imperative that these models are robust to spurious features. Past work has attempted to tackle such spurious features using training data augmentation, including Counterfactually Augmented Data (CADs). CADs introduce minimal changes to existing training data points and flip their labels; training on them may reduce model dependency on spurious features. However, manually generating CADs can be time-consuming and expensive. Hence, in this work, we assess whether this task can be automated using generative NLP models. We automatically generate CADs using Polyjuice, ChatGPT, and Flan-T5, and evaluate their usefulness in improving model robustness compared to manually generated CADs. By testing both model performance on multiple out-of-domain test sets and individual data point efficacy, our results show that while manual CADs are still the most effective, CADs generated by ChatGPT come a close second. One key reason for the lower performance of automated methods is that the changes they introduce are often insufficient to flip the original label.
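The two properties this abstract attributes to a CAD, a minimal change and a flipped label, can be illustrated with a small check. This is a sketch under stated assumptions rather than the paper's evaluation procedure: the function names, the token-level notion of "minimal", and the `max_changed` threshold are all hypothetical, and the labels are assumed to come from an annotator or classifier outside the snippet.

```python
import difflib


def changed_tokens(original, counterfactual):
    """Count whitespace-separated tokens that differ between two texts."""
    a, b = original.split(), counterfactual.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Count the longer side of each replaced/inserted/deleted span.
            changed += max(i2 - i1, j2 - j1)
    return changed


def is_valid_cad(original, original_label, counterfactual,
                 counterfactual_label, max_changed=3):
    """A candidate counts as a CAD here only if its label flipped and the
    edit is small. max_changed is an ASSUMED threshold, not from the paper."""
    flipped = original_label != counterfactual_label
    minimal = 0 < changed_tokens(original, counterfactual) <= max_changed
    return flipped and minimal


# Hypothetical example: one token changes and the label flips.
ok = is_valid_cad("I hate this group", "hateful",
                  "I like this group", "not_hateful")
# Hypothetical failure case: the edit is too weak to flip the label.
weak = is_valid_cad("I hate this group", "hateful",
                    "I hate that group", "hateful")
```

The second call mirrors the failure mode named in the abstract: the text was edited, but not enough to change the label.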
Tailoring Machine Learning for Process Mining
Ceravolo, Paolo, Junior, Sylvio Barbon, Damiani, Ernesto, van der Aalst, Wil
Process Mining (PM) is a consolidated discipline grounded in data mining and business process management. The exploitation of traditional PM tasks (discovery, conformance checking, and enhancement) is today a reality in many organizations [1, 2]. In the last decade, a wave of new results in artificial intelligence has triggered the interest of the PM research community in using supervised or unsupervised Machine Learning (ML) techniques to gain insight into business processes and to advise on how to address their inefficiencies. In today's practice, ML models are routinely integrated into PM data pipelines [3] to carry out tasks like data transformation, noise reduction, anomaly detection, classification, and prediction. For example, ML is playing a key role in the interface between PM and sensor platforms. Advances in sensing technologies have made it possible to deploy distributed monitoring platforms capable of detecting fine-grained events. The granularity gap between these events and the activities considered by classic PM analysis has often been bridged using ML models [4, 5] that compute virtual activity logs, a problem also known as log lifting [6].
A Combined Approach of Process Mining and Rule-based AI for Study Planning and Monitoring in Higher Education
Wagner, Miriam, Helal, Hayyan, Roepke, Rene, Judel, Sven, Doveren, Jens, Goerzen, Sergej, Soudmand, Pouya, Lakemeyer, Gerhard, Schroeder, Ulrik, van der Aalst, Wil
This paper presents an approach that uses process mining and rule-based artificial intelligence to analyze and understand students' study paths based on campus management system data and study program models. Process mining techniques are used to characterize successful study paths, as well as to detect and visualize deviations from expected plans. These insights are combined with the recommendations and requirements of the corresponding study programs, extracted from examination regulations. Here, event calculus and answer set programming are used to provide models of the study programs that support planning and conformance checking while providing feedback on possible study plan violations. In combination, process mining and rule-based artificial intelligence support study planning and monitoring by deriving rules and recommendations for guiding students to more suitable study paths with higher success rates. Two applications will be implemented, one for students and one for study program designers.
Feature Recommendation for Structural Equation Model Discovery in Process Mining
Qafari, Mahnaz Sadat, van der Aalst, Wil
Process mining techniques can help organizations improve their operational processes. Organizations can benefit from process mining techniques in finding and amending the root causes of performance or compliance problems. Considering the volume of the data and the number of features captured by the information systems of today's companies, the task of discovering the set of features that should be considered in root cause analysis can be quite involved. In this paper, we propose a method for finding the set of (aggregated) features with a possible effect on the problem. The root cause analysis task is usually done by applying a machine learning technique to the data gathered from the information system supporting the processes. To prevent mixing up correlation and causation, which may happen when the findings of machine learning techniques are interpreted as causal, we propose a method for discovering the structural equation model of the process that can be used for root cause analysis. We have implemented the proposed method as a plugin in ProM and we have evaluated it using two real and synthetic event logs. These experiments show the validity and effectiveness of the proposed method.
Case Level Counterfactual Reasoning in Process Mining
Qafari, Mahnaz Sadat, van der Aalst, Wil
Process mining is widely used to diagnose processes and uncover performance and compliance problems. It is also possible to see relations between different behavioral aspects, e.g., cases that deviate more at the beginning of the process tend to get delayed in the last part of the process. However, correlations do not necessarily reveal causalities. Moreover, standard process mining diagnostics do not indicate how to improve the process. This is the reason we advocate the use of structural equation models and counterfactual reasoning. We use results from causal inference and adapt these to be able to reason over event logs and process interventions. We have implemented the approach as a ProM plug-in and have evaluated it on several data sets. Our ProM plug-in produces recommendations that indicate how specific cases could have been handled differently to avoid a performance or compliance problem.
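Case-level counterfactual reasoning over a structural equation model is commonly described as three steps: abduction, action, and prediction. The sketch below walks through them for a single case. It is an illustration, not the ProM plug-in's implementation: the one-equation linear model, the coefficient, and the feature names `deviation` and `delay` are hypothetical and chosen only to echo the example in this abstract.

```python
def abduct_noise(observed_deviation, observed_delay, coef):
    """Abduction: recover the case-specific noise term from what was observed.

    ASSUMED toy structural equation: delay = coef * deviation + noise
    """
    return observed_delay - coef * observed_deviation


def counterfactual_delay(observed_deviation, observed_delay,
                         new_deviation, coef):
    """Answer: what would the delay of THIS case have been, had its
    deviation been new_deviation instead of observed_deviation?"""
    # 1. Abduction: keep everything specific to the case (its noise).
    noise = abduct_noise(observed_deviation, observed_delay, coef)
    # 2. Action: intervene by setting the deviation to the new value.
    deviation = new_deviation
    # 3. Prediction: re-evaluate the structural equation.
    return coef * deviation + noise


# Hypothetical case: 3 deviations early on, a delay of 10 days observed.
# With an assumed coefficient of 2 days per deviation, the case noise is 4.
what_if = counterfactual_delay(observed_deviation=3, observed_delay=10.0,
                               new_deviation=1, coef=2.0)
```

Because the noise term is carried over from the observed case, the answer is specific to that case rather than an average over the whole log.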
Fairness-Aware Process Mining
Qafari, Mahnaz Sadat, van der Aalst, Wil
Process mining is a multi-purpose tool enabling organizations to improve their processes. One of the primary purposes of process mining is finding the root causes of performance or compliance problems in processes. The usual way of doing so is by gathering data from the process event log and other sources and then applying data mining and machine learning techniques. However, the results of applying such techniques are not always acceptable. In many situations, this approach is prone to making obvious or unfair diagnoses, and applying them may result in conclusions that are unsurprising or even discriminatory (e.g., blaming overloaded employees for delays). In this paper, we present a solution to this problem by creating a fair classifier for such situations. The undesired effects are removed at the expense of a reduction in the accuracy of the resulting classifier. We have implemented this method as a plug-in in ProM. Using the implemented plug-in on two real event logs, we decreased the discrimination caused by the classifier, while losing a small fraction of its accuracy.
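The trade-off between discrimination and accuracy mentioned in this abstract can be made concrete with a small measurement sketch. The abstract does not state which discrimination measure is used, so the one below, the difference in positive-prediction rates between a sensitive group and everyone else, is an assumption, as are the function names and the toy data.

```python
def positive_rate(predictions, mask):
    """Share of positive (1) predictions among the cases selected by mask."""
    selected = [p for p, m in zip(predictions, mask) if m]
    return sum(selected) / len(selected)


def discrimination(predictions, sensitive):
    """ASSUMED measure: positive rate inside the sensitive group minus the
    positive rate outside it. Zero means both groups are flagged equally."""
    inside = positive_rate(predictions, sensitive)
    outside = positive_rate(predictions, [not s for s in sensitive])
    return inside - outside


def accuracy(predictions, labels):
    """Fraction of predictions that match the recorded labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Hypothetical data: 1 = "case flagged as delayed"; the first four cases
# were handled by the sensitive group (e.g., overloaded employees).
labels    = [1, 1, 1, 0, 1, 0, 0, 0]
sensitive = [True, True, True, True, False, False, False, False]

baseline = [1, 1, 1, 0, 1, 0, 0, 0]   # reproduces the labels exactly
fairer   = [1, 1, 0, 0, 1, 0, 0, 0]   # flags one fewer sensitive case

base_scores = (discrimination(baseline, sensitive), accuracy(baseline, labels))
fair_scores = (discrimination(fairer, sensitive), accuracy(fairer, labels))
```

On this toy data the fairer predictions halve the measured discrimination (0.5 to 0.25) while accuracy drops from 1.0 to 0.875, the same kind of trade-off the abstract reports.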