Goto

Collaborating Authors

 event log


Model-driven Stochastic Trace Clustering

Peeperkorn, Jari, De Smedt, Johannes, De Weerdt, Jochen

arXiv.org Artificial Intelligence

Process discovery algorithms automatically extract process models from event logs, but high variability often results in complex and hard-to-understand models. To mitigate this issue, trace clustering techniques group process executions into clusters, each represented by a simpler and more understandable process model. Model-driven trace clustering improves on this by assigning traces to clusters based on their conformity to cluster-specific process models. However, most existing clustering techniques rely on either no process model discovery, or non-stochastic models, neglecting the frequency or probability of activities and transitions, thereby limiting their capability to capture real-world execution dynamics. We propose a novel model-driven trace clustering method that optimizes stochastic process models within each cluster. Our approach uses entropic relevance, a stochastic conformance metric based on directly-follows probabilities, to guide trace assignment. This allows clustering decisions to consider both structural alignment with a cluster's process model and the likelihood that a trace originates from a given stochastic process model. The method is computationally efficient, scales linearly with input size, and improves model interpretability by producing clusters with clearer control-flow patterns. Extensive experiments on public real-life datasets demonstrate that while our method yields superior stochastic coherence and graph simplicity, traditional fitness metrics reveal a trade-off, highlighting the specific utility of our approach for stochastic process analysis.


Time Series Foundation Models for Process Model Forecasting

Yu, Yongbo, Peeperkorn, Jari, De Smedt, Johannes, De Weerdt, Jochen

arXiv.org Artificial Intelligence

Process Model Forecasting (PMF) aims to predict how the control-flow structure of a process evolves over time by mode ling the temporal dynamics of directly-follows (DF) relations, comple menting predictive process monitoring that focuses on single-case prefixe s. Prior benchmarks show that machine learning and deep learning models pr ovide only modest gains over statistical baselines, mainly due to the s parsity and heterogeneity of the DF time series. We investigate Time Ser ies Foundation Models (TSFMs), large pre-trained models for generic t ime series, as an alternative for PMF. Using DF time series derived from rea l-life event logs, we compare zero-shot use of TSFMs, without additional training, with fine-tuned variants adapted on PMF-specific data. TSFMs generally achieve lower forecasting errors (MAE and RMSE) than tradit ional and specialized models trained from scratch on the same logs, in dicating effective transfer of temporal structure from non-process do mains. While fine-tuning can further improve accuracy, the gains are ofte n small and may disappear on smaller or more complex datasets, so zero-s hot use remains a strong default. Our study highlights the generaliza tion capability and data efficiency of TSFMs for process-related time series a nd, to the best of our knowledge, provides the first systematic evaluat ion of temporal foundation models for PMF.


Architecting software monitors for control-flow anomaly detection through large language models and conformance checking

Vitale, Francesco, Flammini, Francesco, Caporuscio, Mauro, Mazzocca, Nicola

arXiv.org Artificial Intelligence

Context: Ensuring high levels of dependability in modern computer-based systems has become increasingly challenging due to their complexity. Although systems are validated at design time, their behavior can be different at run-time, possibly showing control-flow anomalies due to "unknown unknowns". Objective: We aim to detect control-flow anomalies through software monitoring, which verifies run-time behavior by logging software execution and detecting deviations from expected control flow. Methods: We propose a methodology to develop software monitors for control-flow anomaly detection through Large Language Models (LLMs) and conformance checking. The methodology builds on existing software development practices to maintain traditional V&V while providing an additional level of robustness and trustworthiness. It leverages LLMs to link design-time models and implementation code, automating source-code instrumentation. The resulting event logs are analyzed via conformance checking, an explainable and effective technique for control-flow anomaly detection. Results: We test the methodology on a case-study scenario from the European Railway Traffic Management System / European Train Control System (ERTMS/ETCS), which is a railway standard for modern interoperable railways. The results obtained from the ERTMS/ETCS case study demonstrate that LLM-based source-code instrumentation can achieve up to 84.775% control-flow coverage of the reference design-time process model, while the subsequent conformance checking-based anomaly detection reaches a peak performance of 96.610% F1-score and 93.515% AUC. Conclusion: Incorporating domain-specific knowledge to guide LLMs in source-code instrumentation significantly allowed obtaining reliable and quality software logs and enabled effective control-flow anomaly detection through conformance checking.


Discriminative Rule Learning for Outcome-Guided Process Model Discovery

Norouzifar, Ali, van der Aalst, Wil

arXiv.org Artificial Intelligence

Event logs extracted from information systems offer a rich foundation for understanding and improving business processes. In many real-world applications, it is possible to distinguish between desirable and undesirable process executions, where desirable traces reflect efficient or compliant behavior, and undesirable ones may involve inefficiencies, rule violations, delays, or resource waste. This distinction presents an opportunity to guide process discovery in a more outcome-aware manner. Discovering a single process model without considering outcomes can yield representations poorly suited for conformance checking and performance analysis, as they fail to capture critical behavioral differences. Moreover, prioritizing one behavior over the other may obscure structural distinctions vital for understanding process outcomes. By learning interpretable discriminative rules over control-flow features, we group traces with similar desirability profiles and apply process discovery separately within each group. This results in focused and interpretable models that reveal the drivers of both desirable and undesirable executions. The approach is implemented as a publicly available tool and it is evaluated on multiple real-life event logs, demonstrating its effectiveness in isolating and visualizing critical process patterns.


Actor-Enriched Time Series Forecasting of Process Performance

Leribaux, Aurelie, Oyamada, Rafael, De Smedt, Johannes, Bozorgi, Zahra Dasht, Polyvyanyy, Artem, De Weerdt, Jochen

arXiv.org Artificial Intelligence

Predictive Process Monitoring (PPM) is a key task in Process Mining that aims to predict future behavior, outcomes, or performance indicators. Accurate prediction of the latter is critical for proactive decision-making. Given that processes are often resource-driven, understanding and incorporating actor behavior in forecasting is crucial. Although existing research has incorporated aspects of actor behavior, its role as a time-varying signal in PPM remains limited. This study investigates whether incorporating actor behavior information, modeled as time series, can improve the predictive performance of throughput time (TT) forecasting models. Using real-life event logs, we construct multivariate time series that include TT alongside actor-centric features, i.e., actor involvement, the frequency of continuation, interruption, and handover behaviors, and the duration of these behaviors. We train and compare several models to study the benefits of adding actor behavior. The results show that actor-enriched models consistently outperform baseline models, which only include TT features, in terms of RMSE, MAE, and R2. These findings demonstrate that modeling actor behavior over time and incorporating this information into forecasting models enhances performance indicator predictions.


Evaluating LLM-Based Process Explanations under Progressive Behavioral-Input Reduction

van Oerle, P., Bemthuis, R. H., Bukhsh, F. A.

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are increasingly used to generate textual explanations of process models discovered from event logs. Producing explanations from large behavioral abstractions (e.g., directly-follows graphs or Petri nets) can be computationally expensive. This paper reports an exploratory evaluation of explanation quality under progressive behavioral-input reduction, where models are discovered from progressively smaller prefixes of a fixed log. Our pipeline (i) discovers models at multiple input sizes, (ii) prompts an LLM to generate explanations, and (iii) uses a second LLM to assess completeness, bottleneck identification, and suggested improvements. On synthetic logs, explanation quality is largely preserved under moderate reduction, indicating a practical cost-quality trade-off. The study is exploratory, as the scores are LLM-based (comparative signals rather than ground truth) and the data are synthetic. The results suggest a path toward more computationally efficient, LLM-assisted process analysis in resource-constrained settings.


Integrating Domain Knowledge into Process Discovery Using Large Language Models

Norouzifar, Ali, Kourani, Humam, Dees, Marcus, van der Aalst, Wil

arXiv.org Artificial Intelligence

Process discovery aims to derive process models from event logs, providing insights into operational behavior and forming a foundation for conformance checking and process improvement. However, models derived solely from event data may not accurately reflect the real process, as event logs are often incomplete or affected by noise, and domain knowledge, an important complementary resource, is typically disregarded. As a result, the discovered models may lack reliability for downstream tasks. We propose an interactive framework that incorporates domain knowledge, expressed in natural language, into the process discovery pipeline using Large Language Models (LLMs). Our approach leverages LLMs to extract declarative rules from textual descriptions provided by domain experts. These rules are used to guide the IMr discovery algorithm, which recursively constructs process models by combining insights from both the event log and the extracted rules, helping to avoid problematic process structures that contradict domain knowledge. The framework coordinates interactions among the LLM, domain experts, and a set of backend services. We present a fully implemented tool that supports this workflow and conduct an extensive evaluation of multiple LLMs and prompt engineering strategies. Our empirical study includes a case study based on a real-life event log with the involvement of domain experts, who assessed the usability and effectiveness of the framework.


From Source to Target: Leveraging Transfer Learning for Predictive Process Monitoring in Organizations

Weinzierl, Sven, Zilker, Sandra, Liessmann, Annina, Käppel, Martin, Wang, Weixin, Matzner, Martin

arXiv.org Artificial Intelligence

Event logs reflect the behavior of business processes that are mapped in organizational information systems. Predictive process monitoring (PPM) transforms these data into value by creating process-related predictions that provide the insights required for proactive interventions at process runtime. Existing PPM techniques require sufficient amounts of event data or other relevant resources that might not be readily available, which prevents some organizations from utilizing PPM. The transfer learning-based PPM technique presented in this paper allows organizations without suitable event data or other relevant resources to implement PPM for effective decision support. This technique is instantiated in both a real-life intra- and an inter-organizational use case, based on which numerical experiments are performed using event logs for IT service management processes. The results of the experiments suggest that knowledge of one business process can be transferred to a similar business process in the same or a different organization to enable effective PPM in the target context. The proposed technique allows organizations to benefit from transfer learning in intra- and inter-organizational settings by transferring resources such as pre-trained models within and across organizational boundaries.


Discovering and Analyzing Stochastic Processes to Reduce Waste in Food Retail

Kalenkova, Anna, Xia, Lu, Neumann, Dirk

arXiv.org Artificial Intelligence

This paper proposes a novel method for analyzing food retail processes with a focus on reducing food waste. The approach integrates object-centric process mining (OCPM) with stochastic process discovery and analysis. First, a stochastic process in the form of a continuous-time Markov chain is discovered from grocery store sales data. This model is then extended with supply activities. Finally, a what-if analysis is conducted to evaluate how the quantity of products in the store evolves over time. This enables the identification of an optimal balance between customer purchasing behavior and supply strategies, helping to prevent both food waste due to oversupply and product shortages.


Remaining Time Prediction in Outbound Warehouse Processes: A Case Study (Short Paper)

Penther, Erik, Grohs, Michael, Rehse, Jana-Rebecca

arXiv.org Artificial Intelligence

Predictive process monitoring is a sub-domain of process mining which aims to forecast the future of ongoing process executions. One common prediction target is the remaining time, meaning the time that will elapse until a process execution is completed. In this paper, we compare four different remaining time prediction approaches in a real-life outbound warehouse process of a logistics company in the aviation business. For this process, the company provided us with a novel and original event log with 169,523 traces, which we can make publicly available. Unsurprisingly, we find that deep learning models achieve the highest accuracy, but shallow methods like conventional boosting techniques achieve competitive accuracy and require significantly fewer computational resources.