
Collaborating Authors

 Cohen, Izack


A Scalable and Near-Optimal Conformance Checking Approach for Long Traces

arXiv.org Artificial Intelligence

Long traces and large event logs that originate from sensors and prediction models are becoming more common in our data-rich world. In such circumstances, conformance checking, a key task in process mining, can become computationally infeasible due to the exponential complexity of finding an optimal alignment. This paper introduces a novel sliding window approach to address these scalability challenges while preserving the interpretability of alignment-based methods. By breaking down traces into manageable subtraces and iteratively aligning each with the process model, our method significantly reduces the search space. The approach uses global information that captures structural properties of the trace and the process model to make informed alignment decisions, discarding unpromising alignments even if they are optimal for a local subtrace. This improves the overall accuracy of the results. Experimental evaluations demonstrate that the proposed method finds optimal alignments in the vast majority of cases and highlight its scalability. This is further supported by a theoretical complexity analysis, which shows the reduced growth of the search space compared to other common conformance checking methods. This work provides a valuable contribution towards efficient conformance checking for large-scale process mining applications.
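The windowing idea can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the paper's algorithm: it assumes fixed-size windows, a single reference model run, and plain edit-distance cost, and all function names are ours.

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance, used here as a
    # stand-in for alignment cost between a subtrace and a model segment.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,        # delete from a (log move)
                           cur[j - 1] + 1,     # insert into a (model move)
                           prev[j - 1] + (x != y)))  # match / substitute
        prev = cur
    return prev[-1]

def windowed_alignment_cost(trace, model_run, window=5):
    """Align fixed-size subtraces against the corresponding segment of
    a model run, instead of searching the full synchronous product."""
    total = 0
    for start in range(0, len(trace), window):
        total += levenshtein(trace[start:start + window],
                             model_run[start:start + window])
    return total
```

The sketch also exposes the pitfall the abstract addresses: with `window=2`, aligning `abccd` against `abcd` window-by-window costs 2, while the globally optimal alignment costs only 1, which is why the method also uses global structural information to discard locally optimal but globally poor alignments.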


Data-driven project planning: An integrated network learning and constraint relaxation approach in favor of scheduling

arXiv.org Artificial Intelligence

Our focus is on projects, i.e., business processes, which are emerging as the economic drivers of our times. Unlike day-to-day operational processes, which do not require detailed planning, a project requires planning and resource-constrained scheduling to coordinate resources across sub-projects, related projects, and organizations. A planner in charge of project planning has to select a set of activities to perform, determine their precedence constraints, and schedule them according to temporal project constraints. We suggest a data-driven project planning approach for classes of projects such as infrastructure building and information systems development projects. A project network is first learned from historical records. The discovered network relaxes temporal constraints embedded in individual projects, thus uncovering where planning and scheduling flexibility can be exploited for greater benefit. The network, which contains multiple project-plan variations from which one must be selected, is then enriched with decision rules and frequent paths. The planner can rely on the project network for: 1) decoding a project variation such that it forms a new project plan, and 2) applying resource-constrained project scheduling procedures to determine the project's schedule and resource allocation. Using two real-world project datasets, we show that the suggested approach may provide the planner with significant flexibility (up to a 26% reduction of the critical path of a real project) to adjust the project plan and schedule. We believe that the proposed approach can play an important part in supporting decision making towards automated data-driven project planning.
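The scheduling payoff of constraint relaxation is easy to illustrate. The sketch below (hypothetical activity names and durations; this is not the paper's network-learning algorithm) computes a critical-path length with a topological sweep and shows how dropping one learned-to-be-unnecessary precedence edge shortens it:

```python
from collections import defaultdict, deque

def critical_path_length(durations, precedence):
    """Longest path through an activity-on-node project network.
    durations: activity -> duration; precedence: (u, v) edges meaning
    u must finish before v starts. Assumes the network is acyclic."""
    succ = defaultdict(list)
    indeg = {a: 0 for a in durations}
    for u, v in precedence:
        succ[u].append(v)
        indeg[v] += 1
    earliest = {a: 0 for a in durations}  # earliest start times
    queue = deque(a for a in durations if indeg[a] == 0)
    longest = 0
    while queue:
        u = queue.popleft()
        finish = earliest[u] + durations[u]
        longest = max(longest, finish)
        for v in succ[u]:
            earliest[v] = max(earliest[v], finish)
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return longest

durations = {"design": 3, "build": 5, "test": 2}
strict = [("design", "build"), ("build", "test")]   # fully sequential plan
relaxed = [("build", "test")]  # history shows design and build can overlap
```

Here the strict plan has a critical path of 10 while the relaxed one has 7, a 30% reduction: the same kind of flexibility as the 26% critical-path reduction reported for a real project.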


SKTR: Trace Recovery from Stochastically Known Logs

arXiv.org Artificial Intelligence

Developments in machine learning together with the increasing usage of sensor data challenge the reliance on deterministic logs, requiring new process mining solutions for uncertain, and in particular stochastically known, logs. In this work we formulate trace recovery, the task of generating a deterministic log from stochastically known logs that is as faithful to reality as possible. An effective trace recovery algorithm would be a powerful aid for maintaining credible process mining tools for uncertain settings. We propose an algorithmic framework for this task that recovers the best alignment between a stochastically known log and a process model, with three innovative features. Our algorithm, SKTR, 1) handles both Markovian and non-Markovian processes; 2) offers a quality-based balance between a process model and a log, depending on the available process information, sensor quality, and machine-learning predictive power; and 3) offers a novel use of a synchronous product multigraph to create the log. An empirical analysis using five publicly available datasets, three of which use predictive models over standard video capturing benchmarks, shows an average relative accuracy improvement of more than 10% over a common baseline.
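To make the task concrete, here is a toy recovery baseline, not SKTR itself (which solves a shortest-path problem over a synchronous product multigraph): each uncertain event is a probability distribution over activities, the model is crudely represented as a set of admissible runs, and we greedily pick the most probable activity that keeps the recovered trace consistent with the model. All names are ours.

```python
def recover_trace(stochastic_trace, model_runs):
    """stochastic_trace: list of {activity: probability} dicts, one per event.
    model_runs: set of admissible activity tuples (a toy stand-in for a
    process model). Greedily pick the most probable activity whose
    extension is still a prefix of some model run."""
    prefix = []
    for dist in stochastic_trace:
        for act, _p in sorted(dist.items(), key=lambda kv: -kv[1]):
            cand = prefix + [act]
            if any(run[:len(cand)] == tuple(cand) for run in model_runs):
                prefix = cand
                break
        else:
            # No model-consistent choice: fall back to the plain argmax.
            prefix.append(max(dist, key=dist.get))
    return prefix
```

The model matters: for a second event split 0.4/0.6 between `b` and `c` and a third split 0.7/0.3 between `c` and `b`, the per-event argmax yields the inadmissible `a, c, c`, while the model-aware recovery yields the valid run `a, c, b`.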


Adaptive Learning for the Resource-Constrained Classification Problem

arXiv.org Artificial Intelligence

Classification applications are typically associated with misclassification costs and benefits as a result of incorrect and correct classification, respectively. Many studies have focused on cost-sensitive classification approaches [7, 8, 9, 10, 11, 12] in an effort to reduce the costs of misclassification. We illustrate the concept of imbalanced misclassification costs using the timely, real-world example of classifying COVID-19 patients. Incorrectly classifying an ill patient as healthy may put this patient's life, as well as others', at risk by allowing the ill person to circulate among healthy persons and infect them (an intangible cost, usually determined by the judicial system). Classifying a healthy individual as a COVID-19 patient, on the other hand, may lead to unnecessary treatment, misuse of medical resources, and unnecessary financial hardship for the individual and the general economy. Many studies have applied cost-sensitive approaches to handling imbalanced classification problems [13, 14] where the decision maker is interested in detecting the positive cases.
There are four main approaches for making a classifier cost-sensitive: (i) changing the distribution of classes using over- and under-sampling within the training data set (i.e., preprocessing of the training data) to reduce misclassification costs [7, 8], hereafter denoted approach A1; (ii) changing the data set according to the misclassified samples of the cost-insensitive classifiers and their error costs (post-processing the training data) using a boosting approach in ensemble learning methods [12, 15], hereafter denoted approach A2; (iii) incorporating meta-learning methods on outputs of cost-insensitive learners using threshold-driven techniques that exploit the probability estimations for the classes [7, 8, 16, 17], hereafter denoted approach A3; (iv) directly incorporating cost-sensitive capabilities into a learning algorithm, i.e., an algorithm-level solution that adapts existing learning methods so they are biased towards classes with high misclassification costs, usually represented by minority classes [8, 18], hereafter denoted approach A4.
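The threshold-driven idea behind approach A3 is simple enough to sketch. Predicting positive is cost-minimizing when the expected cost of a false negative exceeds that of a false positive, i.e. when p * c_fn >= (1 - p) * c_fp, which rearranges to the threshold below. The function names are ours, not from the cited works.

```python
def cost_sensitive_threshold(c_fp, c_fn):
    """Cost-minimizing decision threshold on the estimated positive-class
    probability p: predict positive when p >= c_fp / (c_fp + c_fn)."""
    return c_fp / (c_fp + c_fn)

def classify(probs, c_fp, c_fn):
    """Apply the cost-sensitive threshold to probability estimates
    produced by any cost-insensitive learner."""
    t = cost_sensitive_threshold(c_fp, c_fn)
    return [int(p >= t) for p in probs]
```

With a false negative nine times as costly as a false positive (c_fp=1, c_fn=9), the threshold drops from the cost-insensitive 0.5 to 0.1, so borderline cases such as p=0.2 are flagged positive, which matches the COVID-19 example above, where missing an ill patient is far costlier than a false alarm.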


Conformance Checking Over Stochastically Known Logs

arXiv.org Artificial Intelligence

With the growing number of devices, sensors and digital systems, data logs may become uncertain due to, e.g., sensor reading inaccuracies or incorrect interpretation of readings by processing programs. At times, such uncertainties can be captured stochastically, especially when using probabilistic data classification models. In this work we focus on conformance checking, which compares a process model with an event log, when event logs are stochastically known. Building on existing alignment-based conformance checking fundamentals, we mathematically define a stochastic trace model, a stochastic synchronous product, and a cost function that reflects the uncertainty of events in a log. We then search the reachability graph of the stochastic synchronous product for an optimal alignment between a model and a stochastic process observation. Via structured experiments with two well-known process mining benchmarks, we explore the behavior of the suggested stochastic conformance checking approach and compare it to a standard alignment-based approach as well as to an approach that creates a lower bound on performance. We envision the proposed stochastic conformance checking approach as a viable process mining component for future analysis of stochastic event logs.
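As a toy illustration of how an alignment cost function can reflect event uncertainty (the specific costs below are our invention, not the paper's definitions): a synchronous move gets cheaper the more confident the log is in the matched activity, while a log move that skips a confident event is expensive.

```python
def stochastic_move_cost(move, dist):
    """Toy per-move cost over a stochastically known event.
    move: ("sync", activity), ("log", None) or ("model", activity).
    dist: {activity: probability} for the current log event.
    - sync move on activity a costs 1 - dist[a]  (cheap when confident)
    - log move costs the event's top probability (skipping a confident
      event is expensive)
    - model move costs a fixed 1."""
    kind, act = move
    if kind == "sync":
        return 1 - dist.get(act, 0.0)
    if kind == "log":
        return max(dist.values())
    return 1.0  # model move
```

Under this scheme a fully certain event (probability 1 on one activity) reproduces classical alignment costs: synchronous moves are free and log moves cost 1.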


Uncertain Process Data with Probabilistic Knowledge: Problem Characterization and Challenges

arXiv.org Artificial Intelligence

Motivated by the abundance of uncertain event data from multiple sources including physical devices and sensors, this paper presents the task of relating a stochastic process observation to a process model that can be rendered from a dataset. In contrast to previous research that suggested transforming a stochastically known event log into a less informative uncertain log with upper and lower bounds on activity frequencies, we consider the challenge of accommodating the probabilistic knowledge within conformance checking techniques. Based on a taxonomy that captures the spectrum of conformance checking cases under stochastic process observations, we present three types of challenging cases. The first includes conformance checking of a stochastically known log with respect to a given process model. The second case extends the first to classify a stochastically known log into one of several process models. The third case extends the two previous ones to settings in which process models are only stochastically known. The suggested problem captures the growing number of applications in which sensors provide probabilistic process information.