Consensus sequences of event logs are often used in process mining to quickly grasp the core sequence of events performed in a process, or to represent the backbone of the process for other analyses. However, it is still not clear how many traces are enough to properly represent the underlying process. In this paper, we propose a novel sampling strategy to determine the number of traces necessary to produce a representative consensus sequence. We show how to estimate the difference between the predefined Expert Model and the processes actually carried out. This difference can be used as a reference for domain experts to adjust the Expert Model. In addition, we apply this strategy to several real-world workflow activity datasets as a case study. We also present a sample curve-fitting task to help readers better understand our proposed methodology.
Tell us more about your favorite machine learning framework in the comments. Bio: Devendra Desale (@DevendraDesale) is a data science graduate student currently working on text mining and big data technologies. He is also interested in enterprise architectures and data-driven business. When away from the computer, he enjoys attending meetups and venturing into the unknown.
Mixtures-of-Experts (MoE) models and their maximum likelihood estimation (MLE) via the EM algorithm have been thoroughly studied in the statistics and machine learning literature. They are the subject of growing investigation in the context of modeling with high-dimensional predictors using regularized MLE. We examine MoE with a Gaussian gating network, for clustering and regression, and propose an $\ell_1$-regularized MLE to encourage sparse models and deal with the high-dimensional setting. We develop an EM-Lasso algorithm to perform parameter estimation and use a BIC-like criterion to select the model parameters, including the sparsity tuning hyperparameters. Experiments conducted on simulated data show the good performance of the proposed regularized MLE compared to the standard MLE with the EM algorithm.
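To make the $\ell_1$ idea concrete, here is a minimal sketch of a weighted Lasso solved by coordinate descent with soft-thresholding. It assumes, as is common in EM schemes with a Lasso penalty, that an M-step reduces to a weighted least-squares problem per expert, with the E-step responsibilities as weights. This is an illustration of the general technique, not the EM-Lasso algorithm described above; the names `soft_threshold` and `weighted_lasso_cd` are hypothetical.

```python
import numpy as np

def soft_threshold(z, lam):
    """Soft-thresholding operator: sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def weighted_lasso_cd(X, y, w, lam, n_iter=100):
    """Coordinate descent for
        0.5 * sum_i w_i * (y_i - x_i^T beta)^2 + lam * ||beta||_1.
    Assumption: w holds nonnegative per-sample weights, for example the
    responsibilities an E-step might assign to one expert.
    """
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual with feature j's contribution added back.
            r = y - X @ beta + X[:, j] * beta[j]
            rho = np.sum(w * X[:, j] * r)
            denom = np.sum(w * X[:, j] ** 2)
            # Closed-form one-dimensional minimizer; zero out dead columns.
            beta[j] = soft_threshold(rho, lam) / denom if denom > 0 else 0.0
    return beta
```

Coefficients whose weighted correlation with the residual stays below `lam` are set exactly to zero, which is how the penalty produces sparse models.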
I am building upon an existing product with a large user base. That base is divided into groups, each of which has a common goal. To achieve its goal, each group must process a large amount of information on a regular basis. I have found that it is possible to automate a large chunk of each group's work by using ML. The nature of each group's activities is such that there will always be special edge cases and exceptions that cannot be learned.
Predicting the efficacy of a drug for a given individual, using high-dimensional genomic measurements, is at the core of precision medicine. However, identifying features on which to base the predictions remains a challenge, especially when the sample size is small. Incorporating expert knowledge offers a promising way to improve a prediction model, but collecting such knowledge is laborious for the expert if the number of candidate features is very large. We introduce a probabilistic model that can incorporate expert feedback about the impact of genomic measurements on the sensitivity of a cancer cell to a given drug. We also present two methods to intelligently collect this feedback from the expert, using experimental design and multi-armed bandit models. In a multiple myeloma blood cancer data set (n=51), expert knowledge decreased the prediction error by 8%. Furthermore, the intelligent approaches can be used to reduce the workload of feedback collection to less than 30% on average compared to a naive approach.