Diagnosis
Fairness-Aware Process Mining
Qafari, Mahnaz Sadat, van der Aalst, Wil
Process mining is a multi-purpose tool enabling organizations to improve their processes. One of the primary purposes of process mining is finding the root causes of performance or compliance problems in processes. The usual way of doing so is by gathering data from the process event log and other sources and then applying data mining and machine learning techniques. However, the results of applying such techniques are not always acceptable. In many situations, this approach is prone to making obvious or unfair diagnoses, and applying them may result in conclusions that are unsurprising or even discriminatory (e.g., blaming overloaded employees for delays). In this paper, we present a solution to this problem by creating a fair classifier for such situations. The undesired effects are removed at the expense of a reduction in the accuracy of the resulting classifier. We have implemented this method as a plug-in in ProM. Using the implemented plug-in on two real event logs, we decreased the discrimination caused by the classifier, while losing a small fraction of its accuracy.
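To make the notion of "discrimination caused by the classifier" concrete, the following minimal sketch measures it as the difference in positive-prediction rates between two groups. This particular measure, the function names, and the toy data are assumptions for illustration; the abstract does not state which measure the ProM plug-in uses.

```python
from typing import Hashable, Sequence


def positive_rate(predictions: Sequence[int], groups: Sequence[Hashable],
                  group: Hashable) -> float:
    """Fraction of cases in `group` that received a positive (1) prediction."""
    selected = [p for p, g in zip(predictions, groups) if g == group]
    if not selected:
        raise ValueError(f"no cases found for group {group!r}")
    return sum(selected) / len(selected)


def discrimination(predictions: Sequence[int], groups: Sequence[Hashable],
                   protected: Hashable, favored: Hashable) -> float:
    """Assumed measure: positive rate of the favored group minus that of the
    protected group. A value of 0 means both groups are treated alike."""
    return (positive_rate(predictions, groups, favored)
            - positive_rate(predictions, groups, protected))


# Hypothetical toy data: 1 = favorable outcome predicted, 0 = unfavorable.
PREDICTIONS = [1, 1, 0, 1, 0, 0, 0, 1]
GROUPS = ["x", "x", "x", "x", "y", "y", "y", "y"]
```

A fair classifier in this sense would push the value returned by `discrimination` toward zero, typically at some cost in accuracy, which matches the trade-off the abstract reports.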
Towards automated symptoms assessment in mental health
Activity and motion analysis has the potential to be used as a diagnostic tool for mental disorders. However, to date, little work has been performed in turning stratification measures of activity into useful symptom markers. The research presented in this thesis has focused on the identification of objective activity and behaviour metrics that could be useful for the analysis of mental health symptoms in the above-mentioned dimensions. Particular attention is given to the analysis of objective differences between disorders, as well as identification of clinical episodes of mania and depression in bipolar patients, and deterioration in borderline personality disorder patients. A principled framework is proposed for mHealth monitoring of psychiatric patients, based on measurable changes in behaviour, represented in physical activity time series, collected via mobile and wearable devices. The framework defines methods for direct computational analysis of symptoms in disorganisation and psychomotor dimensions, as well as measures for indirect assessment of mood, using patterns of physical activity, sleep and circadian rhythms. The approach of computational behaviour analysis, proposed in this thesis, has the potential for early identification of clinical deterioration in ambulatory patients, and allows for the specification of distinct and measurable behavioural phenotypes, thus enabling better understanding and treatment of mental disorders.
Similarity-based Android Malware Detection Using Hamming Distance of Static Binary Features
Taheri, Rahim, Ghahramani, Meysam, Javidan, Reza, Shojafar, Mohammad, Pooranian, Zahra, Conti, Mauro
In this paper, we develop four malware detection methods that use Hamming distance to find similarity between samples: first nearest neighbors (FNN), all nearest neighbors (ANN), weighted all nearest neighbors (WANN), and k-medoid based nearest neighbors (KMNN). In our proposed methods, we can trigger an alarm if we detect that an Android app is malicious. Hence, our solutions help us to avoid the spread of detected malware on a broader scale. We provide a detailed description of the proposed detection methods and related algorithms. We include an extensive analysis to assess the suitability of our proposed similarity-based detection methods. We perform our experiments on three datasets containing benign and malware Android apps: Drebin, Contagio, and Genome. To corroborate the actual effectiveness of our classifier, we carry out performance comparisons with some state-of-the-art classification and malware detection algorithms, namely Mixed and Separated solutions, the program dissimilarity measure based on entropy (PDME) and the FalDroid algorithms. We run our experiments on different types of features: API, intent, and permission features on these three datasets. The results confirm that accuracy rates of the proposed algorithms are more than 90% and in some cases (i.e., considering API features) are more than 99%, and are comparable with existing state-of-the-art solutions.
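As an illustration of the similarity idea, here is a minimal sketch of nearest-neighbor classification with Hamming distance over static binary feature vectors. It is a plain first-nearest-neighbor rule under assumed conventions (one bit per API, intent, or permission feature; ties resolved by training order), not the authors' FNN implementation, and the toy vectors are hypothetical.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[int], str]  # (binary feature vector, label)


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length binary vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def first_nearest_neighbor(sample: Sequence[int], training: List[Sample]) -> str:
    """Return the label of the training vector closest to `sample`.
    On ties, min() keeps the earliest training entry."""
    _, label = min(training, key=lambda item: hamming(sample, item[0]))
    return label


# Hypothetical toy training set: each bit marks whether a feature is present.
TRAINING: List[Sample] = [
    ((1, 1, 0, 1, 0), "malware"),
    ((1, 0, 0, 1, 1), "malware"),
    ((0, 0, 1, 0, 0), "benign"),
    ((0, 1, 1, 0, 0), "benign"),
]
```

Calling `first_nearest_neighbor((1, 1, 0, 1, 1), TRAINING)` labels the app by its closest known sample; an alarm would be raised when that label is "malware".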
Learning Linear Non-Gaussian Causal Models in the Presence of Latent Variables
Salehkaleybar, Saber, Ghassami, AmirEmad, Kiyavash, Negar, Zhang, Kun
We consider the problem of learning causal models from observational data generated by linear non-Gaussian acyclic causal models with latent variables. Without considering the effect of latent variables, one usually infers wrong causal relationships among the observed variables. Under the faithfulness assumption, we propose a method to check whether there exists a causal path between any two observed variables. From this information, we can obtain the causal order among them. The next question is then whether or not the causal effects can be uniquely identified as well. It can be shown that causal effects among observed variables cannot be identified uniquely even under the assumptions of faithfulness and non-Gaussianity of exogenous noises. However, we propose an efficient method to identify the set of all possible causal effects that are compatible with the observational data. Furthermore, we present some structural conditions on the causal graph under which we can learn causal effects among observed variables uniquely. We also provide necessary and sufficient graphical conditions for unique identification of the number of variables in the system. Experiments on synthetic data and real-world data show the effectiveness of our proposed algorithm in learning causal models.
Towards Optimizing Reiter's HS-Tree for Sequential Diagnosis
Reiter's HS-Tree is one of the most popular diagnostic search algorithms due to its desirable properties and general applicability. In sequential diagnosis, where the addressed diagnosis problem is subject to successive change through the acquisition of additional knowledge about the diagnosed system, HS-Tree is used in a stateless fashion. That is, the existing search tree is discarded when new knowledge is obtained, albeit often large parts of the tree are still relevant and have to be rebuilt in the next iteration, involving redundant operations and costly reasoner calls. As a remedy to this, we propose DynamicHS, a variant of HS-Tree that avoids these redundancy issues by maintaining state throughout sequential diagnosis while preserving all desirable properties of HS-Tree. Preliminary results of ongoing evaluations in a problem domain where HS-Tree is the state-of-the-art diagnostic method suggest significant time savings achieved by DynamicHS by reducing expensive reasoner calls.
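For readers unfamiliar with the underlying search, the following is a simplified sketch of a breadth-first hitting-set tree in the spirit of Reiter's HS-Tree. It assumes that all conflict sets are known up front, whereas the real algorithm obtains them on demand through reasoner calls. It is the stateless baseline, rebuilt from scratch on every call, and not DynamicHS; the function name is hypothetical.

```python
from collections import deque
from typing import FrozenSet, Iterable, List, Set


def minimal_hitting_sets(conflicts: Iterable[Iterable[str]]) -> List[FrozenSet[str]]:
    """Breadth-first search for all subset-minimal hitting sets (diagnoses)
    of the given conflict sets."""
    conflict_sets = [frozenset(c) for c in conflicts]
    diagnoses: List[FrozenSet[str]] = []
    seen: Set[FrozenSet[str]] = set()
    queue = deque([frozenset()])  # each node is the set of edge labels on its path
    while queue:
        path = queue.popleft()
        # Prune duplicate paths and supersets of already found diagnoses.
        if path in seen or any(d <= path for d in diagnoses):
            continue
        seen.add(path)
        # Label the node with some conflict that the path does not hit yet.
        unhit = next((c for c in conflict_sets if not (c & path)), None)
        if unhit is None:
            diagnoses.append(path)  # every conflict is hit: a minimal diagnosis
            continue
        for component in sorted(unhit):
            queue.append(path | {component})
    return diagnoses
```

For the conflicts `{a, b}` and `{b, c}`, the search returns the diagnoses `{b}` and `{a, c}`. In sequential diagnosis, new knowledge changes the conflicts, and a stateless approach like this one reruns the whole search, which is the redundancy the abstract describes.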
Fully Unsupervised Feature Alignment for Critical System Health Monitoring with Varied Operating Conditions
The failure of a complex and safety-critical industrial asset can have extremely high consequences. Close monitoring for early detection of abnormal system conditions is therefore required. Data-driven solutions to this problem have been limited for two reasons: First, safety-critical assets are designed and maintained to be highly reliable and faults are rare. Fault detection thus cannot be supervised. Second, complex industrial systems usually have long lifetimes and face very different operating conditions. Collecting a representative training dataset would require long observation periods, and delay the monitoring of the system. In this paper, we propose a methodology to monitor the systems in their early life. To do so, we enhance the training dataset with other units from a fleet, for which longer observations are available. Since each unit has its own specificity, we propose to extract features made independent of their origin by three unsupervised feature alignment techniques. First, using a variational encoder, we impose a shared latent space for both units. Second, we introduce a new loss designed to conserve inter-point spatial relationships between the input and the latent spaces. Last, we propose to train in an adversarial manner a discriminator on the origin of the features. Once aligned, the features are fed to a one-class classifier to monitor the health of the system. By exploring the different combinations of the proposed alignment strategies, and by testing them on a real case study, a fleet composed of 112 power plants operated in different geographical locations and under very different operating regimes, we demonstrate that this alignment is necessary and beneficial.
Online Local Boosting: improving performance in online decision trees
da Costa, Victor G. Turrisi, Mastelini, Saulo Martiello, de Carvalho, André C. Ponce de Leon Ferreira, Barbon, Sylvio Jr
As more data are produced each day, and faster, data stream mining is growing in importance, making clear the need for algorithms able to process these data quickly. Data stream mining algorithms are meant to extract knowledge online and are specially tailored to continuous data problems. Many of the current algorithms for data stream mining have high processing and memory costs. Often, the higher the predictive performance, the higher these costs. To increase predictive performance without greatly increasing memory and time costs, this paper introduces a novel algorithm, named Online Local Boosting (OLBoost), which can be combined with online decision tree algorithms to improve their predictive performance without modifying the structure of the induced decision trees. To do so, OLBoost applies boosting to small separate regions of the instance space. Experimental results presented in this paper show that by using OLBoost the online learning decision tree algorithms can significantly improve their predictive performance. Additionally, it can make smaller trees perform as well as or better than larger trees.
Unsupervised Fault Detection in Varying Operating Conditions
Training data-driven approaches for complex industrial system health monitoring is challenging. When data on faulty conditions are rare or not available, the training has to be performed in an unsupervised manner. In addition, when the observation period used for training is kept short, to be able to monitor the system in its early life, the training data might not be representative of all the system's normal operating conditions. In this paper, we propose five approaches to perform fault detection in such a context. Two approaches rely only on the data from the unit to be monitored: the baseline is trained on the early life of the unit, and an incremental learning procedure tries to learn new operating conditions as they arise. Three other approaches take advantage of data from other similar units within a fleet. In two cases, units are directly compared to each other with similarity measures, and the data from similar units are combined in the training set. In the third case, we propose a new deep-learning methodology that first performs a feature alignment of different units with an Unsupervised Feature Alignment Network (UFAN). Then, features of both units are combined in the training set of the fault detection neural network. The approaches are tested on a fleet comprising 112 units, observed over one year of data. All approaches proposed here improve on the baseline, which is trained with two months of data only. As units in the fleet are found to be very dissimilar, the new UFAN architecture, which aligns units in the feature space, outperforms the others.
A prospective multicentre study testing the diagnostic accuracy of an automated cough sound centred analytic system for the identification of common respiratory disorders in children
In paediatrics, respiratory disorders represent the second most common reason for attendance at Emergency Departments (ED) [1, 2] and are a significant global disease burden [3]. Common conditions in childhood include croup, upper respiratory tract infections (URTI), and lower respiratory tract diseases (LRTDs) such as asthma/reactive airway disease (RAD), bronchiolitis, pneumonitis and pneumonia [2, 4]. Lower respiratory tract infections are a significant cause of mortality in children aged under 5 years and a leading cause of disability-adjusted life years lost worldwide [5–7]. Asthma represents the leading cause of non-fatal disease burden in Australian children under age 14 years [8, 9]. The differential diagnosis of respiratory disorders can be challenging even for experienced clinicians with access to diagnostic support services.
The Impact of Feature Causality on Normal Behaviour Models for SCADA-based Wind Turbine Fault Detection
Felgueira, Telmo, Rodrigues, Silvio, Perone, Christian S., Castro, Rui
The cost of wind energy can be reduced by using SCADA data to detect faults in wind turbine components. Normal behavior models are one of the main fault detection approaches, but there is a lack of consensus on how different input features affect the results. In this work, a new taxonomy based on the causal relations between the input features and the target is presented. Based on this taxonomy, the impact of different input feature configurations on the modelling and fault detection performance is evaluated. To this end, a framework that formulates the detection of faults as a classification problem is also presented.
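The general idea of a normal behavior model can be sketched as follows: fit a model on data from healthy operation, then flag observations whose residual is unusually large. The single-feature linear model, the fixed residual threshold, and the variable names are assumptions for illustration only; the abstract does not specify the models or the classification framework used in the paper.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept on healthy data."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("input feature must vary to fit a line")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def flag_anomalies(model: Tuple[float, float], xs: Sequence[float],
                   ys: Sequence[float], threshold: float) -> List[bool]:
    """True where the observed target deviates from the model's prediction
    by more than `threshold` (an assumed, hand-picked value)."""
    slope, intercept = model
    return [abs(y - (slope * x + intercept)) > threshold for x, y in zip(xs, ys)]


# Hypothetical healthy data: input feature (e.g. power) vs. target (e.g. temperature).
HEALTHY_X = [0, 1, 2, 3]
HEALTHY_Y = [20, 22, 24, 26]
MODEL = fit_line(HEALTHY_X, HEALTHY_Y)
```

Which signals are allowed as inputs to such a model is exactly the question the taxonomy addresses: an input that is itself an effect of the fault may let the model reproduce the faulty target and hide the residual.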