 Diagnosis


Predictive Situation Awareness for Ebola Virus Disease using a Collective Intelligence Multi-Model Integration Platform: Bayes Cloud

arXiv.org Artificial Intelligence

Humanity faces a plethora of challenges associated with infectious diseases, which kill more than 6 million people a year. Although continuous efforts have been made to mitigate the damage from such events, many persistent challenges remain. One related issue we address here is the assessment and prediction of such epidemics. In this field of study, traditional and ad hoc models frequently fail to provide proper predictive situation awareness (PSAW), characterized by understanding the current situation and predicting future situations. Comprehensive PSAW for infectious disease can support decision making and help to hinder disease spread. In this paper, we develop a computing system platform focusing on collective intelligence causal modeling, in order to support PSAW in the domain of infectious disease. Analyses of global epidemics require the integration of multiple data sources and models, which can originate from multiple independent researchers. These models should be integrated to assess and predict the infectious disease accurately from a holistic view. The system provides three main functions: (1) collaborative causal modeling, (2) causal model integration, and (3) causal model reasoning. These functions are supported by subject-matter experts and artificial intelligence (AI), with uncertainty treatment. Subject-matter experts, as collective intelligence, develop causal models and integrate them into one joint causal model. The integrated causal model is used to reason about: (1) the past, regarding how the causal factors have occurred; (2) the present, regarding how the spread is going now; and (3) the future, regarding how it will proceed. Finally, we introduce one use case of predictive situation awareness for the Ebola virus disease.


Optimal Sparse Decision Trees

arXiv.org Machine Learning

Decision tree algorithms have been among the most popular algorithms for interpretable (transparent) machine learning since the early 1980s. The problem that has plagued decision tree algorithms since their inception is their lack of optimality, or lack of guarantees of closeness to optimality: decision tree algorithms are often greedy or myopic, and sometimes produce unquestionably suboptimal models. The hardness of decision tree optimization is both a theoretical and a practical obstacle, and even careful mathematical programming approaches have not been able to solve these problems efficiently. This work introduces the first practical algorithm for optimal decision trees for binary variables. The algorithm is a co-design of analytical bounds that reduce the search space and modern systems techniques, including data structures and a custom bit-vector library. We highlight possible steps toward improving the scalability and speed of future generations of this algorithm based on insights from our theory and experiments.


Formal Verification of Decision-Tree Ensemble Model and Detection of its Violating-input-value Ranges

arXiv.org Artificial Intelligence

As one type of machine-learning model, a "decision-tree ensemble model" (DTEM) is represented by a set of decision trees. A DTEM is known to work well mainly on structured data; however, like other machine-learning models, it is difficult to train so that it returns the correct output value for every input value. Accordingly, when a DTEM is used in a system that requires reliability, it is important to comprehensively detect, during development, the input values that lead to malfunctions of the system (failures) and to take appropriate measures. One conceivable solution is to install an input filter that controls the input to the DTEM, and to use separate software to process input values that may lead to failures. To develop the input filter, it is necessary to specify the filtering condition, that is, which input values lead to a malfunction of the system. Given that necessity, in this paper, we propose a method for formally verifying a DTEM and, if the verification finds an input value leading to a failure, extracting the range in which such input values exist. The proposed method can comprehensively extract the range in which input values leading to the failure exist; therefore, by creating an input filter based on that range, it is possible to prevent the failure from occurring in the system. In this paper, the algorithm of the proposed method is described, and the results of a case study using a dataset of house prices are presented. On the basis of those results, the feasibility of the proposed method is demonstrated, and its scalability is evaluated.
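
The idea of extracting violating input ranges can be illustrated on a single tree. The sketch below is a simplified assumption, not the paper's verification method: it handles one hand-written tree rather than an ensemble, and the tuple encoding, the name `violating_ranges`, and the `is_violation` predicate are all hypothetical. It walks every root-to-leaf path, tracks the interval each feature is confined to, and reports the intervals of leaves whose output violates the property.

```python
import math

# Hypothetical encoding (an assumption for this sketch):
#   ("leaf", value)
#   ("node", feature_index, threshold, left_subtree, right_subtree)
# The left subtree is taken when x[feature_index] <= threshold.

def violating_ranges(tree, n_features, is_violation):
    """Return (box, value) pairs for every leaf whose value violates the property.

    Each box is a list of (low, high] intervals, one per feature.
    """
    results = []

    def walk(node, box):
        if node[0] == "leaf":
            if is_violation(node[1]):
                results.append((list(box), node[1]))
            return
        _, f, t, left, right = node
        lo, hi = box[f]
        if lo < t:  # some x in (lo, hi] satisfies x <= t
            walk(left, box[:f] + [(lo, min(hi, t))] + box[f + 1:])
        if hi > t:  # some x in (lo, hi] satisfies x > t
            walk(right, box[:f] + [(max(lo, t), hi)] + box[f + 1:])

    walk(tree, [(-math.inf, math.inf)] * n_features)
    return results


# Toy tree over one feature; a negative predicted price counts as a failure.
toy_tree = ("node", 0, 5.0,
            ("leaf", 100.0),
            ("node", 0, 8.0, ("leaf", -20.0), ("leaf", 300.0)))
bad = violating_ranges(toy_tree, 1, lambda v: v < 0)
```

An input filter built from `bad` would reject or reroute any input whose first feature falls in the reported interval. For an ensemble the leaf outputs of several trees combine, so the ranges cannot be read off tree by tree in this way.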


Learning Optimal Decision Trees from Large Datasets

arXiv.org Machine Learning

Inferring a decision tree from a given dataset is one of the classic problems in machine learning. This problem consists of building, from a labelled dataset, a tree such that each node corresponds to a class and a path between the tree root and a leaf corresponds to a conjunction of features to be satisfied in this class. Following the principle of parsimony, we want to infer a minimal tree consistent with the dataset. Unfortunately, inferring an optimal decision tree is known to be NP-complete for several definitions of optimality. Hence, the majority of existing approaches rely on heuristics, and the few exact inference approaches do not work on large data sets. In this paper, we propose a novel approach for inferring a decision tree of minimum depth based on the incremental generation of Boolean formulas. The experimental results indicate that it scales sufficiently well and that its running time grows slowly with the size of the dataset.


Plant-wide fault and disturbance screening using combined transfer entropy and eigenvector centrality analysis

arXiv.org Artificial Intelligence

Finding the source of a disturbance or fault in complex systems such as industrial chemical processing plants can be a difficult task and consume a significant number of engineering hours. In many cases, a systematic elimination procedure is considered to be the only feasible approach but can cause undesired process upsets. Practitioners desire robust alternative approaches. This paper presents an unsupervised, data-driven method for ranking process elements according to the magnitude and novelty of their influence. Partial bivariate transfer entropy estimation is used to infer a weighted directed graph of process elements. Eigenvector centrality is applied to rank network nodes according to their overall effect. As the ranking of process elements relies on emergent properties that depend on the aggregate of many connections, the results are robust to errors in the estimation of individual edge properties and to the inclusion of indirect connections that do not represent the true causal structure of the process. A monitoring chart of continuously calculated process element importance scores over multiple overlapping time regions can assist with incipient fault detection. Ranking results combined with visual inspection of information transfer networks are also useful for root cause analysis of known faults and disturbances. A software implementation of the proposed method is available.
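
The ranking step can be sketched with plain power iteration. This is a minimal illustration under stated assumptions, not the authors' implementation: the weight matrix is made up rather than estimated by transfer entropy, `weights[i][j]` is assumed to be the strength of the edge from element `i` to element `j`, and a node is scored by what it influences (its outgoing edges). The function name and parameters are hypothetical.

```python
def eigenvector_centrality(weights, iterations=200, tol=1e-12):
    """Power iteration for x = W x / ||W x||_1 on a weighted directed graph.

    weights[i][j] is the edge weight from node i to node j, so a node
    scores highly when it strongly affects other high-scoring nodes.
    """
    n = len(weights)
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new = [sum(weights[i][j] * scores[j] for j in range(n)) for i in range(n)]
        total = sum(new)
        if total == 0:
            raise ValueError("graph has no edges that carry score")
        new = [v / total for v in new]  # normalise so the scores sum to 1
        if max(abs(a - b) for a, b in zip(new, scores)) < tol:
            return new
        scores = new
    return scores


# Made-up information-transfer weights between three process elements.
W = [
    [0.0, 0.8, 0.6],  # element 0 strongly drives elements 1 and 2
    [0.1, 0.0, 0.5],
    [0.1, 0.1, 0.0],
]
scores = eigenvector_centrality(W)
ranking = sorted(range(len(W)), key=lambda i: -scores[i])
```

Plain power iteration needs a strongly connected, aperiodic graph to converge to a unique ranking; on sparse or acyclic graphs a damped variant would be needed.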


Facebook announces AI system to detect revenge porn, says accounts posting it will be deleted

#artificialintelligence

Facebook first started trialling a system to combat revenge porn late last year, but it had one rather scary aspect: you had to upload your own nudes so the platform knew which images it should block. The earlier system, which is still in use and set for expanded rollout, relied on users uploading photos they were afraid might be shared, allowing Facebook to create a digital fingerprint to block uploads of matching images. You send the nude to yourself in Messenger, and Facebook creates a hashed digital fingerprint of the photo – an encrypted version of the raw data in the image file. Anytime someone tries to upload a photo, it is checked against that fingerprint and rejected if it matches. Facebook says its new AI-based system is designed to automatically detect nude or near-nude images, before passing them for a human moderator to decide whether the photo or video should be blocked.
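
The fingerprint check described above can be sketched as a hash blocklist. This is an illustrative assumption, not Facebook's system, whose hashing scheme the article does not specify: the class name `FingerprintBlocklist` is hypothetical, and SHA-256 from Python's standard `hashlib` is used purely as a stand-in. Strictly speaking, a hash is a one-way digest rather than encryption, which is why the original image cannot be recovered from the stored fingerprint.

```python
import hashlib


class FingerprintBlocklist:
    """Stores digests of reported images and rejects byte-identical uploads."""

    def __init__(self):
        self._fingerprints = set()

    @staticmethod
    def fingerprint(image_bytes):
        # One-way digest of the raw file contents; the image itself is not kept.
        return hashlib.sha256(image_bytes).hexdigest()

    def register(self, image_bytes):
        fp = self.fingerprint(image_bytes)
        self._fingerprints.add(fp)
        return fp

    def is_blocked(self, image_bytes):
        return self.fingerprint(image_bytes) in self._fingerprints


blocklist = FingerprintBlocklist()
blocklist.register(b"raw bytes of a reported photo")
blocked = blocklist.is_blocked(b"raw bytes of a reported photo")
allowed = not blocklist.is_blocked(b"raw bytes of an unrelated photo")
```

A cryptographic hash only matches byte-identical files, so a resized or re-encoded copy would slip through; catching those requires a perceptual hash, which this sketch does not attempt.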


Rectified Decision Trees: Towards Interpretability, Compression and Empirical Soundness

arXiv.org Machine Learning

How to obtain a model with both good interpretability and good performance has always been an important research topic. In this paper, we propose rectified decision trees (ReDT), a knowledge-distillation-based rectification of decision trees with high interpretability, small model size, and empirical soundness. Specifically, we extend the impurity calculation and the pure ending condition of the classical decision tree to propose a decision tree extension that allows the use of soft labels, generated by a well-trained teacher model, in the training and prediction process. It is worth noting that, for the acquisition of soft labels, we propose a new method based on multiple cross-validation to reduce the effects of randomness and overfitting. These approaches ensure that ReDT retains excellent interpretability and, in terms of compression, even has fewer nodes than the classical decision tree while achieving relatively good performance. Besides, in contrast to traditional knowledge distillation, back propagation of the student model is not necessarily required in ReDT, making it an attempt at a new knowledge distillation approach. Extensive experiments are conducted, which demonstrate the superiority of ReDT in interpretability, compression, and empirical soundness.
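
One way to picture an impurity measure over soft labels is to average the teacher's class-probability vectors in a node and apply the usual Gini formula to that average. The sketch below is an assumption made for illustration and is not taken from the paper; the function name `soft_gini` is hypothetical and the abstract does not give ReDT's exact formula.

```python
def soft_gini(soft_labels):
    """Gini impurity of a node whose samples carry soft labels.

    soft_labels is a list of per-sample probability vectors (one entry per
    class). The node's class distribution is taken to be their mean.
    """
    n = len(soft_labels)
    k = len(soft_labels[0])
    mean = [sum(row[c] for row in soft_labels) / n for c in range(k)]
    return 1.0 - sum(p * p for p in mean)


# Hard labels are the special case of one-hot vectors.
mixed = soft_gini([[1.0, 0.0], [0.0, 1.0]])
pure = soft_gini([[1.0, 0.0], [1.0, 0.0]])
soft = soft_gini([[0.9, 0.1], [0.7, 0.3]])
```

With one-hot vectors this reduces to the classical Gini impurity, so a tree trained on hard labels is recovered as a special case.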


Applying Active Diagnosis to Space Systems by On-Board Control Procedures

arXiv.org Artificial Intelligence

The instrumentation of real systems is often designed for control purposes, and control inputs are designed to achieve nominal control objectives. Hence, the available measurements may not be sufficient to isolate faults with certainty, and diagnoses are ambiguous. Active diagnosis formulates a planning problem to generate a sequence of actions that, applied to the system, enforce diagnosability and allow ambiguous diagnoses to be refined iteratively. This paper analyses the requirements for applying active diagnosis to space systems and proposes ActHyDiag as an effective framework to solve this problem. It presents the results of applying ActHyDiag to a real space case study and of implementing the generated plans in the form of On-Board Control Procedures. The case study is a redundant Spacewire Network where up to 6 instruments, monitored and controlled by the on-board software hosted in the Satellite Management Unit, are transferring science data to a mass memory unit through Spacewire routers. Experiments have been conducted on a real physical benchmark developed by Thales Alenia Space and demonstrate the effectiveness of the plans proposed by ActHyDiag.


Wasserstein Distance based Deep Adversarial Transfer Learning for Intelligent Fault Diagnosis

arXiv.org Machine Learning

The demand for adopting artificial intelligence in condition-based maintenance strategies has increased markedly over the past few years. Intelligent fault diagnosis is one critical topic in maintenance solutions for mechanical systems. Deep learning models, such as convolutional neural networks (CNNs), have been successfully applied to fault diagnosis tasks for mechanical systems and have achieved promising results. However, under the diverse working conditions found in industry, deep learning suffers from two difficulties: first, the well-defined (source domain) and new (target domain) datasets have different feature distributions; second, insufficient or no labelled data in the target domain significantly reduces the accuracy of fault diagnosis. As a novel idea, deep transfer learning (DTL) was created to perform learning in the target domain by leveraging information from the relevant source domain. Inspired by the Wasserstein distance of optimal transport, in this paper, we propose a novel DTL approach to intelligent fault diagnosis, namely Wasserstein Distance based Deep Transfer Learning (WD-DTL), to learn domain feature representations (generated by a CNN-based feature extractor) and to minimize the distribution discrepancy between the source and target domains through adversarial training. The effectiveness of the proposed WD-DTL is verified through 3 transfer scenarios and 16 transfer fault diagnosis experiments covering both unsupervised and supervised (with insufficient labelled data) learning. We also provide a comprehensive analysis of the network visualization of those transfer tasks.
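
For intuition about the quantity being minimized, the Wasserstein-1 distance between two one-dimensional samples of equal size has a closed form: sort both samples and average the absolute differences. The sketch below shows only this special case under that equal-size assumption; it is not the adversarial estimator over CNN features that the abstract describes, and the function name is hypothetical.

```python
def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size 1-D empirical samples.

    In one dimension the optimal transport plan pairs the i-th smallest
    value of xs with the i-th smallest value of ys.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


source_features = [0.0, 1.0, 2.0]
target_features = [3.0, 1.0, 2.0]  # same shape as the source, shifted by 1
shift = wasserstein_1d(source_features, target_features)
```

In high-dimensional feature spaces no such closed form exists, which is why approaches of this kind typically estimate the distance with a trained critic network instead.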


On the usage of the probability integral transform to reduce the complexity of multi-way fuzzy decision trees in Big Data classification problems

arXiv.org Machine Learning

We present a new distributed fuzzy partitioning method to reduce the complexity of multi-way fuzzy decision trees in Big Data classification problems. The proposed algorithm builds a fixed number of fuzzy sets for all variables and adjusts their shape and position to the real distribution of the training data. A two-step process is applied: 1) transformation of the original distribution into a standard uniform distribution by means of the probability integral transform (since the original distribution is generally unknown, the cumulative distribution function is approximated by computing the q-quantiles of the training set); 2) construction of a Ruspini strong fuzzy partition in the transformed attribute space using a fixed number of equally distributed triangular membership functions. Despite the aforementioned transformation, the definition of every fuzzy set in the original space can be recovered by applying the inverse cumulative distribution function (also known as the quantile function). The experimental results reveal that the proposed methodology allows the state-of-the-art multi-way fuzzy decision tree (FMDT) induction algorithm to maintain classification accuracy with up to 6 million fewer leaves.
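
The two-step process can be sketched for a single attribute on a single machine. This is a minimal illustration under stated assumptions, not the distributed algorithm of the paper: the CDF is approximated by linear interpolation between q-quantile cut points, the partition uses `k >= 2` triangular sets centred at `i / (k - 1)`, and all function names are hypothetical.

```python
from bisect import bisect_right


def quantile_points(values, q):
    """Return the q + 1 cut points (0/q, 1/q, ..., q/q quantiles) of a sample."""
    s = sorted(values)
    n = len(s)
    points = []
    for j in range(q + 1):
        pos = j * (n - 1) / q  # fractional index into the sorted sample
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        points.append(s[lo] + (pos - lo) * (s[hi] - s[lo]))
    return points


def approx_cdf(x, points):
    """Step 1: piecewise-linear CDF through the cut points (maps x to [0, 1])."""
    q = len(points) - 1
    if x <= points[0]:
        return 0.0
    if x >= points[-1]:
        return 1.0
    j = bisect_right(points, x) - 1  # points[j] <= x < points[j + 1]
    return (j + (x - points[j]) / (points[j + 1] - points[j])) / q


def approx_quantile(u, points):
    """Inverse of approx_cdf: maps u in [0, 1] back to the original space."""
    q = len(points) - 1
    pos = min(max(u, 0.0), 1.0) * q
    j = min(int(pos), q - 1)
    return points[j] + (pos - j) * (points[j + 1] - points[j])


def memberships(u, k):
    """Step 2: degrees of u in k equally spaced triangular sets on [0, 1]."""
    return [max(0.0, 1.0 - abs(u - i / (k - 1)) * (k - 1)) for i in range(k)]


skewed = [1, 2, 4, 8, 16]
cuts = quantile_points(skewed, q=4)
u = approx_cdf(3, cuts)            # position of x = 3 in the uniform space
degrees = memberships(u, k=3)      # memberships sum to 1 (strong partition)
core = approx_quantile(0.5, cuts)  # centre of the middle set, original units
```

Because the sets are equally spaced in the transformed space, mapping their centres back through `approx_quantile` places them more densely where the training data is dense.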