Diagnosis
Synthetic COVID-19 Chest X-ray Dataset for Computer-Aided Diagnosis
We introduce a new dataset, the Synthetic COVID-19 Chest X-ray Dataset, for training machine learning models. The dataset consists of 21,295 synthetic COVID-19 chest X-ray images to be used for computer-aided diagnosis. These images, generated via an unsupervised domain adaptation approach, are of high quality. We find that, when used as additional training data under heavy class imbalance, the synthetic images improve the performance of various deep learning architectures, and that the resulting models detect the target class with high confidence. We also find that comparable performance can be achieved when models are trained only on synthetic images.
Labelling Drifts in a Fault Detection System for Wind Turbine Maintenance
Martinez, Iñigo, Viles, Elisabeth, Cabrejas, Iñaki
A failure detection system is the first step towards predictive maintenance strategies. A popular data-driven method for detecting incipient failures and anomalies is to train normal behaviour models with a machine learning technique such as feed-forward neural networks (FFNN) or extreme learning machines (ELM). However, the performance of any of these modelling techniques can deteriorate when non-stationarities arise unexpectedly in the dynamic environment in which industrial assets operate. This unpredictable statistical change in the measured variable is known as concept drift. In this article a wind turbine maintenance case is presented, where non-stationarities of various kinds can happen unexpectedly. The aim is to detect such concept drift events by means of statistical detectors and window-based approaches. However, in real complex systems, concept drifts are not as clear and evident as in artificially generated datasets. In order to evaluate the effectiveness of current drift detectors and also to design an appropriate novel technique for this specific industrial application, it is essential to have a prior characterization of the existing drifts. Given the lack of information in this regard, a methodology for labelling concept drift events in the lifetime of wind turbines is proposed. This methodology will facilitate the creation of a drift database that will serve both as a training ground for concept drift detectors and as a valuable source of information to enhance knowledge about the maintenance of complex systems.
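As a rough illustration of what a window-based drift check can look like, the following minimal sketch compares the mean of a sliding window against a fixed reference window. It is not the methodology proposed in the article. The function name `detect_drift`, the use of model residuals as input, and the `window` and `threshold` parameters are all assumptions made for illustration.

```python
from statistics import mean, stdev


def detect_drift(residuals, window=20, threshold=3.0):
    """Return the index where a mean shift is first flagged, or None.

    Hypothetical sketch: the first `window` samples form the reference.
    A drift is flagged when the mean of the latest `window` samples
    differs from the reference mean by more than `threshold` standard
    errors of the reference window.
    """
    reference = residuals[:window]
    ref_mean = mean(reference)
    std_err = stdev(reference) / window ** 0.5
    for end in range(2 * window, len(residuals) + 1):
        recent = residuals[end - window:end]
        if abs(mean(recent) - ref_mean) > threshold * std_err:
            return end - 1  # index of the last sample in the flagged window
    return None


# Made-up residual series: stable for 60 samples, then shifted upwards by 1.0.
stable = [0.1 * ((i % 5) - 2) for i in range(60)]
shifted = stable + [1.0 + 0.1 * ((i % 5) - 2) for i in range(40)]
print(detect_drift(stable), detect_drift(shifted))
```

A detector of this kind only reacts to shifts in the mean. Gradual or variance-only drifts would need a different statistic, which is part of why labelled drift events are useful for evaluating detectors.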
AMA-GCN: Adaptive Multi-layer Aggregation Graph Convolutional Network for Disease Prediction
Chen, Hao, Zhuang, Fuzhen, Xiao, Li, Ma, Ling, Liu, Haiyan, Zhang, Ruifang, Jiang, Huiqin, He, Qing
Recently, Graph Convolutional Networks (GCNs) have proven to be a powerful means for Computer Aided Diagnosis (CADx). This approach requires building a population graph to aggregate structural information, where the graph adjacency matrix represents the relationship between nodes. Until now, this adjacency matrix has usually been defined manually based on phenotypic information. In this paper, we propose an encoder that automatically selects the appropriate phenotypic measures according to their spatial distribution, and uses a text similarity awareness mechanism to calculate the edge weights between nodes. The encoder can automatically construct the population graph using phenotypic measures that have a positive impact on the final results, and further realizes the fusion of multimodal information. In addition, a novel graph convolution network architecture using a multi-layer aggregation mechanism is proposed. The structure can obtain deep structure information while suppressing over-smoothing, and increases the similarity between nodes of the same type. Experimental results on two databases show that our method can significantly improve the diagnostic accuracy for autism spectrum disorder and breast cancer, indicating its universality in leveraging multimodal data for disease prediction.
Next-Gen Machine Learning Supported Diagnostic Systems for Spacecraft
Vlontzos, Athanasios, Sutherland, Gabriel, Ganju, Siddha, Soboczenski, Frank
Future short or long-term space missions require a new generation of monitoring and diagnostic systems due to communication impasses as well as limitations in specialized crew and equipment. Machine learning supported diagnostic systems present a viable solution for medical and technical applications. We discuss challenges and applicability of such systems in light of upcoming missions and outline an example use case for a next-generation medical diagnostic system for future space operations. Additionally, we present approach recommendations and constraints for the successful generation and use of machine learning models aboard a spacecraft.
Handling Continuous Attributes in Decision Trees
The computation of splitting measures assumes finite (that is, discrete) attribute values. This raises the question: how are continuous-valued attributes handled in decision trees? The test condition for a continuous-valued attribute can be expressed using a comparison operator (for example, ≤ or >). Alternatively, the continuous-valued attribute can be split into a finite set of range buckets. Note that a comparison-based test condition gives a binary split, whereas range buckets give a multiway split.
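The two options can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than a reference implementation: entropy is assumed as the splitting measure, candidate thresholds are taken as midpoints between consecutive distinct values, and the names `best_binary_split` and `bucketize` are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def best_binary_split(values, labels):
    """Return (threshold, weighted_entropy) for the best test `value <= threshold`.

    Candidate thresholds are midpoints between consecutive distinct values.
    """
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate equal values
        threshold = (lo + hi) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best


def bucketize(value, edges):
    """Map a value to a bucket index given sorted edges (multiway split)."""
    return sum(value > edge for edge in edges)


# Made-up example data: an age attribute and a binary class label.
ages = [22, 25, 30, 35, 40, 52]
bought = ["no", "no", "no", "yes", "yes", "yes"]
threshold, score = best_binary_split(ages, bought)
print(threshold)                                   # binary split: age <= 32.5
print([bucketize(a, [25, 40]) for a in ages])      # three-way split by range
```

The binary test yields two children (`age <= threshold` and `age > threshold`), while the bucket edges `[25, 40]` yield three children, one per range.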
ModelArts 3.0: a True AI Accelerator
HUAWEI CLOUD's Enterprise Intelligence (EI) has achieved strong results in numerous industry competitions and evaluations. HUAWEI CLOUD has invested heavily in basic AI research in three domains: computer vision, speech and semantics, and decision optimization. To help AI empower all industries, the ModelArts enabling platform supports plug-and-play deployment of HUAWEI CLOUD's research results in areas such as automatic machine learning, small sample learning, federated learning, and pre-training models. In the area of perception, HUAWEI CLOUD continues to be an industry leader in ImageNet large-scale image classification, WebVision large-scale network image classification, MS-COCO two-dimensional object detection, nuScenes three-dimensional object detection, and visual pre-training model verification, including downstream classification, detection, and segmentation. Perception models driven by ModelArts have been widely used in sectors such as medical image analysis, oil and gas exploration, and fault detection in manufacturing. In cognition, HUAWEI CLOUD integrates industry data based on its expertise in semantic analysis and knowledge graphs.
On Efficiently Explaining Graph-Based Classifiers
Huang, Xuanxiang, Izza, Yacine, Ignatiev, Alexey, Marques-Silva, Joao
Recent work has not only shown that decision trees (DTs) may not be interpretable, but has also proposed a polynomial-time algorithm for computing one PI-explanation of a DT. This paper shows that for a wide range of classifiers, globally referred to as decision graphs, which include decision trees and binary decision diagrams as well as their multi-valued variants, there exist polynomial-time algorithms for computing one PI-explanation. In addition, the paper proposes a polynomial-time algorithm for computing one contrastive explanation. These novel algorithms build on explanation graphs (XpG's). XpG's denote a graph representation that enables both theoretically and practically efficient computation of explanations for decision graphs. Furthermore, the paper proposes a practically efficient solution for the enumeration of explanations, and studies the complexity of deciding whether a given feature is included in some explanation. For the concrete case of decision trees, the paper shows that the set of all contrastive explanations can be enumerated in polynomial time. Finally, the experimental results validate the practical applicability of the algorithms proposed in the paper on a wide range of publicly available benchmarks.
Sample Selection Bias in Evaluation of Prediction Performance of Causal Models
Causal models are notoriously difficult to validate because they make untestable assumptions regarding confounding. New scientific experiments offer the possibility of evaluating causal models using prediction performance. Prediction performance measures are typically robust to violations in causal assumptions. However, prediction performance does depend on the selection of training and test sets. In particular, biased training sets can lead to optimistic assessments of model performance. In this work, we revisit the prediction performance of several recently proposed causal models tested on a genetic perturbation data set of Kemmeren [Kemmeren et al., 2014]. We find that sample selection bias is likely a key driver of model performance. We propose using a less-biased evaluation set for assessing prediction performance on Kemmeren and compare models on this new set. In this setting, the causal models tested have similar performance to standard association-based estimators such as Lasso. Finally, we compare the performance of causal estimators in simulation studies which reproduce the Kemmeren structure of genetic knockout experiments but without any sample selection bias. These results provide an improved understanding of the performance of several causal models and offer guidance on how future studies should use Kemmeren.
Machine Learning for Accurate Intraoperative Pediatric Middle Ear Effusion Diagnosis
OBJECTIVES: Misdiagnosis of acute and chronic otitis media in children can result in significant consequences from either undertreatment or overtreatment. Our objective was to develop and train an artificial intelligence algorithm to accurately predict the presence of middle ear effusion in pediatric patients presenting to the operating room for myringotomy and tube placement. METHODS: We trained a neural network to classify images as “normal” (no effusion) or “abnormal” (effusion present) using tympanic membrane images from children taken to the operating room with the intent of performing myringotomy and possible tube placement for recurrent acute otitis media or otitis media with effusion. Model performance was tested using held-out cases and fivefold cross-validation. RESULTS: The mean training time for the neural network model was 76.0 (SD ± 0.01) seconds. Our modeling approach achieved a mean image classification accuracy of 83.8% (95% confidence interval [CI]: 82.7–84.8). In support of this classification accuracy, the model produced an area under the receiver operating characteristic curve of 0.93 (95% CI: 0.91–0.94) and an F1-score of 0.80 (95% CI: 0.77–0.82). CONCLUSIONS: Artificial intelligence–assisted diagnosis of acute or chronic otitis media in children may generate value for patients, families, and the health care system by improving point-of-care diagnostic accuracy. With a small training data set composed of intraoperative images obtained at the time of tympanostomy tube insertion, our neural network was accurate in predicting the presence of a middle ear effusion in pediatric ear cases. This diagnostic accuracy is considerably higher than the human-expert otoscopy-based diagnostic performance reported in previous studies.
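For readers unfamiliar with fivefold cross-validation, the splitting step can be sketched as follows. This is a generic illustration, not the authors' pipeline: the function name `k_fold_indices` is hypothetical, and the sketch assumes the samples have already been shuffled, since it uses contiguous folds.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) for k contiguous, near-equal folds.

    Hypothetical helper: each sample lands in exactly one test fold, and
    the remaining samples form the training set for that fold.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# With 12 samples and k=5, the folds have sizes 3, 3, 2, 2 and 2.
for train, test in k_fold_indices(12, k=5):
    print(len(train), test)
```

A model is trained and scored once per fold, and the per-fold scores are then averaged, which is one common way to arrive at a mean accuracy like the one reported above.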
Ensemble machine learning approach for screening of coronary heart disease based on echocardiography and risk factors
Zhang, Jingyi, Zhu, Huolan, Chen, Yongkai, Yang, Chenguang, Cheng, Huimin, Li, Yi, Zhong, Wenxuan, Wang, Fang
Background: Extensive clinical evidence suggests that a preventive screening of coronary heart disease (CHD) at an earlier stage can greatly reduce the mortality rate. We use 64 two-dimensional speckle tracking echocardiography (2D-STE) features and seven clinical features to predict whether one has CHD. Methods: We develop a machine learning approach that integrates a number of popular classification methods together by model stacking, and generalize the traditional stacking method to a two-step stacking method to improve the diagnostic performance. Results: By borrowing strengths from multiple classification models through the proposed method, we improve the CHD classification accuracy from around 70% to 87.7% on the testing set. The sensitivity of the proposed method is 0.903 and the specificity is 0.843, with an AUC of 0.904, which is significantly higher than those of the individual classification models. Conclusions: Our work lays a foundation for the deployment of speckle tracking echocardiography-based screening tools for coronary heart disease.
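The sensitivity, specificity and accuracy figures quoted above follow from the confusion matrix of a binary classifier. The sketch below shows the standard definitions on made-up labels. The function name `screening_metrics` and the convention that 1 denotes CHD and 0 denotes no CHD are assumptions for illustration, and the example data are not from the study.

```python
def screening_metrics(y_true, y_pred):
    """Return sensitivity, specificity and accuracy for binary labels.

    Assumed convention: 1 = disease present, 0 = disease absent.
    """
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return {
        "sensitivity": tp / (tp + fn),   # share of true cases that are caught
        "specificity": tn / (tn + fp),   # share of healthy cases that are cleared
        "accuracy": (tp + tn) / len(y_true),
    }


# Made-up labels: 4 positive and 6 negative cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
print(screening_metrics(y_true, y_pred))
```

For a screening tool, sensitivity is usually the figure to watch first, because a missed case (false negative) tends to cost more than an unnecessary follow-up examination.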