Performance Analysis
Prediction of Success or Failure for Final Examination using Nearest Neighbor Method to the Trend of Weekly Online Testing
Using the outputs obtained from online testing, it is not difficult to collect large-scale learning data. We can then actively mine the collected data to find optimal strategies for better learning methods. It is also important to analyze the data theoretically (see [23]). This paper aims to obtain effective learning strategies for students at risk of failing courses and/or dropping out, using large-scale learning data collected from online tests. Unlike conventional methods that use the correct answer rate (CAR) to identify the ability of a student (e.g., see [13]), we use the ability estimated from item response theory (IRT, e.g., see [1], [4], [17]), and we present a new method to identify students at risk as early as possible using the IRT results.
Evaluating Patient Readmission Risk: A Predictive Analytics Approach
Choudhury, Avishek, Greene, Dr. Christopher M
With the emergence of the Hospital Readmissions Reduction Program of the Centers for Medicare and Medicaid Services on October 1, 2012, forecasting unplanned patient readmission risk became crucial to the healthcare domain. There is considerable work in the literature on developing readmission risk prediction models; however, these models are not accurate enough to be deployed in an actual clinical setting. Our study considers patient readmission risk as the objective for optimization and develops a useful risk prediction model to address unplanned readmissions. Furthermore, Genetic Algorithm and Greedy Ensemble are used to optimize the developed model constraints.
Classification of Cervical Cancer Dataset
Choudhury, Avishek, Wesabi, Y. M. S Al, Won, Daehan
Cervical cancer is the leading gynecological malignancy worldwide. This paper presents diverse classification techniques and shows the advantage of feature selection approaches for predicting cervical cancer. The dataset has thirty-two attributes and eight hundred and fifty-eight samples. It also suffers from missing values and class imbalance. Therefore, over-sampling, under-sampling, and embedded over- and under-sampling have been used. Furthermore, dimensionality reduction techniques are required to improve the accuracy of the classifier. Therefore, feature selection methods, which fall into two distinct categories, filters and wrappers, have been studied. The results show that age, first sexual intercourse, number of pregnancies, smokes, hormonal contraceptives, and STDs: genital herpes are the main predictive features, yielding a high accuracy of 97.5%. The Decision Tree classifier is shown to be advantageous for the classification task, with excellent performance.
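The resampling step mentioned above can be illustrated with a minimal sketch of plain random under-sampling and over-sampling. The abstract does not say which resampling variants the authors used, so the function names, the toy data, and the choice of uniform random resampling are all assumptions made for illustration.

```python
import random


def group_by_class(samples, labels):
    """Collect samples into a dict keyed by class label."""
    groups = {}
    for sample, label in zip(samples, labels):
        groups.setdefault(label, []).append(sample)
    return groups


def random_under_sample(samples, labels, rng):
    """Shrink every class to the size of the smallest class (hypothetical helper)."""
    groups = group_by_class(samples, labels)
    target = min(len(items) for items in groups.values())
    return [(s, label) for label, items in groups.items()
            for s in rng.sample(items, target)]


def random_over_sample(samples, labels, rng):
    """Grow every class to the size of the largest class by drawing with replacement."""
    groups = group_by_class(samples, labels)
    target = max(len(items) for items in groups.values())
    balanced = []
    for label, items in groups.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        balanced.extend((s, label) for s in items + extra)
    return balanced


def class_counts(pairs):
    """Count how many (sample, label) pairs fall in each class."""
    counts = {}
    for _, label in pairs:
        counts[label] = counts.get(label, 0) + 1
    return counts


# Toy imbalanced data: 8 samples of class 0 and 2 samples of class 1.
toy_samples = list(range(10))
toy_labels = [0] * 8 + [1] * 2
rng = random.Random(0)  # fixed seed so the sketch is reproducible

under = random_under_sample(toy_samples, toy_labels, rng)
over = random_over_sample(toy_samples, toy_labels, rng)
print(class_counts(under), class_counts(over))
```

Under-sampling discards majority-class samples, which can be costly on a dataset of only 858 records, while over-sampling duplicates minority-class samples and can encourage overfitting. That trade-off is one plausible reason to try a combination of the two.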
Decision Support System for Renal Transplantation
Khan, Ehsan, Choudhury, Avishek, Friedman, Amy L, Won, Daehan
The burgeoning need for kidney transplantation mandates immediate attention. A mismatch between a deceased donor's kidney and the recipient can lead to post-transplant death. To ensure an ideal kidney donor-recipient match and minimize post-transplant deaths, the paper develops a prediction model that identifies the factors determining the probability of success of renal transplantation, that is, whether the kidney procured from the deceased donor can be transplanted or should be discarded. The study covers data for 584 imported kidneys collected from 12 transplant centers associated with an organ procurement organization located in New York City, NY. The prediction model yielding the best performance measures can be beneficial to the healthcare industry. Transplant centers and organ procurement organizations can use the prediction model to efficiently predict the outcome of kidney transplantation. Consequently, it can reduce the mortality rate caused by donor-recipient mismatch in kidney transplantation.
Kernel Treelets
Xia, Hedi, Ceniceros, Hector D.
Treelets, introduced by Lee, Nadler, and Wasserman [1, 2], is a method to produce a multiscale, hierarchical decomposition of unordered data. The central premise of Treelets is to exploit sparsity and capture intrinsic localized structures with only a few features, represented in terms of an orthonormal basis. The hierarchical tree constructed by the treelet algorithm provides a scale-based partition of the data that can be used for classification, especially for cluster analysis [3]. Cluster analysis, also called clustering, is concerned with finding a partition of a set such that its corresponding equivalence classes capture the similarity of its elements. The Treelet approach is an example of hierarchical clustering (HC) [4], a family of methods that provide a nested, multiscale clustering.
Variational Bayesian Complex Network Reconstruction
Xu, Shuang, Zhang, Chun-Xia, Wang, Pei, Zhang, Jiangshe
Networked systems are ubiquitous in many fields, including social-tech science [1, 2], bioinformatics [3-6], epidemic dynamics [7-9] and power grids [10, 11]. However, it is often not possible to observe the topology of a network, even though data generated by the network are available. Therefore, in interdisciplinary science, one of the most important but challenging problems is to reconstruct the complex network from the observed data or time series [12]. This problem has been widely investigated in the past three decades, where the classical method is the delay-coordinate embedding method proposed by Takens [13], which, nevertheless, is only suitable for small-scale networks. Nowadays, with the advent of the big data era [14], it is urgent to solve this issue for large-scale complex networks. Suppose that a complex network consists of N nodes; in practice we are often given the time series of the states of the N nodes. Generally speaking, the core idea of many data-driven network reconstruction investigations is to first calculate the correlation between each pair of nodes. Then, a threshold can be set manually or automatically to make the network binary.
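The correlate-then-threshold idea described above can be sketched in a few lines. This is only an illustration of the generic baseline, not the variational Bayesian method the paper's title refers to; the use of Pearson correlation, the function names, the toy time series, and the threshold of 0.9 are all assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def reconstruct_adjacency(series, threshold):
    """Binary, symmetric adjacency matrix: edge if |correlation| >= threshold."""
    n = len(series)
    adjacency = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if abs(pearson(series[i], series[j])) >= threshold:
                adjacency[i][j] = adjacency[j][i] = 1
    return adjacency


# Toy data: one time series per node (N = 3 nodes, 5 time steps each).
toy_series = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.1, 3.9, 6.2, 8.0, 9.9],   # nearly proportional to node 0
    [5.0, 1.0, 4.0, 2.0, 3.0],   # weakly related to the others
]
adjacency = reconstruct_adjacency(toy_series, threshold=0.9)
print(adjacency)  # [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
```

The result depends heavily on the threshold: a lower value adds spurious edges and a higher one drops real ones, which is why choosing the threshold is a central difficulty for this family of methods.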
Deep Anomaly Detection with Outlier Exposure
Hendrycks, Dan, Mazeika, Mantas, Dietterich, Thomas G.
It is important to detect and handle anomalous inputs when deploying machine learning systems. The use of larger and more complex inputs in deep learning magnifies the difficulty of distinguishing between anomalous and in-distribution examples. At the same time, diverse image and text data commonly used by deep learning systems are available in enormous quantities. We propose leveraging these data to improve deep anomaly detection by training anomaly detectors against an auxiliary dataset of outliers, an approach we call Outlier Exposure (OE). This approach enables anomaly detectors to generalize and detect unseen anomalies. In extensive experiments in vision and natural language processing settings, we find that Outlier Exposure significantly improves the detection performance. Our approach is even applicable to density estimation models and anomaly detectors for large-scale images. We also analyze the flexibility and robustness of Outlier Exposure, and identify characteristics of the auxiliary dataset that improve performance.
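One way to picture training against an auxiliary outlier dataset is an objective that adds a penalty on outlier inputs to the usual classification loss. The sketch below assumes a softmax classifier and a penalty that pushes predictions on outliers toward the uniform distribution; the abstract does not state the loss, so this formulation, the function names, the toy logits, and the weight `lam` are assumptions.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return [z - log_sum for z in logits]


def cross_entropy(logits, label):
    """Standard cross-entropy against a single true class index."""
    return -log_softmax(logits)[label]


def uniform_cross_entropy(logits):
    """Cross-entropy from the uniform distribution to the softmax output."""
    log_probs = log_softmax(logits)
    return -sum(log_probs) / len(log_probs)


def oe_objective(in_batch, out_batch, lam=0.5):
    """In-distribution loss plus lam times the outlier penalty (lam is hypothetical)."""
    in_loss = sum(cross_entropy(z, y) for z, y in in_batch) / len(in_batch)
    out_loss = sum(uniform_cross_entropy(z) for z in out_batch) / len(out_batch)
    return in_loss + lam * out_loss


# Toy logits for a 3-class problem.
in_batch = [([4.0, 0.5, 0.1], 0), ([0.2, 3.0, 0.3], 1)]  # (logits, true label)
out_batch = [[5.0, 0.0, 0.0], [0.1, 0.0, 0.2]]            # logits on auxiliary outliers
loss = oe_objective(in_batch, out_batch)
print(round(loss, 4))
```

The outlier term is smallest when the model spreads its probability evenly over the classes, so a confident prediction on an outlier raises the loss. At test time, a low maximum softmax probability can then serve as an anomaly score.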
Data Strategies for Fleetwide Predictive Maintenance
Senior Technical Fellow, PeopleTec, Inc., Huntsville, AL, USA
ABSTRACT: For predictive maintenance, we examine one of the largest public datasets for machine failures derived along with their corresponding precursors as error rates, historical part replacements and sensor inputs. To simplify the time-accuracy comparison between 27 different algorithms, we treat the imbalance between normal and failing states with nominal under-sampling. We identify 3 promising regression and discriminant algorithms with both higher accuracy (96%) and twenty-fold faster execution times than previous work. Because predictive maintenance success hinges on input features prior to prediction, we provide a methodology to rank-order feature importance and show that for this dataset, error counts prove more predictive than scheduled maintenance might imply solely based on more traditional factors such as machine age or last replacement times.
INTRODUCTION: Successful predictive maintenance is challenging not only because failures can prove multifactorial but also because maintenance forecasters often lack good training data.
Deep Program Reidentification: A Graph Neural Network Solution
Wang, Shen, Chen, Zhengzhang, Li, Ding, Tang, Lu-An, Ni, Jingchao, Li, Zhichun, Rhee, Junghwan, Chen, Haifeng, Yu, Philip S.
A program or process is an integral part of almost every IT/OT system. Can we trust the identity/ID (e.g., executable name) of the program? To avoid detection, malware may disguise itself using the ID of a legitimate program, and a system tool (e.g., PowerShell) used by attackers may carry the fake ID of another common, less sensitive piece of software. However, existing intrusion detection techniques often overlook this critical program reidentification problem (i.e., checking the program's identity). In this paper, we propose an attentional multi-channel graph neural network model (DeepRe-ID) to verify the program's identity based on its system behaviors. The key idea is to leverage the representation learning of the program behavior graph to guide the reidentification process. We formulate program reidentification as a graph classification problem and develop an effective multi-channel attentional graph embedding algorithm to solve it. Extensive experiments, using real-world enterprise monitoring data and real attacks, demonstrate the effectiveness of DeepRe-ID across multiple popular metrics and its robustness to normal dynamic changes such as program version upgrades.
Bootstrapping a Structured Self-improving & Safe Autopoietic Self
After nearly sixty years of failing to program artificial intelligence (AI), it is now time to grow it using an enactive approach instead. Critically, however, we need to ensure that it matures with a “moral sense” that will ensure the safety and well-being of the human race. Consciousness and conscience can lead the way towards creating safe and cooperative machine entities.