Regression
Ensemble of Learning Project Productivity in Software Effort Based on Use Case Points
Azzeh, Mohammad, Nassif, Ali Bou, Banitaan, Shadi, Lopez-Martin, Cuauhtemoc
Abstract-- It is well recognized that project productivity is a key driver in estimating software project effort from the Use Case Point size metric at early software development stages. Although a few models have been proposed for predicting productivity, there is no consistent conclusion regarding which model is superior. Therefore, instead of building a new productivity prediction model, this paper presents a new ensemble construction mechanism for software project productivity prediction. An ensemble is an effective technique when the performance of base models is poor. We propose a weighted mean method that aggregates predicted productivities based on the average of the errors the base models produce in training. The results show that using an ensemble is a good alternative when base models are not consistently accurate over different datasets and when models behave diversely.
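The weighted mean aggregation mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' method: the abstract does not give the weighting formula, so the sketch assumes weights proportional to the inverse of each base model's average training error. The function names and the numbers are hypothetical.

```python
from typing import List, Sequence


def inverse_error_weights(mean_training_errors: Sequence[float]) -> List[float]:
    """Turn each base model's average training error into a normalized weight.

    Assumption (not stated in the abstract): weights are proportional to the
    inverse of the error, so models with smaller training error count more.
    """
    eps = 1e-12  # guards against division by zero for a model with zero error
    inverse = [1.0 / (err + eps) for err in mean_training_errors]
    total = sum(inverse)
    return [value / total for value in inverse]


def ensemble_productivity(predictions: Sequence[float],
                          mean_training_errors: Sequence[float]) -> float:
    """Weighted mean of the base models' productivity predictions."""
    if len(predictions) != len(mean_training_errors):
        raise ValueError("one training error is needed per base model")
    weights = inverse_error_weights(mean_training_errors)
    return sum(w * p for w, p in zip(weights, predictions))


# Made-up example: three base models, productivity in person-hours per UCP.
predictions = [18.0, 22.0, 20.0]
mean_training_errors = [2.0, 4.0, 4.0]
weights = inverse_error_weights(mean_training_errors)
estimate = ensemble_productivity(predictions, mean_training_errors)
print(weights, estimate)
```

With these toy inputs the first model has half the error of the other two, so it receives about half of the total weight and pulls the estimate toward its own prediction.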
Evaluating Patient Readmission Risk: A Predictive Analytics Approach
Choudhury, Avishek, Greene, Dr. Christopher M
With the emergence of the Hospital Readmission Reduction Program of the Center for Medicare and Medicaid Services on October 1, 2012, forecasting unplanned patient readmission risk became crucial to the healthcare domain. There are tangible works in the literature that emphasize developing readmission risk prediction models; however, the models are not accurate enough to be deployed in an actual clinical setting. Our study considers patient readmission risk as the objective for optimization and develops a useful risk prediction model to address unplanned readmissions. Furthermore, a Genetic Algorithm and a Greedy Ensemble are used to optimize the developed model constraints.
Identification of Cancer - Mesothelioma Disease Using Logistic Regression and Association Rule
Malignant Pleural Mesothelioma (MPM) or malignant mesothelioma (MM) is an atypical, aggressive tumor that matures into cancer in the pleura, a stratum of tissue bordering the lungs. Diagnosis of MPM is difficult, and it accounts for about seventy-five percent of all mesothelioma diagnosed yearly in the United States of America. Because the disease is fatal, early identification of MPM is crucial for patient survival. Our study implements logistic regression and develops association rules to identify early-stage symptoms of MM. We retrieved medical reports generated by Dicle University and implemented logistic regression to measure the model accuracy. We conducted (a) logistic correlation, (b) the Omnibus test and (c) the Hosmer and Lemeshow test for model evaluation. Moreover, we developed association rules using confidence, rule support, lift, condition support and deployability. Categorical logistic regression increases the training accuracy from 72.30% to 81.40%, with a testing accuracy of 63.46%. The study also identifies the top 5 symptoms that most likely indicate the presence of MM. This study concludes that using predictive modeling can enhance primary presentation and diagnosis of MM.
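The association rule measures named in this abstract can be computed from simple counts. The sketch below is illustrative only: the records and symptom names are made up, not taken from the Dicle University data. The abstract does not define deployability, so it is assumed here to be condition support minus rule support (the share of records that match the antecedent but not the consequent), which is how some rule-mining tools define it.

```python
from typing import Dict, FrozenSet, Iterable, List


def rule_metrics(records: List[FrozenSet[str]],
                 antecedent: Iterable[str],
                 consequent: Iterable[str]) -> Dict[str, float]:
    """Measures for the rule 'antecedent -> consequent' over a list of records."""
    lhs, rhs = frozenset(antecedent), frozenset(consequent)
    n = len(records)
    n_lhs = sum(1 for r in records if lhs <= r)            # antecedent holds
    n_rhs = sum(1 for r in records if rhs <= r)            # consequent holds
    n_both = sum(1 for r in records if (lhs | rhs) <= r)   # whole rule holds

    condition_support = n_lhs / n
    rule_support = n_both / n
    confidence = n_both / n_lhs if n_lhs else 0.0
    lift = confidence / (n_rhs / n) if n_rhs else 0.0
    return {
        "condition_support": condition_support,
        "rule_support": rule_support,
        "confidence": confidence,
        "lift": lift,
        # Assumed definition: antecedent matched but consequent not matched.
        "deployability": condition_support - rule_support,
    }


# Made-up toy records; "mm" marks a positive diagnosis.
records = [
    frozenset({"chest_pain", "dyspnea", "mm"}),
    frozenset({"chest_pain", "mm"}),
    frozenset({"chest_pain"}),
    frozenset({"dyspnea"}),
    frozenset({"mm"}),
]
metrics = rule_metrics(records, antecedent=["chest_pain"], consequent=["mm"])
print(metrics)
```

A lift above 1 means the antecedent and the consequent occur together more often than they would if they were independent.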
Towards Automatic Personality Prediction Using Facebook Like Categories
Tareaf, Raad Bin, Berger, Philipp, Hennig, Patrick, Meinel, Christoph
We demonstrate that effortlessly accessible digital records of behavior such as Facebook Likes can be obtained and utilized to automatically distinguish a wide range of highly delicate personal traits, including life satisfaction, cultural ethnicity, political views, age, gender and personality traits. The analysis presented is based on a dataset of over 738,000 users who provided their Facebook Likes, social network activities, egocentric network, demographic characteristics, and the results of various psychometric tests for our extended personality analysis. The proposed model uses a unique mapping technique from each Facebook Like object to the corresponding Facebook page category/sub-category object, which is then evaluated as a feature for a set of machine learning algorithms to predict individual psycho-demographic profiles from Likes. The model distinguishes between a religious and a non-religious individual in 83% of cases, between Asian and European in 87% of cases, and between emotionally stable and emotionally unstable in 81% of cases. We provide exemplars of correlations between attributes and Likes and present suggestions for future directions.
Learning Sharing Behaviors with Arbitrary Numbers of Agents
Metcalf, Katherine, Theobald, Barry-John, Apostoloff, Nicholas
We propose a method for modeling and learning turn-taking behaviors for accessing a shared resource. We model the individual behavior of each agent in an interaction and then use a multi-agent fusion model to generate a summary over the expected actions of the group, which renders the model independent of the number of agents. The individual behavior models are weighted finite state transducers (WFSTs) with weights dynamically updated during interactions, and the multi-agent fusion model is a logistic regression classifier. We test our models in a multi-agent tower-building environment, where a Q-learning agent learns to interact with rule-based agents. Our approach models the underlying behavior patterns of the rule-based agents with accuracy ranging between 0.63 and 1.0, depending on the stochasticity of the other agents' behaviors. In addition, we show using KL-divergence that the model accurately captures the distribution of next actions when interacting with both a single agent (KL-divergence < 0.1) and with multiple agents (KL-divergence < 0.37). Finally, we demonstrate that our behavior model can be used by a Q-learning agent to take turns in an interactive turn-taking environment.
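The KL-divergence figures quoted in this abstract compare a predicted distribution over next actions with an observed one. A minimal sketch of that computation follows. The abstract does not state the direction of the divergence or the logarithm base, so KL(observed || predicted) in nats is an assumption here, and the action probabilities are made up.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float],
                  eps: float = 1e-12) -> float:
    """KL(p || q) in nats for two discrete distributions over the same actions.

    Terms with p_i == 0 contribute nothing; q_i is floored at eps so that a
    zero prediction yields a large finite penalty instead of a division error.
    """
    if len(p) != len(q):
        raise ValueError("both distributions must cover the same actions")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0.0:
            total += p_i * math.log(p_i / max(q_i, eps))
    return total


# Made-up next-action distributions over three hypothetical actions.
observed = [0.7, 0.2, 0.1]
predicted = [0.6, 0.3, 0.1]
divergence = kl_divergence(observed, predicted)
print(divergence)
```

The divergence is zero when the two distributions match exactly and grows as the predicted distribution moves away from the observed one.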
Capturing Between-Tasks Covariance and Similarities Using Multivariate Linear Mixed Models
We consider the problem of predicting several response variables using the same set of explanatory variables. This setting naturally induces a group structure over the coefficient matrix, in which every explanatory variable corresponds to a set of related coefficients. Most of the existing methods that utilize this group formation assume that the similarities between related coefficients arise solely through a joint sparsity structure. In this paper, we propose a procedure for constructing an estimator of a multivariate regression coefficient matrix that directly models and captures the within-group similarities, by employing a multivariate linear mixed model formulation, with joint estimation of covariance matrices for coefficients and errors via penalized likelihood. Our approach, which we term Multivariate random Regression with Covariance Estimation (MrRCE), encourages structured similarity in parameters, in which coefficients for the same variable in related tasks share the same sign and similar magnitude. We illustrate the benefits of our approach in synthetic and real examples, and show that the proposed method outperforms natural competitors and alternative estimators under several model settings.
Online Bearing Remaining Useful Life Prediction Based on a Novel Degradation Indicator and Convolutional Neural Networks
Cheng, Cheng, Ma, Guijun, Zhang, Yong, Sun, Mingyang, Teng, Fei, Ding, Han, Yuan, Ye
In industrial applications, nearly half the failures of motors are caused by the degradation of rolling element bearings (REBs). Therefore, accurately estimating the remaining useful life (RUL) of REBs is of crucial importance to ensure the reliability and safety of mechanical systems. To tackle this challenge, model-based approaches are often limited by the complexity of mathematical modeling. Conventional data-driven approaches, on the other hand, require massive effort to extract the degradation features and construct a health index. In this paper, a novel online data-driven framework is proposed that uses deep convolutional neural networks (CNNs) to predict the RUL of bearings. More concretely, the raw vibrations of training bearings are first processed using the Hilbert-Huang transform (HHT), and a novel nonlinear degradation indicator is constructed as the label for learning. The CNN is then employed to identify the hidden pattern between the extracted degradation indicator and the vibration of training bearings, which makes it possible to estimate the degradation of the test bearings automatically. Finally, the test bearings' RULs are predicted by using an $\epsilon$-support vector regression model. The superior performance of the proposed RUL estimation framework, compared with the state-of-the-art approaches, is demonstrated through the experimental results. The generality of the proposed CNN model is also validated by transferring it to bearings undergoing different operating conditions.
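The final stage of the framework relies on $\epsilon$-support vector regression, whose defining feature is a loss that ignores residuals smaller than $\epsilon$. The sketch below shows only that loss, not the paper's model or a full SVR solver. The value of `epsilon` and the numbers are hypothetical.

```python
from typing import List, Sequence


def epsilon_insensitive_loss(y_true: Sequence[float],
                             y_pred: Sequence[float],
                             epsilon: float = 0.1) -> List[float]:
    """Per-sample loss max(0, |y - f(x)| - epsilon) used by epsilon-SVR.

    Predictions that fall inside the epsilon tube around the target cost
    nothing; outside the tube the penalty grows linearly with the residual.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]


# Made-up degradation values and predictions.
targets = [10.0, 8.0, 6.0]
estimates = [10.05, 8.5, 5.0]
losses = epsilon_insensitive_loss(targets, estimates, epsilon=0.1)
print(losses)
```

The first estimate lies inside the tube and incurs no loss, while the other two are penalized only for the part of the residual that exceeds `epsilon`.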
On effective human robot interaction based on recognition and association
Faces play a major role in human-robot interaction, as they do in our daily life. The inherent ability of the human mind enables us to recognize a person despite the various challenges involved in face recognition, such as bad illumination, occlusions, and pose variation. For humanoid robots, however, identifying a human face is a very complex task. The recent literature on face biometric recognition is extremely rich in applications to structured environments for solving the human identification problem. But the application of face biometrics to mobile robotics is limited by its inability to produce accurate identification in uneven circumstances. We tackle this face recognition problem with our proposed component-based fragmented face recognition framework. The proposed framework uses only a subset of the full face, such as the eyes, nose and mouth, to recognize a person. Its lower search cost, encouraging accuracy and ability to handle various challenges of face recognition make it applicable to humanoid robots. The second problem in face recognition is face spoofing, in which a face recognition system is not able to distinguish between a person and an imposter (a photo or video of the genuine user). The problem becomes more detrimental when robots are used as authenticators. A depth analysis method has been investigated in our research work to test the liveness of imposters and discriminate them from legitimate users. The techniques described above have been applied to criminal identification with the NAO robot. An eyewitness can interact with NAO through a user interface. NAO asks several questions about the suspect, such as age, height, and facial shape and size, and then makes a guess about her/his face.
METCC: METric learning for Confounder Control Making distance matter in high dimensional biological analysis
Manghnani, Kabir, Drake, Adam, Wan, Nathan, Haque, Imran
High-dimensional data acquired from biological experiments such as next-generation sequencing are subject to a number of confounding effects. These effects include both technical effects, such as variation across batches from instrument noise or sample processing ("batch effects"), or institution-specific differences in sample acquisition and physical handling ("institutional variability"), as well as biological effects arising from true but irrelevant differences in the biology of each sample, such as age biases in diseases. Prior work has used linear methods to adjust for such batch effects. Here, we apply contrastive metric learning by a nonlinear triplet network to optimize the ability to distinguish biologically distinct sample classes in the presence of irrelevant technical and biological variation. Using whole-genome cell-free DNA data from 817 patients, we demonstrate that our approach, METric learning for Confounder Control (METCC), is able to match or exceed the classification performance achieved using a best-in-class linear method (HCP) or no normalization. Critically, results from METCC appear less confounded by irrelevant technical variables like institution and batch than those from other methods, even without access to the high-quality metadata required by many existing techniques, offering hope for improved generalization.
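A triplet network of the kind mentioned in this abstract is typically trained with a triplet loss, which pulls an anchor sample toward a same-class sample and pushes it away from a different-class sample. The sketch below shows one common form of that loss on fixed embedding vectors. The abstract does not give METCC's exact loss, distance or margin, so the Euclidean distance, the margin of 1.0 and the vectors are all assumptions.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: Sequence[float],
                 positive: Sequence[float],
                 negative: Sequence[float],
                 margin: float = 1.0) -> float:
    """max(0, d(anchor, positive) - d(anchor, negative) + margin).

    The loss is zero once the negative is farther from the anchor than the
    positive by at least the margin.
    """
    return max(0.0, euclidean(anchor, positive)
               - euclidean(anchor, negative) + margin)


# Made-up 2-D embeddings: same-class pair close, other class far away.
easy = triplet_loss(anchor=[0.0, 0.0], positive=[0.0, 1.0], negative=[3.0, 4.0])
# Same pair, but the negative is as close to the anchor as the positive.
hard = triplet_loss(anchor=[0.0, 0.0], positive=[0.0, 1.0], negative=[1.0, 0.0])
print(easy, hard)
```

In a confounder-control setting, one would presumably choose positives and negatives by biological class rather than by batch or institution, so that the learned distances reflect biology; how METCC selects its triplets is not stated in the abstract.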
LNEMLC: Label Network Embeddings for Multi-Label Classification
Szymański, Piotr, Kajdanowicz, Tomasz, Chawla, Nitesh
Abstract--Multi-label classification aims to classify instances with discrete non-exclusive labels. Most approaches to multi-label classification focus on effective adaptation or transformation of existing binary and multi-class learning approaches but fail to model the joint probability of labels or do not preserve generalization abilities for unseen label combinations. To address these issues we propose a new multi-label classification scheme, LNEMLC - Label Network Embedding for Multi-Label Classification, that embeds the label network and uses it to extend the input space in learning and inference of any base multi-label classifier. The approach allows capturing the labels' joint probability at low computational complexity, providing results comparable to the best methods reported in the literature. We demonstrate how the method yields statistically significant improvements over the simple kNN baseline classifier. We also provide hints for selecting a robust configuration that works satisfactorily across data domains.
I. INTRODUCTION
In our daily life, we continuously encounter data classified with multiple categories. Be it YouTube videos, Instagram photos, articles in newspapers or, more recently, even our genome on gene analysis websites, we depend heavily on labels to guide us through various types of objects to find what is to our liking, and we rely on labels to organize our information flow. Labels usually denote the simplest understandable terms, while it is how they occur together that creates sophisticated concepts and contexts.
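The observation that meaning comes from how labels occur together suggests a label network built from co-occurrence counts. The sketch below builds such a weighted edge list from training label sets. It is an illustration under assumptions, not the LNEMLC pipeline: the abstract does not say how the network is constructed or embedded, and the label names are made up.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, Set, Tuple


def label_cooccurrence(label_sets: Iterable[Set[str]]) -> "Counter[Tuple[str, str]]":
    """Count how often each unordered pair of labels is assigned together.

    Each key is an alphabetically sorted (label_a, label_b) pair, i.e. an edge
    of an undirected label network; the count acts as the edge weight.
    """
    edges: "Counter[Tuple[str, str]]" = Counter()
    for labels in label_sets:
        for a, b in combinations(sorted(labels), 2):
            edges[(a, b)] += 1
    return edges


# Made-up label assignments for three items.
training_labels = [
    {"music", "live"},
    {"music", "live", "rock"},
    {"news"},
]
edges = label_cooccurrence(training_labels)
print(edges.most_common())
```

A network like this could then be passed to a graph embedding method, with the resulting label vectors used to extend the input features of a base classifier, which is the general idea the abstract describes.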