 Accuracy


Towards Reliable, Automated General Movement Assessment for Perinatal Stroke Screening in Infants Using Wearable Accelerometers

arXiv.org Machine Learning

Perinatal stroke (PS) is a serious condition that, if undetected and thus untreated, often leads to life-long disability, in particular Cerebral Palsy (CP). In clinical settings, Prechtl's General Movement Assessment (GMA) can be used to classify infant movements using a Gestalt approach, identifying infants at high risk of developing PS. Training and maintenance of assessment skills are essential and expensive for the correct use of GMA, yet many practitioners lack these skills, preventing larger-scale screening and leading to significant risks of missing opportunities for early detection and intervention for affected infants. We present an automated approach to GMA, based on body-worn accelerometers and a novel sensor data analysis method, Discriminative Pattern Discovery (DPD), that is designed to cope with scenarios where only coarse annotations of data are available for model training. We demonstrate the effectiveness of our approach in a study with 34 newborns (21 typically developing infants and 13 PS infants with abnormal movements). Our method is able to correctly recognise the trials with abnormal movements with at least the accuracy that is required by newly trained human annotators (75%), which is encouraging towards our ultimate goal of an automated PS screening system that can be used population-wide.


Prediction of Malignant & Benign Breast Cancer: A Data Mining Approach in Healthcare Applications

arXiv.org Machine Learning

As data science plays a pivotal role everywhere, it also finds prominent application in healthcare. Breast cancer is the most common type of cancer among women, and it alone took away 627,000 lives. This high mortality rate calls for attention to early detection, so that prevention can be done in time. As a potential contributor to state-of-the-art technology development, data mining finds multi-fold application in predicting breast cancer. This work focuses on implementing different classification techniques for data mining to predict malignant and benign breast cancer. The Breast Cancer Wisconsin data set from the UCI repository has been used as the experimental dataset, with the attribute clump thickness used as the evaluation class. The performances of these twelve algorithms: Ada Boost M 1, Decision Table, J Rip, Lazy IBK, Logistic Regression, Multiclass Classifier, Multilayer Perceptron, Naive Bayes, Random Forest and Random Tree are analyzed on this data set. Keywords- Data Mining, Classification Techniques, UCI repository, Breast Cancer, Classification Algorithms


Predictive Inequity in Object Detection

arXiv.org Machine Learning

In this work, we investigate whether state-of-the-art object detection systems have equitable predictive performance on pedestrians with different skin tones. This work is motivated by many recent examples of ML and vision systems displaying higher error rates for certain demographic groups than others. We annotate an existing large scale dataset which contains pedestrians, BDD100K, with Fitzpatrick skin tones in ranges [1-3] or [4-6]. We then provide an in-depth comparative analysis of performance between these two skin tone groupings, finding that neither time of day nor occlusion explain this behavior, suggesting this disparity is not merely the result of pedestrians in the 4-6 range appearing in more difficult scenes for detection. We investigate to what extent time of day, occlusion, and reweighting the supervised loss during training affect this predictive bias.


Web Links Prediction And Category-Wise Recommendation Based On Browser History

arXiv.org Machine Learning

A web browser should not only be for browsing web pages; it should also help users find their target websites and recommend similar websites based on their behavior. In this paper, we propose two methods to make a web browser more intelligent: link prediction, which works while the user types in the address bar, and recommendation of websites according to several categories. Our proposed link prediction system is actually frecency prediction, where frecency is predicted based on the first visit, last visit and URL counts. The recommendation system is the more challenging part, as it needs to classify web URLs according to their names without visiting the web pages. We therefore use an existing model for URL classification. The only existing approach gives unsatisfactory results and low accuracy, so we add hyperparameter optimization to it, which finds the best parameters for the existing URL classification model and gives better accuracy. We then propose a category-wise recommendation system using the frecency value and the total visits of each individual URL category.
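The abstract above names the inputs to frecency (first visit, last visit and URL counts) but not the formula. As a minimal sketch, assuming a made-up score (visit rate multiplied by an exponential recency decay), address-bar ranking could look like the following. The function names, the `half_life_days` parameter and the substring matching are all hypothetical and are not taken from the paper.

```python
from datetime import datetime


def frecency_score(first_visit, last_visit, visit_count, now, half_life_days=30.0):
    """Hypothetical frecency: visits per day, decayed by time since last visit.

    This is an illustrative formula, not the one used in the paper.
    """
    if visit_count <= 0:
        return 0.0
    # Days since the URL was last visited (never negative).
    age_days = max((now - last_visit).total_seconds() / 86400.0, 0.0)
    # Days between first and last visit, floored at one day to avoid
    # dividing by zero for URLs visited only once.
    span_days = max((last_visit - first_visit).total_seconds() / 86400.0, 1.0)
    rate = visit_count / span_days
    # The score halves every `half_life_days` without a new visit.
    decay = 0.5 ** (age_days / half_life_days)
    return rate * decay


def suggest(typed, history, now, limit=5):
    """Return URLs containing the typed text, highest frecency first.

    `history` maps URL -> (first_visit, last_visit, visit_count).
    """
    matches = [
        (frecency_score(first, last, count, now), url)
        for url, (first, last, count) in history.items()
        if typed in url
    ]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [url for _, url in matches[:limit]]
```

With this scoring, a URL visited often and recently outranks one with the same visit count spread over a long, older period.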


4 Takeaways from 'How Google Does Machine Learning' course

#artificialintelligence

Today, Machine Learning (ML) technology is simplified and abstracted to an API call so you can solve a data-intensive pattern matching problem easily. Google's Move Mirror is a great example. While creating a standalone ML-based consumer app is reasonably straightforward, it can be quite challenging to infuse ML at scale into a mission-critical enterprise-class cloud platform. Enterprise apps have to consider various steps in the machine learning life cycle including data cleansing, integration, and production deployment. Operationalizing ML is a topic by itself and I'll share more on that in a future post.


Predicting customer's gender and age depending on mobile phone data

arXiv.org Machine Learning

In the age of data-driven solutions, customer demographic attributes, such as gender and age, play a core role that may enable companies to enhance their service offers and target the right customer at the right time and place. In a marketing campaign, companies want to target the real user of the GSM (global system for mobile communications) line, not the line owner, since the two may not be the same. This work proposes a method that predicts users' gender and age based on their behavior, services and contract information. We used call detail records (CDRs), customer relationship management (CRM) and billing information as a data source to analyze telecom customer behavior, and applied different types of machine learning algorithms to provide marketing campaigns with more accurate information about customer demographic attributes. The model is built using a reliable data set of 18,000 users provided by SyriaTel Telecom Company, for training and testing. The model was applied using big data technology and achieved 85.6% accuracy in user gender prediction and 65.5% in user age prediction. The main contribution of this work is the improvement in the accuracy of user gender prediction and user age prediction based on mobile phone data, and an end-to-end solution that approaches customer data from multiple aspects in the telecom domain.


Inference of a Multi-Domain Machine Learning Model to Predict Mortality in Hospital Stays for Patients with Cancer upon Febrile Neutropenia Onset

arXiv.org Machine Learning

Febrile neutropenia (FN) has been associated with high mortality, especially among adults with cancer. Understanding the patient- and provider-level heterogeneity in FN hospital admissions has the potential to inform personalized interventions focused on increasing survival of individuals with FN. We leverage machine learning techniques to disentangle the complex interactions among multi-domain risk factors in a population with FN. Data from the Healthcare Cost and Utilization Project (HCUP) National Inpatient Sample and Nationwide Inpatient Sample (NIS) were used to build machine learning based models of mortality for adult cancer patients who were diagnosed with FN during a hospital admission. In particular, the importance of risk factors from different domains (including demographic, clinical, and hospital associated information) was studied. A set of more interpretable (decision tree, logistic regression) as well as more black box (random forest, gradient boosting, neural networks) models were analyzed and compared via multiple cross validation. Our results demonstrate that a linear prediction score of FN mortality among adults with cancer, based on admission information, is effective in classifying high risk patients; clinical diagnoses is the domain with the highest predictive power. A number of the risk variables (e.g. sepsis, kidney failure, etc.) identified in this study are clinically actionable; future studies looking at the patients' prior medical history are warranted.


Controlling false discoveries in large-scale experimentation: Challenges and solutions

Robohub

"Scientific research has changed the world. Now it needs to change itself. There has been a growing concern about the validity of scientific findings. A multitude of journals, papers and reports have recognized the ever smaller number of replicable scientific studies. In 2016, one of the giants of scientific publishing, Nature, surveyed about 1,500 researchers across many different disciplines, asking for their stand on the status of reproducibility in their area of research. One of the many takeaways to the worrisome results of this survey is the following: 90% of the respondents agreed that there is a reproducibility crisis, and the overall top answer to boosting reproducibility was "better understanding of statistics". Indeed, many factors contributing to the explosion of irreproducible research stem from the neglect of the fact that statistics is no longer as static as it was in the first half of the 20th century, when statistical hypothesis testing came into prominence as a ...


Explaining precision and recall – Andreas Klintberg – Medium

#artificialintelligence

During my first days and weeks of getting into NLP, I had a hard time grasping the concepts of precision, recall and F1-score. Accuracy is also a metric which is tied to these, as well as micro-precision and macro-precision. These metrics are important in general machine learning and deep learning as well. However, one of my colleagues (Thanks Marci!) explained it in an excellent way that I thought I'd share. Many of you have probably already seen this, but for me it was a revelation in its simplicity.
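The metrics named in this excerpt can be computed directly from true and predicted labels. The sketch below uses the standard textbook definitions, not the explanation from the linked post, and the function names and toy labels are illustrative assumptions.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 for one class treated as 'positive'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    # Precision: of everything predicted positive, how much was right.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: of everything truly positive, how much was found.
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def macro_precision(y_true, y_pred):
    """Unweighted mean of per-class precision."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(precision_recall_f1(y_true, y_pred, c)[0] for c in classes) / len(classes)


def micro_precision(y_true, y_pred):
    """Precision from true/false positives pooled over all classes."""
    classes = set(y_true) | set(y_pred)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    predicted = sum(1 for p in y_pred if p in classes)
    return tp / predicted if predicted else 0.0


# Toy labels: 1 is the positive class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
p, r, f1 = precision_recall_f1(y_true, y_pred, positive=1)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
print(f"accuracy={accuracy(y_true, y_pred):.3f}")
```

In the single-label setting used here, micro-precision equals accuracy, while macro-precision weights every class equally regardless of how often it occurs.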


Classifying textual data: shallow, deep and ensemble methods

arXiv.org Machine Learning

Nowadays the increasing and rapid progress of technology and the availability of electronic documents from a variety of sources have made a huge amount of textual data available. Hence, one of the prominent research topics of statistical and machine learning communities is to provide suitable and feasible methods to extract high-quality information from unstructured textual data (Lata and Loar, 2018) for the different purposes of clustering, classification and document retrieval (Khan et al., 2010). This work originates from an empirical problem of classification of the content of calls made to the customer service of an important mobile phone company in Italy. The received calls are written down by an operator and classified into relevant classes (e.g.