Performance Analysis
Estimating sex and age for forensic applications using machine learning based on facial measurements from frontal cephalometric landmarks
Porto, Lucas F., Lima, Laise N. Correia, Franco, Ademir, Pianto, Donald M., Palhares, Carlos Eduardo Machado, Vidal, Flavio de Barros
Facial analysis permits many investigations, some of the most important of which are craniofacial identification, facial recognition, and age and sex estimation. In forensics, photo-anthropometry describes the study of facial growth and allows the identification of patterns in facial skull development by using a group of cephalometric landmarks to estimate anthropological information. In several areas, automating manual procedures has brought advantages while achieving measurement confidence similar to that of a forensic expert. This manuscript presents an approach that uses photo-anthropometric indexes, generated from cephalometric landmarks on frontal faces, to build an artificial neural network classifier that estimates anthropological information, in this specific case age and sex. The work focuses on four tasks: i) sex estimation for ages from 5 to 22 years, evaluating how age interferes with sex estimation; ii) age estimation from photo-anthropometric indexes for four age intervals (1, 2, 4, and 5 years); iii) age group estimation for thresholds of over 14 and over 18 years old; and iv) the provision of a new data set, available for academic purposes only, with a large and complete set of facial photo-anthropometric points marked and checked by forensic experts, measured from over 18,000 faces of individuals from Brazil over the last 4 years. Using this new data set, the proposed classifier obtained significant results for the sex estimation of individuals over 14 years old, achieving F_1 scores greater than 0.85. For age estimation, the accuracy is 0.72 for an age interval of 5 years. For age group estimation, the accuracy measures are greater than 0.93 and 0.83 for the thresholds of 14 and 18 years, respectively.
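The abstract does not spell out how each photo-anthropometric index is built. This sketch assumes a common construction that is not taken from the paper: a ratio of two inter-landmark distances, which cancels the unknown photo scale. The minimal Python sketch below computes such ratios, labels an age group for a threshold such as 14 or 18 years, and computes the F_1 measure quoted in the results. The function names, the choice of reference distance, and the at-or-above reading of "over" are illustrative assumptions. The neural network classifier itself is not shown.

```python
from __future__ import annotations

import math
from itertools import combinations


def distance(lm: dict[str, tuple[float, float]], a: str, b: str) -> float:
    """Euclidean distance between two named landmarks (hypothetical names)."""
    (xa, ya), (xb, yb) = lm[a], lm[b]
    return math.hypot(xa - xb, ya - yb)


def photo_anthropometric_indexes(lm: dict[str, tuple[float, float]],
                                 ref: tuple[str, str]) -> dict[str, float]:
    """Divide every pairwise landmark distance by one reference distance.

    The ratio cancels the unknown image scale, so faces photographed at
    different distances from the camera yield comparable indexes.
    """
    ref_dist = distance(lm, *ref)
    return {f"{a}-{b}": distance(lm, a, b) / ref_dist
            for a, b in combinations(sorted(lm), 2)}


def age_group(age: float, threshold: float) -> int:
    """1 if age is at or above the threshold (assumed reading of "over"), else 0."""
    return int(age >= threshold)


def f1_score(y_true: list[int], y_pred: list[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the `positive` class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In a full pipeline, each face's index dictionary would become one feature vector for the classifier. The sex or age-group labels would be its targets.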
Modelling Segmented Cardiotocography Time-Series Signals Using One-Dimensional Convolutional Neural Networks for the Early Detection of Abnormal Birth Outcomes
Fergus, Paul, Chalmers, Carl, Montanez, Casimiro Curbelo, Reilly, Denis, Lisboa, Paulo, Pineles, Beth
Gynaecologists and obstetricians visually interpret cardiotocography (CTG) traces using the International Federation of Gynaecology and Obstetrics (FIGO) guidelines to assess the wellbeing of the foetus during antenatal care. This approach has raised concerns among professionals about inter- and intra-observer variability, since clinical diagnosis has only a 30% positive predictive value when classifying pathological outcomes. Machine learning models trained with FIGO and other user-derived features extracted from CTG traces have been shown to increase positive predictive capacity and minimise variability. However, this is only possible when class distributions are equal, which is rarely the case in clinical trials, where case-control observations are heavily skewed. Classes can be balanced either by generating synthetic data from resampled case training data or by decreasing the number of control instances. However, this introduces bias and removes valuable information. Concerns have also been raised about the reliance of machine learning studies on manually handcrafted features. While this has led to some interesting results, deriving an optimal set of features is considered an art as well as a science and is often an empirical and time-consuming process. In this paper, we address both issues and propose a novel CTG analysis methodology that a) splits CTG time-series signals into n-size windows with equal class distributions, and b) automatically extracts features from time-series windows using a one-dimensional convolutional neural network (1DCNN) and multilayer perceptron (MLP) ensemble. Our proposed method achieved good results using a window size of 200 (Sens=0.7981, Spec=0.7881, F1=0.7830, Kappa=0.5849, AUC=0.8599, and Logloss=0.4791).
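Step a) splits signals into n-size windows with equal class distributions. The Python sketch below shows one possible reading of that step; it is an assumption, not the authors' code. Control recordings are cut into non-overlapping windows. The stride over case recordings is then reduced until they yield as many windows. This balances the classes through overlap rather than through synthetic samples or discarded controls. The helper names `segment` and `balanced_windows` are hypothetical.

```python
from __future__ import annotations

from typing import Sequence


def segment(signal: Sequence[float], n: int, stride: int) -> list[list[float]]:
    """Cut one signal into windows of n samples, starting every `stride` samples."""
    return [list(signal[i:i + n]) for i in range(0, len(signal) - n + 1, stride)]


def balanced_windows(cases: list[Sequence[float]], controls: list[Sequence[float]],
                     n: int) -> tuple[list[list[float]], list[int]]:
    """Return windows and labels (1 = case, 0 = control) in equal numbers.

    Assumed strategy: controls (majority class) use non-overlapping windows;
    the stride for cases (minority class) shrinks until they yield at least as
    many windows, so balance comes from overlap rather than synthetic samples.
    """
    control_w = [w for s in controls for w in segment(s, n, n)]
    case_w: list[list[float]] = []
    for stride in range(n, 0, -1):  # smaller stride -> more overlapping windows
        case_w = [w for s in cases for w in segment(s, n, stride)]
        if len(case_w) >= len(control_w):
            break
    k = min(len(case_w), len(control_w))  # trim any surplus so classes match
    return case_w[:k] + control_w[:k], [1] * k + [0] * k
```

With n=200 (the window size reported above), each window would then be passed to the 1DCNN. The final trim keeps the counts equal, but it can still drop a few windows from whichever class overshoots.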
Predicted disease compositions of human gliomas estimated from multiparametric MRI can predict endothelial proliferation, tumor grade, and overall survival
Diller, Emily E, Cao, Sha, Ey, Beth, Lober, Robert, Parker, Jason G
Background and Purpose: Biopsy is the main determinant of glioma clinical management, but it requires invasive sampling that can fail to detect relevant features because of tumor heterogeneity. The purpose of this study was to evaluate the accuracy of a voxel-wise, multiparametric MRI radiomic method for predicting these features and to develop a minimally invasive method to objectively assess neoplasms. Methods: Multiparametric MRI data were registered to T1-weighted gadolinium contrast-enhanced data using a 12-degree-of-freedom affine model. The retrospectively collected MRI data included T1-weighted, T1-weighted gadolinium contrast-enhanced, T2-weighted, fluid-attenuated inversion recovery, and multi-b-value diffusion-weighted images acquired at 1.5T or 3.0T. Clinical experts provided voxel-wise annotations for five disease states on a subset of patients to establish a training feature vector of 611,930 observations. A k-nearest-neighbor (k-NN) classifier was then trained using a 25% hold-out design. The trained k-NN model was applied to 13,018,171 observations from seventeen histologically confirmed glioma patients. Linear regression tested the relationship of overall survival (OS) to predicted disease compositions (PDC) and diagnostic age (alpha = 0.05). Canonical discriminant analysis tested whether PDC and diagnostic age could differentiate clinical, genetic, and microscopic factors (alpha = 0.05). Results: The model predicted voxel annotation class with a Dice similarity coefficient of 94.34% +/- 2.98. Linear combinations of PDC and diagnostic age predicted OS (p = 0.008), grade (p = 0.014), and endothelial proliferation (p = 0.003), but fell short of predicting gene mutations for TP53BP1 and IDH1. Conclusions: This voxel-wise, multiparametric MRI radiomic strategy holds potential as a non-invasive decision-making aid for clinicians managing patients with glioma.
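The abstract names two concrete pieces: a k-NN classifier evaluated with a 25% hold-out, and the Dice similarity coefficient used to score predicted voxel classes. The Python sketch below shows both on plain feature vectors and label lists. It assumes Euclidean distance and a simple majority vote, which the abstract does not specify. It leaves out registration and MRI feature extraction entirely, and the function names are illustrative.

```python
from __future__ import annotations

import math
import random
from collections import Counter


def holdout_split(n: int, test_frac: float = 0.25,
                  seed: int = 0) -> tuple[list[int], list[int]]:
    """Shuffle indices 0..n-1 and hold out `test_frac` of them for testing."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(n * test_frac)
    return idx[cut:], idx[:cut]  # (train indices, test indices)


def knn_predict(train_x, train_y, x, k: int = 5):
    """Majority label among the k training vectors closest to x (Euclidean)."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def dice(y_true, y_pred, label) -> float:
    """Dice similarity 2|A & B| / (|A| + |B|) for the voxels carrying `label`.

    Returns 1.0 by convention when neither list contains the label.
    """
    a = sum(t == label for t in y_true)
    b = sum(p == label for p in y_pred)
    both = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    return 2 * both / (a + b) if a + b else 1.0
```

A per-class Dice like this can be averaged over the five annotated disease states. The abstract does not say how its single Dice figure was aggregated.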
Biased algorithms: here's a more radical approach to creating fairness
Our lives are increasingly affected by algorithms. People may be denied loans, jobs, insurance policies, or even parole on the basis of risk scores that they produce. Yet algorithms are notoriously prone to biases. For example, algorithms used to assess the risk of criminal recidivism often have higher error rates in minority ethnic groups. As ProPublica found, the COMPAS algorithm, widely used to predict re-offending in the US criminal justice system, had a higher false positive rate in black than in white people; black people were more likely to be wrongly predicted to re-offend.
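The disparity ProPublica reported is a difference in group-wise false positive rates. For each group, this is the share of people who did not re-offend but were wrongly predicted to re-offend. A minimal Python sketch of that calculation follows. The data and group names in the test are hypothetical, not COMPAS figures.

```python
from __future__ import annotations


def false_positive_rate(y_true: list[int], y_pred: list[int]) -> float:
    """Share of true negatives (did not re-offend) predicted positive (re-offend)."""
    preds_on_negatives = [p for t, p in zip(y_true, y_pred) if t == 0]
    if not preds_on_negatives:
        return float("nan")  # undefined when a group has no negatives
    return sum(preds_on_negatives) / len(preds_on_negatives)


def fpr_by_group(y_true: list[int], y_pred: list[int],
                 groups: list[str]) -> dict[str, float]:
    """False positive rate computed separately within each group."""
    rates = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        rates[g] = false_positive_rate([y_true[i] for i in idx],
                                       [y_pred[i] for i in idx])
    return rates
```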
Top 10 Machine Learning Interview Questions 2019 - DZone AI
Emerging technologies have taken the world by storm. The innovations, opportunities, and threats they have unleashed are like no other. As these technologies have grown, so has the demand for specialists in them. A career in emerging technologies such as machine learning, AI, or data science can be highly lucrative as well as intellectually stimulating. In this article, I have compiled some of the most frequently asked machine learning interview questions with their corresponding answers.
Local Trend Inconsistency: A Prediction-driven Approach to Unsupervised Anomaly Detection in Multi-seasonal Time Series
Wu, Wentai, He, Ligang, Lin, Weiwei
Online detection of anomalies in time series is a key technique in various event-sensitive scenarios such as robotic system monitoring, smart sensor networks, and data center security. However, the increasing diversity of data sources and demands makes this task more challenging than ever. First, the rapid increase of unlabeled data makes supervised learning unsuitable in many cases. Second, a large portion of time series have complex seasonality features. Third, online anomaly detection needs to be fast and reliable. In view of this, in this paper we adopt an unsupervised, prediction-driven approach based on a backbone model that combines a series decomposition part and an inference part. We then propose a novel metric, Local Trend Inconsistency (LTI), along with a detection algorithm that efficiently computes LTI chronologically along the series and marks each data point with a score indicating its probability of being anomalous. The results show that our scheme outperforms several representative anomaly detection algorithms in the Area Under Curve (AUC) metric with decent time efficiency.
While time series data was ubiquitous before the big data era, many recently emerging technical scenarios such as autonomous driving, edge computing, and the Internet of Things (IoT) pose new challenges to detecting anomalies in this type of data. Meanwhile, detection techniques that can provide early, reliable reports of anomalies have become crucial for a wide range of systems requiring 24/7 monitoring services. In cloud data centers, for example, a distributed monitoring system usually collects a variety of log data, from the virtual machine level to the cluster level, on a regular basis and sends them to a central detection module. This module needs to analyze the aggregated time series to detect anomalous events, including hardware breakdowns, unavailable services, and cyber attacks. This requires an online detector capable of making reliable detections (i.e., with strong sensitivity and specificity); otherwise, it could bring about unnecessary maintenance costs.
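The LTI metric itself is not defined in this abstract, so it is not reproduced here. The Python sketch below illustrates only the general prediction-driven idea. It predicts each point from the value one season earlier, using a seasonal-naive predictor swapped in for the paper's decomposition and inference backbone. It then scores each point chronologically by how unusual its prediction error is compared with past errors. The function name and the scoring rule are assumptions.

```python
from __future__ import annotations


def seasonal_residual_scores(series: list[float], period: int,
                             eps: float = 1e-9) -> list[float]:
    """Chronological anomaly scores from a seasonal-naive predictor.

    Each point is predicted by the value one period earlier; its score is the
    absolute prediction error divided by the mean absolute error seen so far.
    Only past data is used, so the loop can run online as points arrive.
    The first `period` points have no prediction and get a score of 0.
    """
    scores = [0.0] * len(series)
    total_abs_err = 0.0
    seen = 0
    for t in range(period, len(series)):
        err = abs(series[t] - series[t - period])
        mean_err = total_abs_err / seen if seen else 0.0
        scores[t] = err / (mean_err + eps)
        total_abs_err += err
        seen += 1
    return scores
```

This simple predictor has a weakness: a spike also inflates the score one period later, when the spike itself becomes the prediction. That is one reason to prefer a stronger predictor than this baseline.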
Measuring the Algorithmic Convergence of Randomized Ensembles: The Regression Setting
Lopes, Miles E., Wu, Suofei, Lee, Thomas C. M.
When randomized ensemble methods such as bagging and random forests are implemented, a basic question arises: Is the ensemble large enough? In particular, the practitioner desires a rigorous guarantee that a given ensemble will perform nearly as well as an ideal infinite ensemble (trained on the same data). The purpose of the current paper is to develop a bootstrap method for solving this problem in the context of regression, which complements our companion paper in the context of classification (Lopes 2019). In contrast to the classification setting, the current paper shows that theoretical guarantees for the proposed bootstrap can be established under much weaker assumptions. In addition, we illustrate the flexibility of the method by showing how it can be adapted to measure algorithmic convergence for variable selection. Lastly, we provide numerical results demonstrating that the method works well in a range of situations.
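The following Python sketch illustrates the bootstrap idea in simplified form; it is not the authors' exact procedure. It starts from the predictions of t already trained trees on a test set. It then resamples t trees with replacement, recomputes the ensemble's test MSE, and returns a high quantile of the absolute change from the original ensemble's MSE. A small value suggests that adding more trees would change the error little. The function names and the choice of MSE and quantile level are assumptions.

```python
from __future__ import annotations

import random


def mse(pred: list[float], y: list[float]) -> float:
    """Mean squared error on the test points."""
    return sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)


def ensemble_pred(tree_preds: list[list[float]]) -> list[float]:
    """Average the trees' predictions (rows: trees, columns: test points)."""
    t = len(tree_preds)
    return [sum(col) / t for col in zip(*tree_preds)]


def bootstrap_convergence(tree_preds: list[list[float]], y: list[float],
                          n_boot: int = 200, alpha: float = 0.05,
                          seed: int = 0) -> float:
    """(1 - alpha) quantile of |MSE(resampled ensemble) - MSE(original ensemble)|.

    Each bootstrap draw picks t trees with replacement from the t trained
    trees, mimicking a fresh t-tree ensemble without retraining anything.
    """
    rng = random.Random(seed)
    t = len(tree_preds)
    base = mse(ensemble_pred(tree_preds), y)
    devs = []
    for _ in range(n_boot):
        resampled = [tree_preds[rng.randrange(t)] for _ in range(t)]
        devs.append(abs(mse(ensemble_pred(resampled), y) - base))
    devs.sort()
    return devs[min(n_boot - 1, int((1 - alpha) * n_boot))]
```

Because only stored predictions are resampled, the check costs far less than training a larger ensemble to compare against.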
Toward Understanding Catastrophic Forgetting in Continual Learning
Nguyen, Cuong V., Achille, Alessandro, Lam, Michael, Hassner, Tal, Mahadevan, Vijay, Soatto, Stefano
We study the relationship between catastrophic forgetting and properties of task sequences. In particular, given a sequence of tasks, we would like to understand which properties of the sequence influence the error rates of continual learning algorithms trained on it. To this end, we propose a new procedure that uses recent developments in task space modeling, together with correlation analysis, to specify and analyze the properties of interest. As an application, we use this procedure to study two properties of a task sequence: (1) total complexity and (2) sequential heterogeneity. We show that, for some state-of-the-art algorithms, error rates are strongly and positively correlated with a task sequence's total complexity. Surprisingly, we also show that error rates have no correlation, and in some cases even a negative correlation, with sequential heterogeneity. Our findings suggest directions for improving continual learning benchmarks and methods.
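The correlation analysis comes down to comparing a sequence property with an algorithm's error rate across task sequences. A minimal Pearson correlation in Python is shown below. It does not cover how total complexity or sequential heterogeneity is measured through task space modeling. The abstract also does not say which correlation coefficient is used, so Pearson is an assumption here.

```python
from __future__ import annotations

import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)
```

Here `xs` would hold one property value per task sequence and `ys` the matching error rates. A value near +1 corresponds to the strong positive correlation reported for total complexity.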
Mixed-Integer Optimization Approach to Learning Association Rules for Unplanned ICU Transfer
Chou, Chun-An, Cao, Qingtao, Weng, Shao-Jen, Tsai, Che-Hung
After admission to the emergency department (ED), some patients with critical illnesses are transferred to the intensive care unit (ICU) because of unexpected clinical deterioration. Physicians urgently need to identify such unplanned ICU transfers to achieve two goals: improving critical care quality and preventing mortality. A priority task is to understand the crucial rationale behind the diagnosis results of individual patients during their stay in the ED, which helps prepare for an early transfer to the ICU. Most existing prediction studies were based on univariate analysis or multiple logistic regression and provide one-size-fits-all results. However, patient conditions vary from case to case and may not be accurately assessed by a single judgment. In this study, we present a new decision tool that uses a mathematical optimization approach to automatically discover rules associating diagnostic features with a high-risk outcome (i.e., unplanned transfer) in different deterioration scenarios. We consider four mutually exclusive patient subgroups at a suburban teaching hospital, based on the principal reasons for ED visits: infections, cardiovascular/respiratory diseases, gastrointestinal diseases, and neurological/other diseases. The analysis identifies significant rules associated with unplanned transfer for each subgroup and shows prediction accuracy comparable to state-of-the-art machine learning methods while providing easy-to-interpret symptom-outcome information.
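The abstract does not give enough detail to reproduce the mixed-integer program that searches for rules. The rules it returns can, however, be scored with the standard support and confidence measures. The Python sketch below does only that scoring for a single rule. The record layout and the feature names in the test are hypothetical.

```python
from __future__ import annotations


def rule_metrics(records: list[tuple[set[str], bool]],
                 antecedent: set[str]) -> tuple[float, float]:
    """Support and confidence of the rule `antecedent -> unplanned transfer`.

    records: one (features present at ED, was_transferred) pair per patient.
    support    = share of all patients having every antecedent feature and the outcome
    confidence = share of patients matching the antecedent who had the outcome
    """
    matching = [outcome for features, outcome in records if antecedent <= features]
    hits = sum(matching)
    support = hits / len(records) if records else 0.0
    confidence = hits / len(matching) if matching else 0.0
    return support, confidence
```

Scoring each subgroup's records separately would mirror the paper's four-subgroup design. The optimization that selects which antecedents to consider is a separate step.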
MarmoNet: a pipeline for automated projection mapping of the common marmoset brain from whole-brain serial two-photon tomography
Skibbe, Henrik, Watakabe, Akiya, Nakae, Ken, Gutierrez, Carlos Enrique, Tsukada, Hiromichi, Hata, Junichi, Kawase, Takashi, Gong, Rui, Woodward, Alexander, Doya, Kenji, Okano, Hideyuki, Yamamori, Tetsuo, Ishii, Shin
Understanding connectivity in the brain is an important prerequisite for understanding how the brain processes information. In the Brain/MINDS project, a connectivity study on marmoset brains uses two-photon microscopy fluorescence images of axonal projections to collect neuron connectivity from defined brain regions at the mesoscopic scale. Processing the images requires the detection and segmentation of the axonal tracer signal. The objective is to detect as much tracer signal as possible without misclassifying other background structures as signal. This is challenging because imaging noise, cluttered image backgrounds, distortions, and varying image contrast all cause problems. We are developing MarmoNet, a pipeline that processes and analyzes tracer image data of the common marmoset brain. The pipeline incorporates state-of-the-art machine learning techniques based on convolutional neural networks (CNNs) and image registration techniques to extract and map all relevant information robustly. The pipeline processes new images in a fully automated way. This report introduces the current state of the tracer signal analysis part of the pipeline.
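The stated objective is to detect as much tracer signal as possible without labelling background as signal. This is a trade-off between recall and precision. The Python sketch below thresholds a per-pixel probability map and measures both against a ground-truth mask. A CNN output is one example of such a map, but the map format is an assumption. The function name is illustrative, not part of MarmoNet.

```python
from __future__ import annotations


def mask_precision_recall(prob_map: list[list[float]], truth: list[list[int]],
                          threshold: float = 0.5) -> tuple[float, float]:
    """Pixel-wise precision and recall of `prob_map >= threshold` against `truth`.

    precision: share of pixels called signal that really are tracer signal
    recall:    share of true tracer pixels that were detected
    Both are reported as 0.0 when their denominator is empty.
    """
    tp = fp = fn = 0
    for prob_row, truth_row in zip(prob_map, truth):
        for p, t in zip(prob_row, truth_row):
            predicted = p >= threshold
            if predicted and t:
                tp += 1
            elif predicted:
                fp += 1
            elif t:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Lowering the threshold generally raises recall at the cost of precision, which is the tension the paragraph describes.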