Accuracy
VizML: A Machine Learning Approach to Visualization Recommendation
Hu, Kevin Z., Bakker, Michiel A., Li, Stephen, Kraska, Tim, Hidalgo, César A.
Data visualization should be accessible for all analysts with data, not just the few with technical expertise. Visualization recommender systems aim to lower the barrier to exploring basic visualizations by automatically generating results for analysts to search and select, rather than manually specify. Here, we demonstrate a novel machine learning-based approach to visualization recommendation that learns visualization design choices from a large corpus of datasets and associated visualizations. First, we identify five key design choices made by analysts while creating visualizations, such as selecting a visualization type and choosing to encode a column along the X- or Y-axis. We train models to predict these design choices using one million dataset-visualization pairs collected from a popular online visualization platform. Neural networks predict these design choices with high accuracy compared to baseline models. We report and interpret feature importances from one of these baseline models. To evaluate the generalizability and uncertainty of our approach, we benchmark with a crowdsourced test set, and show that the performance of our model is comparable to human performance when predicting consensus visualization type, and exceeds that of other ML-based systems.
Predicting Acute Kidney Injury at Hospital Re-entry Using High-dimensional Electronic Health Record Data
Weisenthal, Samuel J., Quill, Caroline, Farooq, Samir, Kautz, Henry, Zand, Martin S.
Acute Kidney Injury (AKI), a sudden decline in kidney function, is associated with increased mortality, morbidity, length of stay, and hospital cost. Since AKI is sometimes preventable, there is great interest in prediction. Most existing studies consider all patients and therefore restrict to features available in the first hours of hospitalization. Here, the focus is instead on rehospitalized patients, a cohort in which rich longitudinal features from prior hospitalizations can be analyzed. Our objective is to provide a risk score directly at hospital re-entry. Gradient boosting, penalized logistic regression (with and without stability selection), and a recurrent neural network are trained on two years of adult inpatient EHR data (3,387 attributes for 34,505 patients who generated 90,013 training samples with 5,618 cases and 84,395 controls). Predictions are internally evaluated with 50 iterations of 5-fold grouped cross-validation with special emphasis on calibration, an analysis of which is performed at the patient as well as hospitalization level. Error is assessed with respect to diagnosis, race, age, gender, AKI identification method, and hospital utilization. In an additional experiment, the regularization penalty is severely increased to induce parsimony and interpretability. Predictors identified for rehospitalized patients are also reported with a special analysis of medications that might be modifiable risk factors. Insights from this study might be used to construct a predictive tool for AKI in rehospitalized patients. An accurate estimate of AKI risk at hospital entry might serve as a prior for an admitting provider or another predictive algorithm.
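The grouped cross-validation mentioned in the abstract keeps every hospitalization of a given patient on the same side of each split, so a model is never tested on a patient it was trained on. A minimal sketch of that idea follows, using only the Python standard library. The function name `grouped_k_fold`, the greedy fold-balancing rule, and the toy patient IDs are illustrative assumptions, not the authors' implementation. Repeating the procedure with different seeds would approximate the repeated evaluation the abstract describes.

```python
import random
from collections import defaultdict


def grouped_k_fold(groups, n_splits=5, seed=0):
    """Yield (train_idx, test_idx) pairs in which no group spans both sides."""
    # Collect the sample indices belonging to each group (here: one patient).
    by_group = defaultdict(list)
    for idx, group in enumerate(groups):
        by_group[group].append(idx)
    if len(by_group) < n_splits:
        raise ValueError("need at least as many groups as folds")

    # Shuffle the groups, then place the largest ones first (stable sort),
    # always into the fold that currently holds the fewest samples.
    keys = list(by_group)
    random.Random(seed).shuffle(keys)
    keys.sort(key=lambda g: -len(by_group[g]))
    folds = [[] for _ in range(n_splits)]
    for group in keys:
        smallest = min(range(n_splits), key=lambda i: len(folds[i]))
        folds[smallest].extend(by_group[group])

    # Each fold serves once as the test set; the rest form the training set.
    for k in range(n_splits):
        test = sorted(folds[k])
        train = sorted(i for j in range(n_splits) if j != k for i in folds[j])
        yield train, test


# Hypothetical toy data: one patient ID per hospitalization (sample).
patient_ids = ["p1", "p1", "p2", "p3", "p3", "p3", "p4", "p5", "p6", "p6"]
splits = list(grouped_k_fold(patient_ids, n_splits=5, seed=0))
```

Splitting by patient rather than by hospitalization matters here because the abstract reports 90,013 samples from 34,505 patients, so many patients contribute more than one sample.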
Optimal flow analysis, prediction and application
This thesis employs statistical learning techniques to analyze, predict and solve the fixed charge network flow (FCNF) problem, which is commonly encountered in many real-world network problems. The cost structure for flows in the FCNF involves both fixed and variable costs. The FCNF problem is modeled as a mixed binary linear program and can be solved with standard commercial solvers, which use the branch-and-bound algorithm. The problem is important because of its wide range of applications and the difficulty of solving it. No efficient algorithm exists to solve this problem optimally, owing to the lack of tight bounds. To the best of our knowledge, this is the first work that employs statistical learning techniques to analyze the optimal flow of the FCNF problem. Most algorithms developed to solve the FCNF problem are based on the cost structure, relaxation, etc. We start from the network characteristics and explore the relationship between the properties of nodes, arcs and networks and the optimal flow. This is a bidirectional approach, and the findings can be used to locate the features that affect the optimal flow most significantly, predict the optimal arcs and provide information to solve the FCNF problem. In particular, we define 33 features based on the network characteristics, from which, using stepwise regression, we identify 26 statistically significant predictors for logistic regression to predict which arcs will have positive flow in the optimal solutions. The predictive model achieves 88% accuracy, and the area under the receiver operating characteristic curve is 0.95. Two applications are investigated. Firstly, the predictive results can be used directly as a component critical index. The failure of arcs with a higher critical index results in a larger cost increase over the entire network.
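The two figures quoted in the abstract, accuracy and area under the ROC curve, measure different things: accuracy depends on a decision threshold, while AUC summarizes how well the scores rank positive-flow arcs above zero-flow arcs. The sketch below computes both from scratch. The function names, the 0.5 threshold, and the toy labels and scores are illustrative assumptions and do not come from the thesis.

```python
def accuracy(y_true, y_score, threshold=0.5):
    """Fraction of samples whose thresholded score matches the label."""
    predictions = [1 if s >= threshold else 0 for s in y_score]
    correct = sum(p == t for p, t in zip(predictions, y_true))
    return correct / len(y_true)


def roc_auc(y_true, y_score):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 means the arc carries positive flow in the
# optimal solution; scores stand in for predicted probabilities.
has_flow = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]

acc = accuracy(has_flow, scores)
auc = roc_auc(has_flow, scores)
```

The pairwise loop is quadratic in the number of samples, which is fine for a sketch; a rank-based formulation would be the usual choice for large arc sets.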
America's top maker of cop body cameras says facial-recog AI isn't safe
Analysis America's largest manufacturer of body cameras – and the biggest supplier to police forces across the United States – says today's facial recognition technology is not safe for making serious decisions. Speaking during its second-quarter earnings call with investors this week, the CEO of Axon, Rick Smith, answered a question about whether the company would be adding facial-recognition systems to its suite of products and, if so, whether that would come with an additional cost. Smith responded in clear terms that current facial recognition is simply not accurate enough to "make operational decisions," ie: for police to use it to recognize individuals and use positive responses as justification for automatically and unquestioningly apprehending people. Well, the computer says you're wanted, so here come the cuffs, we can imagine a conversation with officers going. "We don't have a timeline to launch facial recognition," Smith said on the conference call (listen in at around the 40-minute mark), noting that Axon doesn't have a team "actively developing it" either.
Using Randomness to Improve Robustness of Machine-Learning Models Against Evasion Attacks
Machine learning models have been widely used in security applications such as intrusion detection, spam filtering, and virus or malware detection. However, it is well-known that adversaries are always trying to adapt their attacks to evade detection. For example, an email spammer may guess what features spam detection models use and modify or remove those features to avoid detection. There has been some work on making machine learning models more robust to such attacks. However, one simple but promising approach called randomization is underexplored. This paper proposes a novel randomization-based approach to improve the robustness of machine learning models against evasion attacks. The proposed approach incorporates randomization into both model training time and model application time (meaning when the model is used to detect attacks). We also apply this approach to random forest, an existing ML method which already has some degree of randomness. Experiments on intrusion detection and spam filtering data show that our approach further improves the robustness of the random forest method. We also discuss how this approach can be applied to other ML models.
Image Registration and Predictive Modeling: Learning the Metric on the Space of Diffeomorphisms
Mussabayeva, Ayagoz, Kroshnin, Alexey, Kurmukov, Anvar, Dodonova, Yulia, Shen, Li, Cong, Shan, Wang, Lei, Gutman, Boris A.
We present a method for metric optimization in the Large Deformation Diffeomorphic Metric Mapping (LDDMM) framework, by treating the induced Riemannian metric on the space of diffeomorphisms as a kernel in a machine learning context. For simplicity, we choose the kernel Fisher Linear Discriminant Analysis (KLDA) as the framework. Optimizing the kernel parameters in an Expectation-Maximization framework, we define model fidelity via the hinge loss of the decision function. The resulting algorithm optimizes the parameters of the LDDMM norm-inducing differential operator as a solution to a group-wise registration and classification problem. In practice, this may lead to a biology-aware registration, focusing its attention on the predictive task at hand such as identifying the effects of disease. We first tested our algorithm on a synthetic dataset, showing that our parameter selection improves registration quality and classification accuracy. We then tested the algorithm on 3D subcortical shapes from the Schizophrenia cohort Schizconnect. Our Schizophrenia-Control predictive model showed significant improvement in ROC AUC compared to baseline parameters.
OBOE: Collaborative Filtering for AutoML Initialization
Yang, Chengrun, Akimoto, Yuji, Kim, Dae Won, Udell, Madeleine
Algorithm selection and hyperparameter tuning remain two of the most challenging tasks in machine learning. The number of machine learning applications is growing much faster than the number of machine learning experts, hence we see an increasing demand for efficient automation of learning processes. Here, we introduce OBOE, an algorithm for time-constrained model selection and hyperparameter tuning. Taking advantage of similarity between datasets, OBOE finds promising algorithm and hyperparameter configurations through collaborative filtering. Our system explores these models under time constraints, so that rapid initializations can be provided to warm-start more fine-grained optimization methods. One novel aspect of our approach is a new heuristic for active learning in time-constrained matrix completion based on optimal experiment design. Our experiments demonstrate that OBOE delivers state-of-the-art performance faster than competing approaches on a test bed of supervised learning problems.
Starting Movement Detection of Cyclists Using Smart Devices
Bieshaar, Maarten, Depping, Malte, Schneegans, Jan, Sick, Bernhard
In the near future, vulnerable road users (VRUs) such as cyclists and pedestrians will be equipped with smart devices and wearables that are capable of communicating with intelligent vehicles and other traffic participants. Road users are then able to cooperate on different levels, such as in cooperative intention detection for advanced VRU protection. Smart devices can be used to detect intentions, e.g., an occluded cyclist intending to cross the road, to warn vehicles of VRUs, and to prevent potential collisions. This article presents a human activity recognition approach to detect the starting movement of cyclists wearing smart devices. We propose a novel two-stage feature selection procedure using a score specialized for robust starting detection, reducing false positive detections and leading to understandable and interpretable features. The detection is modelled as a classification problem and realized by means of a machine learning classifier. We introduce an auxiliary class that models starting movements and makes it possible to integrate early movement indicators, i.e., body part movements indicating future behaviour. In this way we improve the robustness and reduce the detection time of the classifier. Our empirical studies with real-world data, originating from experiments that involve 49 test subjects and consist of 84 starting motions, show that we are able to detect the starting movements early. Investigations concerning the device wearing location show that for devices worn in the trouser pocket the detector has fewer false detections and detects starting movements faster on average. We found that we can further improve the results when we train distinct classifiers for different wearing locations. In our work, we envision future mixed traffic scenarios where automated cars, trucks, sensor-equipped infrastructure, and other road users equipped with smart devices or other wearables are interconnected by means of ad hoc networks.
This allows the traffic participants to cooperate, i.e., determine and maintain local models of the surrounding traffic situations. Vulnerable road users (VRUs) will still play an important role in future urban traffic.
Facial recognition tech to be used on Olympians and staff at Tokyo 2020
Automated facial recognition systems from Japanese biz NEC will be used on staffers and athletes at the Tokyo 2020 Olympics. The technology – which is not without its detractors in the UK – was demonstrated at a media event in the city today. It will require athletes, staff, volunteers and the press to submit their photographs before the games start. These will then be linked up to IC chips in their passes and combined with scanners on entry to allow them access to more than 40 facilities. Tsuyoshi Iwashita, head of security for the games, said the aim was to reduce pressure on entry points and shorten queueing time for this group of people.
Inferring Molecular Pathology and micro-RNA Transcriptome from mRNA Profiles of Cancer Biopsies through Deep Multi-Task Learning
Azarkhalili, Behrooz, Saberi, Ali, Chitsaz, Hamidreza, Sharifi-Zarchi, Ali
Despite great advances, molecular cancer pathology is often limited to the use of a small number of biomarkers rather than the whole transcriptome, partly due to the computational challenges. Here, we introduce a novel architecture of DNNs that is capable of simultaneous inference of various properties of biological samples, through multi-task and transfer learning. We employed this architecture on mRNA transcription profiles of 10787 clinical samples from 34 classes (one healthy and 33 different types of cancer) from 27 tissues. Our system significantly outperforms prior works and classical machine learning approaches in predicting the tissue-of-origin, normal or disease state, and cancer type of each sample. Furthermore, it can predict the miRNA transcription profile of each sample, which enables miRNA expression research when only mRNA transcriptome data are available. We also show that this system is very robust against noise and missing values. Collectively, our results highlight applications of artificial intelligence in molecular cancer pathology and oncological research.