Accuracy
InclusiveFaceNet: Improving Face Attribute Detection with Race and Gender Diversity
Ryu, Hee Jung, Adam, Hartwig, Mitchell, Margaret
We demonstrate an approach to face attribute detection that retains or improves attribute detection accuracy across gender and race subgroups by learning demographic information prior to learning the attribute detection task. The system, which we call InclusiveFaceNet, detects face attributes by transferring race and gender representations learned from a held-out dataset of public race and gender identities. Leveraging learned demographic representations while withholding demographic inference from the downstream face attribute detection task preserves potential users' demographic privacy while resulting in some of the best reported numbers to date on attribute detection in the Faces of the World and CelebA datasets.
Debunking Google's Death AI - Predictive Analytics Times - machine learning & data science news
Editor's note: Although this author absolves the researchers (from Google) and blames only the journalists for the widespread false claims of a 95% accuracy level for mortality prediction, note that the research paper itself does indeed use the word "accuracy" multiple times as a synonym of AUROC, thus "starting it" among the non-technical or less technical journalists at large. Having my newsfeed cluttered with articles about Google creating an AI that beats hospitals by predicting death with 95% accuracy (or some other erroneous claim), I dug up the original research paper to fact check this wondrous new advancement. Many of said articles used this quote from the abstract (academia's equivalent of a paperback blurb): These models outperformed traditional, clinically used predictive models in all cases. We believe that this approach can be used to create accurate and scaleable predictions for a variety of clinical scenarios. To the best of our knowledge, our models outperform existing EHR (Electronic Health Record) models in the medical literature.
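The distinction the editor's note draws between accuracy and AUROC can be illustrated with a small sketch. The labels and scores below are invented for illustration and do not come from the paper, and the helper names `accuracy` and `auroc` are hypothetical. Two sets of scores reach the same accuracy on imbalanced data but very different AUROC values, which is why the two terms should not be used interchangeably.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples whose thresholded score matches the label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def auroc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney U formulation of AUROC).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical, imbalanced data: 2 positive cases out of 10.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

# Scores that rank both positives above every negative, yet stay below 0.5.
ranked_scores = [0.05, 0.10, 0.10, 0.15, 0.20, 0.20, 0.25, 0.30, 0.35, 0.40]

# Scores that carry no ranking information at all.
constant_scores = [0.10] * 10

for name, scores in [("ranked", ranked_scores), ("constant", constant_scores)]:
    print(name, "accuracy:", accuracy(labels, scores), "AUROC:", auroc(labels, scores))
```

Both score sets predict "negative" for every case at a 0.5 threshold, so their accuracy is identical, while AUROC separates the informative ranking from the uninformative one.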
Data Infrastructure and Approaches for Ontology-Based Drug Repurposing
Boyer, Stephen, Griffin, Thomas, Swaminathan, Sarath, Clarkson, Kenneth L., Zubarev, Dmitry
IBM Almaden Research Center, 650 Harry Road, San Jose, California 95136. Abstract: We report development of a data infrastructure for drug repurposing that takes advantage of two currently available chemical ontologies. The data infrastructure includes a database of compound-target associations augmented with molecular ontological labels. It also contains two computational tools for prediction of new associations. We describe two drug-repurposing systems: one, Nascent Ontological Information Retrieval for Drug Repurposing (NOIR-DR), based on an information retrieval strategy, and another, based on nonnegative matrix factorization together with compound similarity, that was inspired by recommender systems. We report the performance of both tools on a drug-repurposing task. 1 Introduction: Drug repurposing is an efficient strategy for drug discovery, where new targets or activities are found for known drugs [1-5]. Drug repurposing requires the efficient representation of existing information about the activity of chemical compounds as drugs, and the development of algorithms that leverage such information and propose new indications.
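As a rough illustration of the recommender-style idea mentioned above, the following sketch factors a tiny compound-by-target association matrix with plain nonnegative matrix factorization. It is not the authors' system: it omits the compound-similarity component entirely, uses the standard Lee-Seung multiplicative updates as an assumed algorithm, and the matrix `associations` and the helper names are invented for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor nonnegative v (n x m) into w (n x rank) and h (rank x m).

    Uses Lee-Seung multiplicative updates for the squared-error objective.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # h <- h * (w^T v) / (w^T w h), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # w <- w * (v h^T) / (w h h^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


# Hypothetical data: rows are compounds, columns are targets, 1 = known association.
associations = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

w, h = nmf(associations, rank=2)
scores = matmul(w, h)

# Reconstructed scores for unobserved pairs can be ranked as candidate associations.
print("score for compound 1, target 1:", round(scores[1][1], 3))
print("score for compound 1, target 2:", round(scores[1][2], 3))
```

In a recommender-style setting, unobserved pairs with high reconstructed scores would be treated as candidates for follow-up, not as confirmed associations.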
LiDAR and Camera Detection Fusion in a Real Time Industrial Multi-Sensor Collision Avoidance System
Wei, Pan, Cagle, Lucas, Reza, Tasmia, Ball, John, Gafford, James
Collision avoidance is a critical task in many applications, such as ADAS (advanced driver-assistance systems), industrial automation and robotics. In an industrial automation setting, certain areas should be off limits to an automated vehicle for protection of people and high-valued assets. These areas can be quarantined by mapping (e.g., GPS) or via beacons that delineate a no-entry area. We propose a delineation method where the industrial vehicle utilizes a LiDAR (Light Detection and Ranging) and a single color camera to detect passive beacons and model-predictive control to stop the vehicle from entering a restricted space. The beacons are standard orange traffic cones with a highly reflective vertical pole attached. The LiDAR can readily detect these beacons, but suffers from false positives due to other reflective surfaces such as worker safety vests. Herein, we put forth a method for reducing false positive detection from the LiDAR by projecting the beacons in the camera imagery via a deep learning method and validating the detection using a neural network-learned projection from the camera to the LiDAR space. Experimental data collected at Mississippi State University's Center for Advanced Vehicular Systems (CAVS) shows the effectiveness of the proposed system in keeping the true detection while mitigating false positives.
Automated Vulnerability Detection in Source Code Using Deep Representation Learning
Russell, Rebecca L., Kim, Louis, Hamilton, Lei H., Lazovich, Tomo, Harer, Jacob A., Ozdemir, Onur, Ellingwood, Paul M., McConley, Marc W.
Increasing numbers of software vulnerabilities are discovered every year whether they are reported publicly or discovered internally in proprietary code. These vulnerabilities can pose serious risk of exploit and result in system compromise, information leaks, or denial of service. We leveraged the wealth of C and C++ open-source code available to develop a large-scale function-level vulnerability detection system using machine learning. To supplement existing labeled vulnerability datasets, we compiled a vast dataset of millions of open-source functions and labeled it with carefully-selected findings from three different static analyzers that indicate potential exploits. Using these datasets, we developed a fast and scalable vulnerability detection tool based on deep feature representation learning that directly interprets lexed source code. We evaluated our tool on code from both real software packages and the NIST SATE IV benchmark dataset. Our results demonstrate that deep feature representation learning on source code is a promising approach for automated software vulnerability detection.
chemmodlab: A Cheminformatics Modeling Laboratory for Fitting and Assessing Machine Learning Models
Ash, Jeremy R., Hughes-Oliver, Jacqueline M.
It is now commonplace for researchers across a variety of fields to fit machine learning models on complex data to make predictions. The complexity of these data (e.g., large number of features, nonlinear relationships with the response) often means it is difficult to determine a priori what machine learning modeling routine and what descriptors (also known as features, predictors, or covariates) will result in the best performance. A common approach to this problem is to fit many descriptor set and modeling routine (DM) combinations, and then compute measures of prediction performance for held-out data to choose a DM combination by assessing relative performance. Often in a particular domain, there are only a few modeling routines that are widely accepted, and researchers tend to use these methods exclusively. Unfortunately, this will not always work well for every data set, and researchers might learn from other fields where different modeling methods tend to be more successful. There are a myriad of modeling methods implemented in R that may be worthwhile for researchers to try (see Hastie et al. (2009) and Kuhn and Johnson (2013) for an overview of these methods). However, the lack of knowledge of the syntactic minutiae and statistical methodology that is required to fit and compare different modeling routines in R often prohibits users from attempting them.
A New Variational Model for Binary Classification in the Supervised Learning Context
Pacheco, Carlos David Brito, Loeza, Carlos Francisco Brito
We examine the supervised learning problem in its continuous setting and give a general optimality condition through techniques of functional analysis and the calculus of variations. This enables us to solve the optimality condition for the desired function u numerically and make several comparisons with other widely utilized supervised learning models. We employ the accuracy and the area under the receiver operating characteristic curve as performance metrics. Finally, three analyses are conducted based on these two metrics, in which we compare the models and draw conclusions about whether or not our method is competitive.
The iPhone's Face ID Struggles in the Morning
Future Tense is a partnership of Slate, New America, and Arizona State University that examines emerging technologies, public policy, and society. Unlike Beyoncé, we do not all wake up flawless--at least not according to the iPhone X. Several iPhone X–owning Twitter users have taken to the latter (probably using the former) to complain that Face ID--the phone's facial recognition technology--fails to recognize their face first thing in the morning. Like a drunken one-night stand, the iPhone X doesn't quite know who they are in the morning light. Face ID, Apple's follow-up to Touch ID, allows users to unlock their phone with their face--or more specifically, with a mathematical representation of their facial structure.
US taxmen want an AI to do the security checks it seemingly can't do itself
The US tax authority – the Internal Revenue Service – is looking at how AI can secure and protect taxpayers' data held on its servers. It recently filed a request for information aimed at experts that can help guide the IRS into possibly developing a platform that uses machine learning to sniff out and react to threats. The cybersecurity division working for the Cybersecurity Cloud Solution Program is hoping that the information will help them identify potential solutions based on current capabilities. "The Internal Revenue Service's (IRS) Cybersecurity Division has a business need for an Artificial Intelligent (AI) machined-based analytical platform to proactively detect and respond to cyber- and insider-related threats," it said in the request. "The IRS intends to use the results of this RFI to assist in the assessment of on-going industry efforts within the identified focus areas. The finding will also help to shape the path forward for potential acquisitions to include determination of contractual mechanisms to potentially pursue capabilities."
A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks
Lee, Kimin, Lee, Kibok, Lee, Honglak, Shin, Jinwoo
Detecting test samples drawn sufficiently far away from the training distribution statistically or adversarially is a fundamental requirement for deploying a good classifier in many real-world machine learning applications. However, deep neural networks with the softmax classifier are known to produce highly overconfident posterior distributions even for such abnormal samples. In this paper, we propose a simple yet effective method for detecting any abnormal samples, which is applicable to any pre-trained softmax neural classifier. We obtain the class-conditional Gaussian distributions with respect to (low- and upper-level) features of the deep models under Gaussian discriminant analysis, which result in a confidence score based on the Mahalanobis distance. While most prior methods have been evaluated for detecting either out-of-distribution or adversarial samples, but not both, the proposed method achieves state-of-the-art performance in both cases in our experiments. Moreover, we found that our proposed method is more robust in extreme cases, e.g., when the training dataset has noisy labels or a small number of samples. Finally, we show that the proposed method enjoys broader usage by applying it to class-incremental learning: whenever out-of-distribution samples are detected, our classification rule can incorporate new classes well without further training deep models.
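A Mahalanobis-distance confidence score of the kind described above might be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes two-dimensional feature vectors (so the covariance can be inverted in closed form), a single covariance matrix pooled across classes, and a score equal to the negative squared distance to the closest class mean. The data and all function names are hypothetical.

```python
def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]


def shared_covariance(features_by_class, means):
    """Pooled 2x2 covariance of features around their own class mean."""
    total = sum(len(feats) for feats in features_by_class.values())
    cov = [[0.0, 0.0], [0.0, 0.0]]
    for label, feats in features_by_class.items():
        mu = means[label]
        for f in feats:
            d = [f[0] - mu[0], f[1] - mu[1]]
            for i in range(2):
                for j in range(2):
                    cov[i][j] += d[i] * d[j] / total
    return cov


def invert_2x2(m):
    """Closed-form inverse of a 2x2 matrix (assumes it is non-singular)."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def confidence(x, means, precision):
    """Negative squared Mahalanobis distance to the closest class mean."""
    best = None
    for mu in means.values():
        d = [x[0] - mu[0], x[1] - mu[1]]
        dist = sum(d[i] * precision[i][j] * d[j] for i in range(2) for j in range(2))
        best = -dist if best is None else max(best, -dist)
    return best


# Hypothetical 2-D features, standing in for activations of a trained network.
features_by_class = {
    "a": [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    "b": [[4.0, 4.0], [5.0, 4.0], [4.0, 5.0], [5.0, 5.0]],
}

means = {label: mean(feats) for label, feats in features_by_class.items()}
precision = invert_2x2(shared_covariance(features_by_class, means))

in_distribution = confidence([1.0, 1.0], means, precision)
far_away = confidence([10.0, -3.0], means, precision)
print("in-distribution score:", in_distribution)
print("far-away score:", far_away)
```

A sample would be flagged as abnormal when its score falls below a threshold chosen on validation data; the threshold itself is left out of this sketch.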