Radiologists assisted by artificial intelligence can detect more breast cancers in mammography images, with a lower rate of false positives. A new study, published late last week in the online journal Lancet Digital Health, contends that AI can boost the diagnostic accuracy of radiologists compared with the results they achieve by examining mammography images alone. The study was conducted by Korean academic hospitals and Lunit, a Seoul-based medical AI company working in radiology and oncology. It draws on large-scale data of more than 170,000 mammogram examinations from five healthcare organizations in South Korea, the U.S. and the U.K. The data set includes more than 36,000 cases found positive for cancer and verified by biopsy. That data trained the AI models, and the models' sensitivity was compared with how radiologists perform without any technological assistance.
Fairness is a highly subjective concept, and machine learning is no different. We typically feel that the referees are "unfair" to our favorite team when it loses a close match, and that any outcome is perfectly "fair" when it goes our way. Given that machine learning models cannot rely on subjectivity, we need an efficient way to quantify fairness. A lot of research has been done in this area, mostly framing fairness as an outcome optimization problem. Recently, Google AI open sourced the TensorFlow Constrained Optimization library (TFCO), an optimization framework that can be used to optimize multiple objectives of a machine learning model, including fairness.
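Before fairness can be optimized, it has to be measured. Here is a minimal sketch of one common quantification, the gap in false positive rates between two groups (all data below is synthetic, and this is not the TFCO API):

```python
# Quantifying fairness as a measurable rate: compare false positive
# rates across two groups (a demographic-parity-style check).
# All data below is synthetic and purely illustrative.

def false_positive_rate(labels, preds):
    """FPR = FP / (FP + TN), computed over the true-negative examples."""
    negatives = [(l, p) for l, p in zip(labels, preds) if l == 0]
    if not negatives:
        return 0.0
    fp = sum(1 for _, p in negatives if p == 1)
    return fp / len(negatives)

def fpr_gap(labels, preds, groups):
    """Absolute FPR difference between group 0 and group 1."""
    g0 = [(l, p) for l, p, g in zip(labels, preds, groups) if g == 0]
    g1 = [(l, p) for l, p, g in zip(labels, preds, groups) if g == 1]
    fpr0 = false_positive_rate(*zip(*g0))
    fpr1 = false_positive_rate(*zip(*g1))
    return abs(fpr0 - fpr1)

labels = [0, 0, 1, 1, 0, 0, 1, 0]
preds  = [1, 0, 1, 1, 0, 1, 1, 1]
groups = [0, 0, 0, 0, 1, 1, 1, 1]
print(fpr_gap(labels, preds, groups))
```

A library like TFCO then treats such per-group rates as constraints during training rather than as after-the-fact diagnostics.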
SAS Scripting Wrapper for Analytics Transfer (SWAT), a powerful Python interface, enables you to integrate your Python code with SAS Cloud Analytic Services (CAS). Using SWAT, you can execute CAS analytic actions, including feature engineering, machine learning modeling, and model testing, and then analyze the results locally. This article demonstrates how you can predict the survival rates of Titanic passengers with a combination of both Python and CAS using SWAT. You can then see how well the models performed with some visual statistics. After you install and configure these resources, start a Jupyter Notebook session to get started!
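The CAS connection itself requires a running SAS server (via `swat.CAS`), so a full session cannot be reproduced here. The sketch below illustrates only the local-analysis half of the workflow, using a toy in-memory stand-in for the kind of Titanic table a CAS action might return (all names and numbers are illustrative):

```python
# After SWAT pulls CAS results back into Python, analysis proceeds with
# ordinary local tools. Toy stand-in for a fetched Titanic table with
# columns (sex, survived); real data would come from a CAS table.
rows = [
    ("female", 1), ("female", 1), ("female", 0),
    ("male", 0), ("male", 0), ("male", 1), ("male", 0),
]

def survival_rate(rows, sex):
    """Fraction of passengers of the given sex who survived."""
    subset = [survived for s, survived in rows if s == sex]
    return sum(subset) / len(subset)

for sex in ("female", "male"):
    print(f"{sex}: {survival_rate(rows, sex):.2f}")
```

In the real workflow, the heavy lifting (feature engineering, model training, scoring) happens server-side in CAS, and only summaries like this come back to the notebook.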
Google AI today released TensorFlow Constrained Optimization (TFCO), a supervised machine learning library built for training machine learning models on multiple metrics and "optimizing inequality-constrained problems." The library is designed to help address issues like fairness constraints and predictive parity, and to help machine learning practitioners better understand things like true positive rates for residents of certain countries, or recall of illness diagnoses across age and gender groups. In tests with a Wikipedia data set, the library achieved lower false-positive rates across race, religion, gender identity, and sexuality when predicting whether a Wikipedia comment is toxic, while maintaining similar accuracy rates. TFCO is made to "take into account the societal and cultural factors necessary to satisfy real-world requirements," said Andrew Zaldivar on behalf of the TFCO team today in a Google AI blog post. "The ability to express many fairness goals as rate constraints can help drive progress in the responsible development of machine learning, but it also requires developers to carefully consider the problem they are trying to address," he said.
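Rate-constrained training builds on a classical idea: rewrite "minimize an objective subject to an inequality constraint" as a Lagrangian saddle-point problem, descending on the model parameters while ascending on a nonnegative multiplier. The toy sketch below (plain Python, not the TFCO API) illustrates that mechanism on a one-dimensional problem:

```python
# Toy inequality-constrained problem: minimize (x - 3)^2 subject to
# x <= 2. The Lagrangian is (x - 3)^2 + lam * (x - 2); we descend on
# x and ascend on lam, keeping lam nonnegative.
def solve(lr=0.01, steps=20000):
    x, lam = 0.0, 0.0
    for _ in range(steps):
        grad_x = 2 * (x - 3) + lam   # d/dx of the Lagrangian
        x -= lr * grad_x             # descend on the parameter
        lam += lr * (x - 2)          # ascend on the multiplier
        lam = max(lam, 0.0)          # multipliers stay nonnegative
    return x, lam

x_opt, lam_opt = solve()
print(x_opt)  # converges near the constrained optimum x = 2
```

The unconstrained minimum is x = 3, but the active constraint pulls the solution back to x = 2, with the multiplier settling at a positive value; replacing x with model weights and the constraint with a per-group rate gives the flavor of what TFCO automates.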
Have you ever wondered how to demonstrate that one machine learning model's test set performance differs significantly from the test set performance of an alternative model? This post will describe how to use DeLong's test to obtain a p-value for whether one model has a significantly different AUC than another model, where AUC refers to the area under the receiver operating characteristic curve. This post includes a hand-calculated example to illustrate all the steps in DeLong's test for a small data set. It also includes an example R implementation of DeLong's test to enable efficient calculation on large data sets. An example use case for DeLong's test: Model A predicts heart disease risk with an AUC of 0.92, Model B predicts heart disease risk with an AUC of 0.87, and we use DeLong's test to demonstrate that Model A has a significantly different AUC from Model B with p < 0.05.
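In the spirit of the post's R implementation, here is a Python sketch of DeLong's test (toy scores; the variable names are mine). It computes each model's AUC via the Mann-Whitney statistic, DeLong's structural components, and the variance of the correlated AUC difference:

```python
import math

def auc_and_components(pos, neg):
    """AUC via the Mann-Whitney statistic, plus DeLong's structural
    components: V10 (one value per positive) and V01 (per negative)."""
    psi = lambda x, y: 1.0 if x > y else (0.5 if x == y else 0.0)
    v10 = [sum(psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(psi(x, y) for x in pos) / len(pos) for y in neg]
    return sum(v10) / len(pos), v10, v01

def cov(u, w):
    """Sample covariance of two equal-length lists."""
    mu, mw = sum(u) / len(u), sum(w) / len(w)
    return sum((a - mu) * (b - mw) for a, b in zip(u, w)) / (len(u) - 1)

def delong_test(pos_a, neg_a, pos_b, neg_b):
    """Two-sided p-value for H0: AUC_A == AUC_B, where both models
    score the same test set (so the two AUCs are correlated)."""
    auc_a, v10_a, v01_a = auc_and_components(pos_a, neg_a)
    auc_b, v10_b, v01_b = auc_and_components(pos_b, neg_b)
    m, n = len(pos_a), len(neg_a)
    var = (cov(v10_a, v10_a) + cov(v10_b, v10_b) - 2 * cov(v10_a, v10_b)) / m \
        + (cov(v01_a, v01_a) + cov(v01_b, v01_b) - 2 * cov(v01_a, v01_b)) / n
    z = (auc_a - auc_b) / math.sqrt(var)
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return auc_a, auc_b, p

# Model A and Model B scores on the same positives/negatives (toy data).
auc_a, auc_b, p = delong_test(
    pos_a=[0.9, 0.8, 0.6, 0.55], neg_a=[0.7, 0.4, 0.3, 0.2],
    pos_b=[0.7, 0.6, 0.5, 0.4],  neg_b=[0.65, 0.6, 0.3, 0.35])
print(auc_a, auc_b, p)
```

On a tiny sample like this, even a sizable AUC gap is unlikely to reach significance; the test's power grows with the size of the test set.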
We present a new analysis for the combination of binary classifiers. We propose a theoretical framework based on the Neyman-Pearson lemma to analyze combinations of classifiers. In particular, we give a method for finding the optimal decision rule for a combination of classifiers and prove that it has the optimal ROC curve. We also show how our method generalizes and improves on previous work on combining classifiers and generating ROC curves. Papers published at the Neural Information Processing Systems Conference.
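As an illustration of the Neyman-Pearson view (a toy sketch under assumptions the paper does not necessarily make: Gaussian class-conditional score densities and conditionally independent classifiers), the optimal combined decision rule thresholds the joint likelihood ratio:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def likelihood_ratio(score, mu_pos, mu_neg, sigma=1.0):
    """p(score | positive) / p(score | negative)."""
    return gaussian_pdf(score, mu_pos, sigma) / gaussian_pdf(score, mu_neg, sigma)

def combined_decision(s1, s2, tau=1.0):
    """Declare positive when the joint likelihood ratio exceeds tau.
    Under conditional independence, the joint ratio factorizes."""
    lr = likelihood_ratio(s1, 1.0, 0.0) * likelihood_ratio(s2, 1.0, 0.0)
    return lr > tau

print(combined_decision(0.9, 0.8))  # both classifier scores lean positive
print(combined_decision(0.1, 0.2))  # both lean negative
```

Sweeping the threshold tau traces out the ROC curve of the combined rule; the Neyman-Pearson lemma is what guarantees that no other decision rule on (s1, s2) does better at any fixed false positive rate.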
In this paper we introduce a new algorithm for updating the parameters of a heuristic evaluation function, by updating the heuristic towards the values computed by an alpha-beta search. Our algorithm differs from previous approaches to learning from search, such as Samuel's checkers player and the TD-Leaf algorithm, in two key ways. First, we update all nodes in the search tree, rather than a single node. Second, we use the outcome of a deep search, instead of the outcome of a subsequent search, as the training signal for the evaluation function. We implemented our algorithm in the chess program Meep, using a linear heuristic function.
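A heavily simplified sketch of the idea (mine, not the paper's implementation: plain minimax in place of alpha-beta, a hand-built two-ply tree, and a linear evaluation) shows the key point that every node of the search tree is regressed toward its backed-up search value:

```python
# Toy tree: each node carries a feature vector; leaves are scored by
# the linear heuristic, interior nodes by minimax backup. After the
# search, every visited node's evaluation is nudged toward its
# backed-up value (the deep-search training signal).

def evaluate(weights, features):
    return sum(w * f for w, f in zip(weights, features))

def minimax(node, weights, maximizing, visited):
    """Return the backed-up value; record (features, value) per node."""
    if not node["children"]:
        value = evaluate(weights, node["features"])
    else:
        child_vals = [minimax(c, weights, not maximizing, visited)
                      for c in node["children"]]
        value = max(child_vals) if maximizing else min(child_vals)
    visited.append((node["features"], value))
    return value

def search_and_update(root, weights, lr=0.05):
    visited = []
    minimax(root, weights, True, visited)
    for features, target in visited:
        err = target - evaluate(weights, features)  # toward search value
        for i, f in enumerate(features):
            weights[i] += lr * err * f              # gradient step
    return weights

leaf = lambda f: {"features": f, "children": []}
root = {"features": [1.0, 0.0],
        "children": [leaf([0.5, 1.0]), leaf([1.5, -0.5])]}
w = search_and_update(root, [0.2, 0.1])
print(w)
```

Leaves already match their own evaluation, so only interior nodes generate a learning signal here; in a real engine the deep search reaches positions the static evaluation gets wrong, which is where the update does its work.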
We propose using correlated bigram LSA for unsupervised LM adaptation for automatic speech recognition. The model is trained using efficient variational EM and smoothed using the proposed fractional Kneser-Ney smoothing, which handles fractional counts. Our approach scales to large training corpora via bootstrapping of bigram LSA from unigram LSA. For LM adaptation, unigram and bigram LSA are integrated into the background N-gram LM via marginal adaptation and linear interpolation, respectively. Experimental results show that applying unigram and bigram LSA together yields 6%--8% relative perplexity reduction and 0.6% absolute character error rate (CER) reduction compared to applying only unigram LSA on the Mandarin RT04 test set.
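The two integration schemes mentioned above can be sketched with toy distributions (all numbers, the vocabulary, and `beta`/`lam` values are made up for illustration): marginal adaptation rescales the background conditional by the ratio of adapted to background marginals and renormalizes, while linear interpolation mixes two conditionals directly.

```python
def marginal_adaptation(p_bg_cond, p_bg_marg, p_lsa_marg, beta=0.5):
    """Rescale p_bg(w|h) by (p_lsa(w) / p_bg(w))**beta, then renormalize."""
    scaled = {w: p_bg_cond[w] * (p_lsa_marg[w] / p_bg_marg[w]) ** beta
              for w in p_bg_cond}
    z = sum(scaled.values())
    return {w: p / z for w, p in scaled.items()}

def linear_interpolation(p_a, p_b, lam=0.5):
    """Mix two conditional distributions over the same vocabulary."""
    return {w: lam * p_a[w] + (1 - lam) * p_b[w] for w in p_a}

p_bg_cond  = {"cat": 0.2, "dog": 0.5, "fish": 0.3}   # p_bg(w | h)
p_bg_marg  = {"cat": 0.3, "dog": 0.4, "fish": 0.3}   # p_bg(w)
p_lsa_marg = {"cat": 0.6, "dog": 0.2, "fish": 0.2}   # topic-adapted p_lsa(w)

p_adapted = marginal_adaptation(p_bg_cond, p_bg_marg, p_lsa_marg)
print(p_adapted)
```

Because the adapted marginal puts more mass on "cat" than the background does, marginal adaptation boosts its conditional probability at the expense of the others, which is exactly the topic-steering effect LM adaptation is after.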
In multi-instance learning, there are two kinds of prediction failure: false negatives and false positives. Current research mainly focuses on avoiding the former. We attempt to utilize the geometric distribution of instances inside positive bags to avoid both. Based on kernel principal component analysis, we define a projection constraint for each positive bag to classify its constituent instances far away from the separating hyperplane, while placing positive and negative instances on opposite sides. We apply the Constrained Concave-Convex Procedure to solve the resulting problem.
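A from-scratch sketch of the kernel PCA building block (NumPy; the bag data and `gamma` are made up, and this omits the paper's projection constraint and CCCP step): instances from a toy "positive bag" are projected onto the top principal component in RBF-kernel feature space.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Pairwise RBF kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def kernel_pca(X, n_components=1, gamma=1.0):
    """Project the rows of X onto the top kernel principal components."""
    K = rbf_kernel(X, gamma)
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one   # center in feature space
    vals, vecs = np.linalg.eigh(Kc)              # ascending eigenvalues
    idx = np.argsort(vals)[::-1][:n_components]  # take the largest
    alphas = vecs[:, idx] / np.sqrt(vals[idx])   # normalize coefficients
    return Kc @ alphas                           # projected coordinates

bag = np.array([[0.0, 0.0], [0.1, 0.2], [2.0, 2.1], [2.2, 1.9]])
proj = kernel_pca(bag)
print(proj.shape)
```

The top component separates the two instance clusters inside the bag; the paper's constraint then uses this kind of geometric structure to keep presumed-positive instances on the correct side of the hyperplane.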
This paper is devoted to thoroughly investigating how to bootstrap the ROC curve, a widely used visual tool for evaluating the accuracy of test/scoring statistics in the bipartite setup. The issue of confidence bands for the ROC curve is considered, and a resampling procedure based on a smooth version of the empirical distribution, called the "smoothed bootstrap", is introduced. Theoretical arguments and simulation results are presented to show that the smoothed bootstrap is preferable to a "naive" bootstrap for constructing accurate confidence bands. Papers published at the Neural Information Processing Systems Conference.
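A toy sketch of the smoothed-bootstrap idea (my simplification, not the paper's procedure: Gaussian jitter as the smoothing, a pointwise percentile band at a single FPR, made-up scores):

```python
import random

def tpr_at_fpr(pos, neg, fpr):
    """Empirical TPR at the threshold giving roughly the target FPR."""
    k = max(int(round(fpr * len(neg))), 1)       # negatives above threshold
    thresh = sorted(neg, reverse=True)[k - 1]
    return sum(1 for x in pos if x >= thresh) / len(pos)

def smoothed_bootstrap_band(pos, neg, fpr, n_boot=500, bandwidth=0.05, seed=0):
    """Pointwise 95% band for the TPR at one FPR: resample scores with
    replacement and add Gaussian noise (sampling from a kernel-smoothed
    version of the empirical distribution), then take percentiles."""
    rng = random.Random(seed)
    tprs = []
    for _ in range(n_boot):
        bp = [rng.choice(pos) + rng.gauss(0, bandwidth) for _ in pos]
        bn = [rng.choice(neg) + rng.gauss(0, bandwidth) for _ in neg]
        tprs.append(tpr_at_fpr(bp, bn, fpr))
    tprs.sort()
    return tprs[int(0.025 * n_boot)], tprs[int(0.975 * n_boot) - 1]

pos = [0.9, 0.8, 0.75, 0.6, 0.55, 0.9, 0.7, 0.85]
neg = [0.4, 0.3, 0.5, 0.35, 0.2, 0.45, 0.25, 0.6]
lo, hi = smoothed_bootstrap_band(pos, neg, fpr=0.25)
print(lo, hi)
```

Repeating this over a grid of FPR values yields a confidence band around the whole ROC curve; the smoothing step is what distinguishes this from a naive bootstrap, which resamples the raw empirical distribution directly.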