Accuracy
End-to-End Abnormality Detection in Medical Imaging
Wu, Dufan, Kim, Kyungsang, Dong, Bin, Li, Quanzheng
Nearly all deep-learning-based image analysis methods work on reconstructed images, which are obtained from the original acquisitions by solving inverse problems. The reconstruction algorithms are designed for human observers, but not necessarily optimized for DNNs. It is desirable to train the DNNs directly from the original data, which lie in a different domain from the images. In this work, we proposed an end-to-end DNN for abnormality detection in medical imaging. A DNN was built as the unrolled version of iterative reconstruction algorithms to map the acquisitions to images, followed by a 3D convolutional neural network (CNN) to detect abnormalities in the reconstructed images. The two networks were trained jointly in order to optimize the entire DNN for the detection task from the original acquisitions. The DNN was implemented for lung nodule detection in low-dose chest CT. The proposed end-to-end DNN demonstrated better sensitivity and accuracy for the task compared to a two-step approach, in which the reconstruction and detection DNNs were trained separately. A significant reduction in the false positive rate on suspicious lesions was observed, which is crucial given the known over-diagnosis in low-dose lung CT imaging. The images reconstructed by the proposed end-to-end network also presented enhanced details in the region of interest.
Trimmed Density Ratio Estimation
Liu, Song, Takeda, Akiko, Suzuki, Taiji, Fukumizu, Kenji
Density ratio estimation (DRE) [18, 11, 27] is an important tool in various branches of machine learning and statistics. Because it directly models the differences between two probability density functions, DRE finds applications in change detection [13, 6], two-sample testing [32] and outlier detection [1, 26]. In recent years, a sampling framework called the Generative Adversarial Network (GAN) (see e.g., [9, 19]) has used the density ratio function to compare artificial samples from a generative distribution with real samples from an unknown distribution. DRE has also been widely discussed in the statistical literature for adjusting nonparametric density estimation [5], stabilizing the estimation of heavy-tailed distributions [7] and fitting multiple distributions at once [8]. However, as a density ratio function can grow unbounded, DRE can suffer from robustness and stability issues: a few corrupted points may completely mislead the estimator (see Figure 2 in Section 6 for an example).
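To make the unboundedness concrete, here is a minimal sketch that evaluates an exact density ratio for two toy Gaussians and shows how a single corrupted point can dominate a sum of ratio values. The densities, the sample points, and the clipping threshold `cap` are all assumptions chosen for illustration. The clipping step is only a crude stand-in for trimming and is not the estimator proposed in the paper.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def density_ratio(x):
    # Assumed toy densities: p = N(0, 1) and q = N(0, 0.5^2).
    # Because q has lighter tails, p(x) / q(x) = 0.5 * exp(1.5 * x^2),
    # which grows without bound as |x| increases.
    return gaussian_pdf(x, 0.0, 1.0) / gaussian_pdf(x, 0.0, 0.5)

# Made-up sample: five well-behaved points plus one corrupted point at x = 3.
clean = [-0.4, -0.1, 0.0, 0.2, 0.5]
corrupted = clean + [3.0]

ratios = [density_ratio(x) for x in corrupted]
outlier_share = ratios[-1] / sum(ratios)  # fraction of the total due to x = 3

# Hypothetical cap, used only to show how bounding the ratio limits the damage.
cap = 2.0
clipped = [min(r, cap) for r in ratios]
clipped_share = clipped[-1] / sum(clipped)

print(f"outlier share of summed ratios: {outlier_share:.6f}")
print(f"outlier share after clipping:   {clipped_share:.6f}")
```

In this toy setting, almost the entire sum comes from the single point at x = 3, which is the kind of sensitivity the abstract describes.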
Artificial intelligence helps detect ovarian cancer early and accurately
Ovarian cancer is difficult to diagnose, particularly in its early stages, when survival rates are much higher. Because there is no consistently reliable screening test to detect ovarian cancer, most women are diagnosed with the disease when it's in an advanced stage. However, researchers at Brigham and Women's Hospital and Dana-Farber Cancer Institute have developed a non-invasive diagnostic test using artificial intelligence for the accurate detection of true cases of early-stage disease. Results of their study were published online this week in the journal eLife. By combining next-generation sequencing with artificial intelligence, researchers have created a novel blood test based on serum microRNAs--small, non-coding pieces of genetic material that help control where and when genes are activated--for the early diagnosis of ovarian cancer.
On Fairness and Calibration
Pleiss, Geoff, Raghavan, Manish, Wu, Felix, Kleinberg, Jon, Weinberger, Kilian Q.
The machine learning community has become increasingly concerned with the potential for bias and discrimination in predictive models. This has motivated a growing line of work on what it means for a classification procedure to be "fair." In this paper, we investigate the tension between minimizing error disparity across different population groups and maintaining calibrated probability estimates. We show that calibration is compatible only with a single error constraint (i.e. equal false-negative rates across groups), and that any algorithm that satisfies this relaxation is no better than randomizing a percentage of predictions for an existing classifier. These unsettling findings, which extend and generalize existing results, are empirically confirmed on several datasets.
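The randomization result can be illustrated with a small sketch. It assumes that error rates for probabilistic scores are measured as averages (the false-negative rate is the mean of 1 - score over true positives), and that "randomizing a percentage of predictions" means replacing a fraction `alpha` of one group's scores with that group's base rate. Both are assumptions about the setup, not details taken from the abstract, and the two groups below are made-up, perfectly calibrated toy data.

```python
def generalized_rates(scores, labels):
    """Return (false-negative rate, false-positive rate) for probabilistic scores."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    fnr = sum(1.0 - s for s in pos) / len(pos)  # mass withheld from true positives
    fpr = sum(neg) / len(neg)                   # mass assigned to true negatives
    return fnr, fpr

def mixed_fnr(scores, labels, alpha):
    """Expected FNR when a fraction alpha of scores is replaced by the base rate."""
    base_rate = sum(labels) / len(labels)
    fnr, _ = generalized_rates(scores, labels)
    # A classifier that always outputs the base rate has FNR = 1 - base_rate.
    return (1.0 - alpha) * fnr + alpha * (1.0 - base_rate)

# Group A: of the points scored 0.75, 3 of 4 are positive; of those scored 0.25, 1 of 4.
scores_a = [0.75] * 4 + [0.25] * 4
labels_a = [1, 1, 1, 0, 1, 0, 0, 0]
# Group B: of the points scored 0.625, 5 of 8 are positive; of those scored 0.375, 3 of 8.
scores_b = [0.625] * 8 + [0.375] * 8
labels_b = [1] * 5 + [0] * 3 + [1] * 3 + [0] * 5

fnr_a, fpr_a = generalized_rates(scores_a, labels_a)
fnr_b, fpr_b = generalized_rates(scores_b, labels_b)

# Solve for the fraction of group A predictions to randomize so the FNRs match.
base_a = sum(labels_a) / len(labels_a)
alpha = (fnr_b - fnr_a) / ((1.0 - base_a) - fnr_a)

print(f"FNR A = {fnr_a}, FNR B = {fnr_b}, alpha = {alpha}")
print(f"FNR A after mixing = {mixed_fnr(scores_a, labels_a, alpha)}")
```

In this toy case the false-negative rates are equalized only by degrading the better-performing group's classifier, which is the cost of the relaxation that the abstract points to.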
New blood test developed to diagnose ovarian cancer
Investigators from Brigham and Women's Hospital and Dana-Farber Cancer Institute are leveraging the power of artificial intelligence to develop a new technique to detect ovarian cancer early and accurately. The team has identified a network of circulating microRNAs - small, non-coding pieces of genetic material - that are associated with risk of ovarian cancer and can be detected from a blood sample. Their findings are published online in eLife. Most women are diagnosed with ovarian cancer when the disease is at an advanced stage, at which point only about a quarter of patients will survive for at least five years. But for women whose cancer is serendipitously picked up at an early stage, survival rates are much higher.
Union of Intersections (UoI) for Interpretable Data Driven Discovery and Prediction
Bouchard, Kristofer E., Bujan, Alejandro F., Roosta-Khorasani, Farbod, Ubaru, Shashanka, Prabhat, Snijders, Antoine M., Mao, Jian-Hua, Chang, Edward F., Mahoney, Michael W., Bhattacharyya, Sharmodeep
The increasing size and complexity of scientific data could dramatically enhance discovery and prediction for basic scientific applications. Realizing this potential, however, requires novel statistical analysis methods that are both interpretable and predictive. We introduce Union of Intersections (UoI), a flexible, modular, and scalable framework for enhanced model selection and estimation. Methods based on UoI perform model selection and model estimation through intersection and union operations, respectively. We show that UoI-based methods achieve low-variance and nearly unbiased estimation of a small number of interpretable features, while maintaining high-quality prediction accuracy. We perform an extensive numerical investigation to evaluate a UoI algorithm ($UoI_{Lasso}$) on synthetic and real data. In doing so, we demonstrate the extraction of interpretable functional networks from human electrophysiology recordings as well as accurate prediction of phenotypes from genotype-phenotype data with reduced features. We also show (with the $UoI_{L1Logistic}$ and $UoI_{CUR}$ variants of the basic framework) improved prediction parsimony for classification and matrix factorization on several benchmark biomedical data sets. These results suggest that methods based on the UoI framework could improve interpretation and prediction in data-driven discovery across scientific fields.
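The intersection and union operations can be sketched with plain sets. This is a simplified illustration and not the authors' implementation: the feature supports below are hand-written, hypothetical stand-ins for what a sparse regression would select on each bootstrap resample, and the "best candidate per resample" choices are likewise made up.

```python
from functools import reduce

def intersect_supports(supports):
    """Selection step: keep only features chosen in every bootstrap resample."""
    return reduce(lambda a, b: a & b, supports)

def union_supports(supports):
    """Estimation step: combine the supports retained across resamples."""
    return reduce(lambda a, b: a | b, supports, set())

# Hypothetical selected feature indices, keyed by regularization strength,
# with one set per bootstrap resample.
selection = {
    0.1: [{0, 1, 2, 5}, {0, 1, 2, 7}, {0, 1, 2, 3}],
    1.0: [{0, 1}, {0, 1, 4}, {0, 1}],
}

# Intersection removes features that appear only sporadically (5, 7, 3, 4).
candidates = {lam: intersect_supports(sets) for lam, sets in selection.items()}

# Assume each estimation resample picked one candidate support as its best
# predictor (made-up choices); the final model uses the union of those picks.
best_per_resample = [candidates[0.1], candidates[1.0], candidates[0.1]]
final_support = union_supports(best_per_resample)

print(candidates)
print(sorted(final_support))
```

The intersection step keeps the candidate supports small and stable, while the union step lets the final model recover features that any well-performing candidate relied on.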
Join the disruptors of health science
Thomas Insel left Verily, a health-science spin-off formed by Google's parent company, to co-found a start-up called Mindstrong Health this year. In early 2015, I testified with several other National Institutes of Health (NIH) directors at an annual hearing held by the US Senate. It was my 13th and final year as director of the US National Institute of Mental Health (NIMH) in Bethesda, Maryland. What struck me most was how the harsh fiscal reality tempered the passionate bipartisan support for the NIH. As one senator noted, with a federal deficit of nearly US$500 billion, there was little hope of any significant increase in funding. Six months after that hearing, I left the NIH for Silicon Valley, first working at Verily in South San Francisco, California, a health-science spin-off formed by Google's parent company Alphabet.
Calibration for Stratified Classification Models
In classification problems, sampling bias between training data and testing data is critical to the ranking performance of classification scores. Such bias can be both unintentionally introduced by data collection and intentionally introduced by the algorithm, such as under-sampling or weighting techniques applied to imbalanced data. When such sampling bias exists, using the raw classification score to rank observations in the testing data can lead to suboptimal results. In this paper, I investigate the optimal calibration strategy in general settings, and develop a practical solution for one specific sampling bias case, where the sampling bias is introduced by stratified sampling. The optimal solution is developed by analytically solving the problem of optimizing the ROC curve. For practical data, I propose a ranking algorithm for general classification models with stratified data. Numerical experiments demonstrate that the proposed algorithm effectively addresses the stratified sampling bias issue. Interestingly, the proposed method shows its potential applicability in two other machine learning areas: unsupervised learning and model ensembling, which can be future research topics.
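A small sketch shows why raw scores can rank observations incorrectly under stratified sampling. It uses the standard prior-shift correction for negative under-sampling, not the ranking algorithm proposed in the paper, and the stratum names, retention rates, and scores are made up for illustration. If negatives in a stratum are kept with probability `beta`, a score `p_s` learned on the sampled data corresponds to an unbiased probability of `beta * p_s / (beta * p_s + 1 - p_s)`.

```python
def correct_score(p_sampled, beta):
    """Map a score learned on under-sampled data back to the original prior.

    beta is the probability that a negative example was kept during sampling;
    positives are assumed to be kept with probability 1.
    """
    return beta * p_sampled / (beta * p_sampled + 1.0 - p_sampled)

# Hypothetical observations: (identifier, stratum, raw score).
observations = [
    ("obs_1", "heavily_sampled", 0.8),
    ("obs_2", "unsampled", 0.5),
]
# Assumed negative retention rate per stratum.
retention = {"heavily_sampled": 0.1, "unsampled": 1.0}

raw_ranking = sorted(observations, key=lambda o: o[2], reverse=True)
corrected = [(name, correct_score(score, retention[stratum]))
             for name, stratum, score in observations]
corrected_ranking = sorted(corrected, key=lambda o: o[1], reverse=True)

print([name for name, _, _ in raw_ranking])
print([name for name, _ in corrected_ranking])
```

With these assumed rates, the raw scores rank `obs_1` first, but after correcting each stratum separately `obs_2` has the higher probability, so the two rankings disagree.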
Distance-based classifier by data transformation for high-dimension, strongly spiked eigenvalue models
Aoshima, Makoto, Yata, Kazuyoshi
We consider classifiers for high-dimensional data under the strongly spiked eigenvalue (SSE) model. We first show that high-dimensional data often follow the SSE model. We consider a distance-based classifier using eigenstructures for the SSE model. We apply the noise reduction methodology to the estimation of the eigenvalues and eigenvectors in the SSE model. We create a new distance-based classifier by transforming data from the SSE model to the non-SSE model. We give simulation studies and discuss the performance of the new classifier. Finally, we demonstrate the new classifier by using microarray data sets.