Goto

Collaborating Authors

 Performance Analysis


LSCP: Locally Selective Combination in Parallel Outlier Ensembles

arXiv.org Machine Learning

In unsupervised outlier ensembles, the absence of ground truth makes the combination of base detectors a challenging task. Specifically, existing parallel outlier ensembles lack a reliable way of selecting competent base detectors, affecting accuracy and stability, during model combination. In this paper, we propose a framework---called Locally Selective Combination in Parallel Outlier Ensembles (LSCP)---which addresses this issue by defining a local region around a test instance using the consensus of its nearest neighbors in randomly generated feature spaces. The top-performing base detectors in this local region are selected and combined as the model's final output. Four variants of the LSCP framework are compared with six widely used combination algorithms for parallel ensembles. Experimental results demonstrate that one of these LSCP variants consistently outperforms baseline algorithms on the majority of eighteen real-world datasets.


Bad practices in evaluation methodology relevant to class-imbalanced problems

arXiv.org Machine Learning

For research to go in the right direction, it is essential to be able to compare and quantify performance of different algorithms focused on the same problem. Choosing a suitable evaluation metric requires deep understanding of the pursued task along with all of its characteristics. We argue that in the case of applied machine learning, proper evaluation metric is the basic building block that should be in the spotlight and put under thorough examination. Here, we address tasks with class imbalance, in which the class of interest is the one with much lower number of samples. We encountered non-insignificant amount of recent papers, in which improper evaluation methods are used, borrowed mainly from the field of balanced problems. Such bad practices may heavily bias the results in favour of inappropriate algorithms and give false expectations of the state of the field.


Please Stop Explaining Black Box Models for High Stakes Decisions

arXiv.org Machine Learning

Black box machine learning models are currently being used for high stakes decision-making throughout society, causing problems throughout healthcare, criminal justice, and in other domains. People have hoped that creating methods for explaining these black box models will alleviate some of these problems, but trying to explain black box models, rather than creating models that are interpretable in the first place, is likely to perpetuate bad practices and can potentially cause catastrophic harm to society. There is a way forward - it is to design models that are inherently interpretable.


Microscope 2.0: An Augmented Reality Microscope with Real-time Artificial Intelligence Integration

arXiv.org Artificial Intelligence

The brightfield microscope is instrumental in the visual examination of both biological and physical samples at sub-millimeter scales. One key clinical application has been in cancer histopathology, where the microscopic assessment of the tissue samples is used for the diagnosis and staging of cancer and thus guides clinical therapy. However, the interpretation of these samples is inherently subjective, resulting in significant diagnostic variability. Moreover, in many regions of the world, access to pathologists is severely limited due to lack of trained personnel. In this regard, Artificial Intelligence (AI) based tools promise to improve the access and quality of healthcare. However, despite significant advances in AI research, integration of these tools into real-world cancer diagnosis workflows remains challenging because of the costs of image digitization and difficulties in deploying AI solutions. Here we propose a cost-effective solution to the integration of AI: the Augmented Reality Microscope (ARM). The ARM overlays AI-based information onto the current view of the sample through the optical pathway in real-time, enabling seamless integration of AI into the regular microscopy workflow. We demonstrate the utility of ARM in the detection of lymph node metastases in breast cancer and the identification of prostate cancer with a latency that supports real-time workflows. We anticipate that ARM will remove barriers towards the use of AI in microscopic analysis and thus improve the accuracy and efficiency of cancer diagnosis. This approach is applicable to other microscopy tasks and AI algorithms in the life sciences and beyond.


Twitter-based traffic information system based on vector representations for words

arXiv.org Machine Learning

Recently, researchers have shown an increased interest in harnessing Twitter data for dynamic monitoring of traffic conditions. Bag-of-words representation is a common method in literature for tweet modeling and retrieving traffic information, yet it suffers from the curse of dimensionality and sparsity. To address these issues, our specific objective is to propose a simple and robust framework on the top of word embedding for distinguishing traffic-related tweets against non-traffic-related ones. In our proposed model, a tweet is classified as traffic-related if semantic similarity between its words and a small set of traffic keywords exceeds a threshold value. Semantic similarity between words is captured by means of word-embedding models, which is an unsupervised learning tool. The proposed model is as simple as having only one trainable parameter. The model takes advantage of outstanding merits, which are demonstrated through several evaluation steps. The state-of-the-art test accuracy for our proposed model is 95.9%. Introduction In the past decade, social media networks have received much attention among ordinary people, agencies, and research scholars. Twitter is one of the fastest-growing social media tools that enables users to post and read short messages, called tweets. By means of Twitter applications on smartphones, users are able to immediately reports events happening around them on a real-time basis. The information disseminated by millions of active users everyday generates a new version of dynamic database that contains information about various topics.


50 Years of Test (Un)fairness: Lessons for Machine Learning

arXiv.org Artificial Intelligence

Quantitative definitions of what is unfair and what is fair have been introduced in multiple disciplines for well over 50 years, including in education, hiring, and machine learning. We trace how the notion of fairness has been defined within the testing communities of education and hiring over the past half century, exploring the cultural and social context in which different fairness definitions have emerged. In some cases, earlier definitions of fairness are similar or identical to definitions of fairness in current machine learning research, and foreshadow current formal work. In other cases, insights into what fairness means and how to measure it have largely gone overlooked. We compare past and current notions of fairness along several dimensions, including the fairness criteria, the focus of the criteria (e.g., a test, a model, or its use), the relationship of fairness to individuals, groups, and subgroups, and the mathematical method for measuring fairness (e.g., classification, regression). This work points the way towards future research and measurement of (un)fairness that builds from our modern understanding of fairness while incorporating insights from the past.


Personalizing Intervention Probabilities By Pooling

arXiv.org Machine Learning

In many mobile health interventions, treatments should only be delivered in a particular context, for example when a user is currently stressed, walking or sedentary. Even in an optimal context, concerns about user burden can restrict which treatments are sent. To diffuse the treatment delivery over times when a user is in a desired context, it is critical to predict the future number of times the context will occur. The focus of this paper is on whether personalization can improve predictions in these settings. Though the variance between individuals' behavioral patterns suggest that personalization should be useful, the amount of individual-level data limits its capabilities. Thus, we investigate several methods which pool data across users to overcome these deficiencies and find that pooling lowers the overall error rate relative to both personalized and batch approaches.


Prediction of New Onset Diabetes after Liver Transplant

arXiv.org Machine Learning

25% of people who received a liver transplant will go on to develop diabetes within the next 5 years. These thousands of individuals are at 2-fold higher risk of cardiovascular events, graft loss, infections, as well as lower long-term survival. This is partly due to the medication used during and/or after transplant that significantly impacts metabolic balance. To assess which medication best suits the patient's condition, clinicians need an accurate estimate of diabetes risk. Both patient's historical data and observations at the current visit are informative in predicting whether the patient will develop diabetes within the following year. In this work we compared a variety of time-to-event prediction models as well as classifiers predicting the likelihood of the event within a year from the current checkup. We are particularly interested in comparing two types of models: 1) standard time-to-event predictors where the historical measurements are merely concatenated, 2) incorporating Deep Markov Model to first obtain low-dimensional embedding of historical data and then using this embedding as an additional input into the model. We compared a variety of algorithms including standard and regularized Cox proportional-hazards model (CPH), mixed effect random forests, survival-forests and Weibull Time-To-Event Recurrent Neural Network (WTTE-RNN). The results show that although all methods' performances varied from year to year and there was no clear winner across all the time points, regularized CPH model that used 1 to 3 years of historical visits data on average achieved a high, clinically relevant Concordance Index of .863. We thus recommend this model for further prospective clinical validation and hopefully, an eventual use in the clinic to improve clinicians' ability to personalize post-operative care and reduce the incidence of new-onset diabetes post liver transplant.


Large Spectral Density Matrix Estimation by Thresholding

arXiv.org Machine Learning

Spectral density matrix estimation of multivariate time series is a classical problem in time series and signal processing. In modern neuroscience, spectral density based metrics are commonly used for analyzing functional connectivity among brain regions. In this paper, we develop a non-asymptotic theory for regularized estimation of high-dimensional spectral density matrices of Gaussian and linear processes using thresholded versions of averaged periodograms. Our theoretical analysis ensures that consistent estimation of spectral density matrix of a $p$-dimensional time series using $n$ samples is possible under high-dimensional regime $\log p / n \rightarrow 0$ as long as the true spectral density is approximately sparse. A key technical component of our analysis is a new concentration inequality of average periodogram around its expectation, which is of independent interest. Our estimation consistency results complement existing results for shrinkage based estimators of multivariate spectral density, which require no assumption on sparsity but only ensure consistent estimation in a regime $p^2/n \rightarrow 0$. In addition, our proposed thresholding based estimators perform consistent and automatic edge selection when learning coherence networks among the components of a multivariate time series. We demonstrate the advantage of our estimators using simulation studies and a real data application on functional connectivity analysis with fMRI data.


Predicting Inpatient Discharge Prioritization With Electronic Health Records

arXiv.org Machine Learning

Identifying patients who will be discharged within 24 hours can improve hospital resource management and quality of care. We studied this problem using eight years of Electronic Health Records (EHR) data from Stanford Hospital. We fit models to predict 24 hour discharge across the entire inpatient population. The best performing models achieved an area under the receiver-operator characteristic curve (AUROC) of 0.85 and an AUPRC of 0.53 on a held out test set. This model was also well calibrated. Finally, we analyzed the utility of this model in a decision theoretic framework to identify regions of ROC space in which using the model increases expected utility compared to the trivial always negative or always positive classifiers.