Collaborating Authors

 Asif, Amina


An Aggregation of Aggregation Methods in Computational Pathology

arXiv.org Artificial Intelligence

Image analysis and machine learning algorithms operating on multi-gigapixel whole-slide images (WSIs) often process a large number of tiles (sub-images) and require aggregating predictions from the tiles in order to predict WSI-level labels. In this paper, we present a review of existing literature on various types of aggregation methods to help guide future research in the area of computational pathology (CPath). We propose a general CPath workflow with three pathways that consider multiple levels and types of data and the nature of computation to analyse WSIs for predictive modelling. We categorize aggregation methods according to the context and representation of the data, features of computational modules and CPath use cases. We compare and contrast different methods based on the principle of multiple instance learning, perhaps the most commonly used aggregation approach, drawing on a wide range of the CPath literature. To provide a fair comparison, we consider a specific WSI-level prediction task and compare various aggregation methods for that task. Finally, we conclude with a list of objectives and desirable attributes of aggregation methods in general, pros and cons of the various approaches, some recommendations and possible future directions.
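Although the review surveys many families of aggregation methods, the basic mechanics can be illustrated with two common baselines: mean pooling of tile-level scores and attention-based MIL pooling of tile features. Below is a minimal sketch, assuming pre-extracted tile features or probabilities as tensors; the module and parameter names (AttentionMILPooling, feature_dim, etc.) are illustrative and not taken from the paper.

```python
# Minimal sketch of two common tile-aggregation baselines for WSI-level
# prediction: mean pooling of tile scores and attention-based MIL pooling
# over tile features. Shapes and names are illustrative assumptions.
import torch
import torch.nn as nn

def mean_pool_aggregate(tile_scores: torch.Tensor) -> torch.Tensor:
    """Average per-tile probabilities into a single slide-level score.

    tile_scores: (num_tiles,) tensor of tile-level probabilities.
    """
    return tile_scores.mean()

class AttentionMILPooling(nn.Module):
    """Attention-weighted aggregation of tile features into a slide prediction."""

    def __init__(self, feature_dim: int = 512, hidden_dim: int = 128):
        super().__init__()
        self.attention = nn.Sequential(
            nn.Linear(feature_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )
        self.classifier = nn.Linear(feature_dim, 1)

    def forward(self, tile_features: torch.Tensor) -> torch.Tensor:
        # tile_features: (num_tiles, feature_dim)
        weights = torch.softmax(self.attention(tile_features), dim=0)  # (num_tiles, 1)
        slide_feature = (weights * tile_features).sum(dim=0)           # (feature_dim,)
        return torch.sigmoid(self.classifier(slide_feature))           # slide-level probability

# Example: aggregate 1000 tiles with 512-dimensional features each.
pooled = AttentionMILPooling()(torch.randn(1000, 512))
```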


An embarrassingly simple approach to neural multiple instance classification

arXiv.org Machine Learning

Multiple Instance Learning (MIL) is a weak supervision learning paradigm that allows modeling of machine learning problems in which labels are available only for groups of examples called bags. A positive bag may contain one or more positive examples but it is not known which examples in the bag are positive. All examples in a negative bag belong to the negative class. Such problems arise frequently in the fields of computer vision, medical image processing and bioinformatics. Many neural network based solutions have been proposed in the literature for MIL; however, almost all of them rely on introducing specialized blocks and connectivity into their architectures. In this paper, we present a novel and effective approach to Multiple Instance Learning in neural networks. Instead of making changes to the architectures, we propose a simple bag-level ranking loss function that allows Multiple Instance Classification in any neural architecture. We have demonstrated the effectiveness of our proposed method on popular MIL benchmark datasets. In addition, we have tested the performance of our method in convolutional neural networks used to model an MIL problem derived from the well-known MNIST dataset. Results show that, despite being simpler, our proposed scheme is comparable to or better than existing methods in the literature in practical scenarios. Python code files for all the experiments can be found at https://github.com/amina01/ESMIL.
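As a concrete illustration of a bag-level ranking loss, the sketch below implements one plausible variant: the highest instance score in each positive bag is pushed above the highest score in each negative bag by a margin. The function name, margin value, and pairing scheme are assumptions for illustration and are not taken from the ESMIL code.

```python
# Hedged sketch of a bag-level ranking loss: the top instance score of every
# positive bag should exceed the top score of every negative bag by a margin.
# Works with any backbone that produces raw instance scores.
import torch

def bag_ranking_loss(pos_bag_scores, neg_bag_scores, margin: float = 1.0):
    """pos_bag_scores / neg_bag_scores: lists of 1-D tensors, one per bag,
    holding the network's raw instance scores for that bag."""
    pos_max = torch.stack([bag.max() for bag in pos_bag_scores])  # (P,)
    neg_max = torch.stack([bag.max() for bag in neg_bag_scores])  # (N,)
    # Hinge on the score difference over all positive/negative bag pairs.
    diffs = pos_max.unsqueeze(1) - neg_max.unsqueeze(0)           # (P, N)
    return torch.clamp(margin - diffs, min=0).mean()

# Usage: score each instance with any network, group the scores by bag,
# then back-propagate through this single scalar loss.
loss = bag_ranking_loss(
    [torch.randn(5, requires_grad=True) for _ in range(3)],
    [torch.randn(7, requires_grad=True) for _ in range(4)],
)
loss.backward()
```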


Ten ways to fool the masses with machine learning

arXiv.org Machine Learning

If you want to tell people the truth, make them laugh, otherwise they'll kill you. (source unclear) Machine learning and deep learning are the technologies of the day for developing intelligent automatic systems. However, a key hurdle for progress in the field is the literature itself: we often encounter papers that report results that are difficult to reconstruct or reproduce, results that misrepresent the performance of the system, or contain other biases that limit their validity. In this semi-humorous article, we discuss issues that arise in running and reporting the results of machine learning experiments. The purpose of the article is to provide a list of points for researchers to watch out for when developing machine learning models or when writing and reviewing machine learning papers.


A generalized meta-loss function for distillation and learning using privileged information for classification and regression

arXiv.org Machine Learning

Learning using privileged information and distillation are powerful machine learning frameworks that allow a machine learning model to be learned from an existing model or from a classifier trained over another feature space. Existing implementations of learning using privileged information are limited to classification only. In this work, we have proposed a novel meta-loss function that allows learning using privileged information and distillation to be applied not only to classification but also to regression and other related problems. Our experimental results show the usefulness of the proposed scheme.
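To make the idea concrete, the sketch below shows one plausible form of such a meta-loss: the same base loss is applied both to the true target and to the teacher's (privileged-information) prediction, so swapping the base loss between mean squared error and cross-entropy covers regression and classification. The weighting term alpha and all names are illustrative assumptions; the paper's exact formulation may differ.

```python
# Hedged sketch of a meta-loss that wraps an arbitrary base loss: the student
# is penalised both against the true target and against the teacher's
# prediction. The weighting alpha and function names are assumptions.
import torch
import torch.nn.functional as F

def meta_loss(student_out, target, teacher_out, base_loss=F.mse_loss, alpha: float = 0.5):
    """Combine the supervised term and the imitation (distillation) term with
    a single base loss, so the same wrapper covers regression (e.g. MSE) or
    classification (e.g. cross-entropy on soft targets)."""
    supervised = base_loss(student_out, target)
    imitation = base_loss(student_out, teacher_out.detach())  # no gradient into the teacher
    return (1.0 - alpha) * supervised + alpha * imitation

# Regression example with made-up tensors.
student_pred = torch.randn(8, 1, requires_grad=True)
loss = meta_loss(student_pred, torch.randn(8, 1), torch.randn(8, 1))
loss.backward()
```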


Machine Learning with Abstention for Automated Liver Disease Diagnosis

arXiv.org Machine Learning

This paper presents a novel approach for the detection of liver abnormalities in an automated manner using ultrasound images. For this purpose, we have implemented a machine learning model that can not only generate labels (normal and abnormal) for a given ultrasound image but can also detect when its prediction is likely to be incorrect. The proposed model abstains from generating the label of a test example if it is not confident about its prediction. Such behavior is commonly practiced by medical doctors who, when given insufficient information or a difficult case, can choose to carry out further clinical or diagnostic tests before generating a diagnosis. However, existing machine learning models are designed to always generate a label for a given example even when the confidence of their prediction is low. We have proposed a novel stochastic gradient based solver for the learning with abstention paradigm and use it to build a practical, state-of-the-art method for liver disease classification. The proposed method has been benchmarked on a data set of approximately 100 patients from MINAR, Multan, Pakistan, and our results show that the proposed scheme offers state-of-the-art classification performance.
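The sketch below illustrates the abstaining behaviour described above with a simple confidence threshold wrapped around a conventional classifier; it is not the paper's stochastic gradient solver for the abstention objective, and the threshold, sentinel label, and synthetic data are assumptions for illustration.

```python
# Hedged illustration of prediction with a reject option: a conventional
# classifier abstains whenever its confidence falls below a threshold.
import numpy as np
from sklearn.linear_model import LogisticRegression

ABSTAIN = -1  # sentinel label meaning "refer for further tests"

def predict_with_abstention(model, X, confidence_threshold: float = 0.8):
    proba = model.predict_proba(X)          # (n_samples, 2)
    confidence = proba.max(axis=1)
    labels = proba.argmax(axis=1)
    labels[confidence < confidence_threshold] = ABSTAIN
    return labels

# Toy example with synthetic stand-ins for ultrasound feature vectors.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(200, 10)), rng.integers(0, 2, 200)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(predict_with_abstention(clf, rng.normal(size=(5, 10))))
```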


Training large margin host-pathogen protein-protein interaction predictors

arXiv.org Machine Learning

Detection of protein-protein interactions (PPIs) plays a vital role in molecular biology. In particular, infections are caused by interactions between host and pathogen proteins. It is important to identify host-pathogen interactions (HPIs) to discover new drugs to counter infectious diseases. Conventional wet-lab PPI detection techniques have limitations in terms of large-scale application and cost. Hence, computational approaches are developed to predict PPIs. This study aims to develop large margin machine learning models to predict interspecies PPIs, with a special interest in host-pathogen protein interactions (HPIs). Specifically, we focus on three questions that arise while developing an HPI predictor: 1) How should we select negative samples? 2) What should the size of the negative samples be relative to the positive samples? 3) What type of margin violation penalty should be used to train the predictor? We compare two available methods for negative sampling. Moreover, we propose a new method of assigning weights to each training example in a weighted SVM depending on the distance of the negative examples from the positive examples. We have also developed a web server for our HPI predictor called HoPItor (Host Pathogen Interaction predicTOR) that can predict interactions between human and viral proteins. The web server can be accessed at: http://faculty.pieas.edu.pk/fayyaz/software.html#HoPItor.
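The sketch below shows one plausible reading of distance-dependent example weighting in a weighted SVM: each negative example's penalty weight grows with its distance from the nearest positive example, on the assumption that far-away negatives are more reliably negative. The weighting function, its scale, and the synthetic data are illustrative; the paper's exact scheme may differ.

```python
# Hedged sketch of distance-dependent example weighting in a weighted SVM.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import pairwise_distances

def distance_based_weights(X_pos, X_neg, scale: float = 1.0):
    d_min = pairwise_distances(X_neg, X_pos).min(axis=1)  # nearest-positive distance
    return 1.0 - np.exp(-d_min / scale)                   # in (0, 1), larger when farther away

rng = np.random.default_rng(0)
X_pos = rng.normal(loc=1.0, size=(50, 20))    # stand-ins for positive HPI feature vectors
X_neg = rng.normal(loc=-1.0, size=(150, 20))  # sampled negatives

X = np.vstack([X_pos, X_neg])
y = np.array([1] * len(X_pos) + [0] * len(X_neg))
weights = np.concatenate([np.ones(len(X_pos)), distance_based_weights(X_pos, X_neg)])

svm = SVC(kernel="linear", C=1.0)
svm.fit(X, y, sample_weight=weights)          # per-example margin-violation penalties
```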


pyLEMMINGS: Large Margin Multiple Instance Classification and Ranking for Bioinformatics Applications

arXiv.org Machine Learning

Motivation: A major challenge in the development of machine learning based methods in computational biology is that data may not be accurately labeled due to the time and resources required for experimentally annotating properties of proteins and DNA sequences. Standard supervised learning algorithms assume accurate instance-level labeling of training data. Multiple instance learning is a paradigm for handling such labeling ambiguities. However, the widely used large-margin classification methods for multiple instance learning are heuristic in nature and have high computational requirements. In this paper, we present stochastic sub-gradient optimization large margin algorithms for multiple instance classification and ranking, and provide them in a software suite called pyLEMMINGS. Results: We have tested pyLEMMINGS on a number of bioinformatics problems as well as benchmark datasets. pyLEMMINGS has successfully identified functionally important segments of proteins: binding sites in Calmodulin binding proteins, prion forming regions, and amyloid cores. pyLEMMINGS achieves state-of-the-art performance in all these tasks, demonstrating the value of multiple instance learning. Furthermore, our method shows more than 100-fold improvement in running time compared to heuristic solutions, with improved accuracy on benchmark datasets. Availability and Implementation: The pyLEMMINGS Python package is available for download at: http://faculty.pieas.edu.pk/fayyaz/software.html#pylemmings.
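As an illustration of stochastic sub-gradient optimization for a large-margin multiple instance objective, the sketch below performs MI-SVM-style updates in which each bag is represented by its highest-scoring instance. It is a simplified stand-in, not pyLEMMINGS' actual objective or implementation; all names and hyperparameters are assumptions.

```python
# Hedged sketch of a stochastic sub-gradient step for a large-margin MIL
# objective: each bag is represented by its highest-scoring instance and a
# hinge loss with L2 regularisation is minimised one bag at a time.
import numpy as np

def subgradient_step(w, bag, bag_label, lam=1e-3, lr=1e-2):
    """bag: (n_instances, n_features); bag_label: +1 or -1."""
    scores = bag @ w
    witness = bag[np.argmax(scores)]        # instance that represents the bag
    grad = lam * w                          # regularisation sub-gradient
    if bag_label * (witness @ w) < 1.0:     # margin violated by the witness
        grad -= bag_label * witness         # hinge sub-gradient
    return w - lr * grad

# Toy training loop over randomly generated bags.
rng = np.random.default_rng(0)
w = np.zeros(16)
bags = [rng.normal(size=(rng.integers(3, 10), 16)) for _ in range(100)]
labels = rng.choice([-1, 1], size=100)
for epoch in range(5):
    for bag, label in zip(bags, labels):
        w = subgradient_step(w, bag, label)
```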