Secure and Robust Machine Learning for Healthcare: A Survey

arXiv.org Machine Learning

Recent years have witnessed widespread adoption of machine learning (ML)/deep learning (DL) techniques due to their superior performance on a variety of healthcare applications, ranging from the prediction of cardiac arrest from one-dimensional heart signals to computer-aided diagnosis (CADx) using multi-dimensional medical images. Notwithstanding the impressive performance of ML/DL, there are still lingering doubts regarding the robustness of ML/DL in healthcare settings (which are traditionally considered quite challenging due to the myriad security and privacy issues involved), especially in light of recent results showing that ML/DL models are vulnerable to adversarial attacks. In this paper, we present an overview of various application areas in healthcare that leverage such techniques from a security and privacy point of view and discuss the associated challenges. In addition, we present potential methods to ensure secure and privacy-preserving ML for healthcare applications. Finally, we provide insight into the current research challenges and promising directions for future research.
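
The vulnerability referenced above is easiest to see with a concrete attack. Below is a minimal sketch of the fast gradient sign method (FGSM), a standard white-box adversarial attack of the kind the survey alludes to; the model, inputs, and epsilon value are illustrative placeholders, not anything prescribed by the paper.

```python
# Minimal FGSM sketch against a generic PyTorch classifier.
# Assumptions: inputs are scaled to [0, 1] and epsilon=0.03 is arbitrary.
import torch

def fgsm_attack(model, x, y, epsilon=0.03):
    """Perturb input x so that the model's loss on label y increases."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(x_adv), y)
    loss.backward()
    # Step in the direction of the sign of the loss gradient, bounded by epsilon.
    perturbed = x_adv + epsilon * x_adv.grad.sign()
    return perturbed.clamp(0.0, 1.0).detach()  # keep values in the valid input range
```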


Integrative Generalized Convex Clustering Optimization and Feature Selection for Mixed Multi-View Data

arXiv.org Machine Learning

In mixed multi-view data, multiple sets of diverse features are measured on the same set of samples. By integrating all available data sources, we seek to discover common group structure among the samples that may be hidden in cluster analyses of any single data view. While several techniques for such integrative clustering have been explored, we propose and develop a convex formulation that inherits the strong statistical, mathematical, and empirical properties of increasingly popular convex clustering methods. Specifically, our Integrative Generalized Convex Clustering Optimization (iGecco) method employs different convex distances, losses, or divergences for each of the data views, together with a joint convex fusion penalty that leads to common groups. Additionally, integrating mixed multi-view data is often challenging when each data source is high-dimensional. To perform feature selection in such scenarios, we develop an adaptive shifted group-lasso penalty that selects features by shrinking them towards their loss-specific centers. Our so-called iGecco+ approach selects the features from each data view that are best for determining the groups, often leading to improved integrative clustering. To fit our model, we develop a new type of generalized multi-block ADMM algorithm that uses sub-problem approximations to fit the model more efficiently for big data sets. Through a series of numerical experiments and real data examples on text mining and genomics, we show that iGecco+ achieves superior empirical performance for high-dimensional mixed multi-view data.
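
As a rough schematic of the idea (our own notation and weights, not taken verbatim from the paper), the integrative objective combines a view-specific convex loss for each data source with a joint fusion penalty that couples the sample centroids across views:

```latex
% Schematic iGecco-style objective; the notation is an assumption for illustration.
% u_i^{(k)} is the centroid for sample i in view k, \ell_k is the convex
% loss/divergence chosen for view k, and the joint fusion penalty drives
% whole stacks of centroids to merge, yielding common groups.
\min_{U^{(1)},\dots,U^{(K)}}\;
  \sum_{k=1}^{K}\sum_{i=1}^{n} \ell_k\!\big(x_i^{(k)}, u_i^{(k)}\big)
  \;+\; \lambda \sum_{i<j} w_{ij}\,
  \Big\|\big(u_i^{(1)},\dots,u_i^{(K)}\big)-\big(u_j^{(1)},\dots,u_j^{(K)}\big)\Big\|_2
```

Samples i and j land in the same group exactly when the penalty fuses their stacked centroids; iGecco+ adds the adaptive shifted group-lasso term on top of this for feature selection.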


How Healthcare Is Using Big Data And AI To Cure Disease

#artificialintelligence

When it comes to medicine, there are constant discoveries and advancements in the field. Now, with the help of machine learning algorithms, personalized medicine and the prediction of patient outcomes have taken another step toward curing diseases. With the data collected from patients, researchers are able to study different diseases and search for better treatments and even cures. Scientists and pharmaceutical companies can use bioinformatics to develop new treatments and to discover cures and treatments for diseases that currently have none. The benefit of using so much data is the ability to determine why some drugs work for one population but not for others.


A review of single-source unsupervised domain adaptation

arXiv.org Machine Learning

Domain adaptation has become a prominent problem setting in machine learning and related fields. This review asks two questions: when can a classifier learn from a source domain and generalize to a target domain, and how? As for when, we review conditions that allow for cross-domain generalization error bounds. As for how, we present a categorization of approaches, divided into what we refer to as sample-based, feature-based, and inference-based methods. Sample-based methods focus on weighting individual observations during training based on their importance to the target domain. Feature-based methods focus on mapping, projecting, and representing features such that a source classifier performs well on the target domain, while inference-based methods focus on alternative estimators, such as robust, minimax, or Bayesian ones. Our categorization highlights recurring ideas and raises a number of questions that are important to further research.
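
To make the sample-based category concrete, here is a minimal sketch of importance weighting via a domain discriminator; the estimator choices and array names are illustrative assumptions, not a specific method endorsed by the review.

```python
# Importance weighting sketch: estimate w(x) ~ p_target(x)/p_source(x) with a
# probabilistic classifier that discriminates source from target samples, then
# train a weighted source classifier. X_src/y_src and X_tgt are placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

def importance_weights(X_src, X_tgt):
    """Estimate density ratios for source points via a domain classifier."""
    X = np.vstack([X_src, X_tgt])
    d = np.concatenate([np.zeros(len(X_src)), np.ones(len(X_tgt))])  # 0=source, 1=target
    domain_clf = LogisticRegression(max_iter=1000).fit(X, d)
    p_tgt = domain_clf.predict_proba(X_src)[:, 1]
    # By Bayes' rule, up to the class-prior ratio: w(x) ~ P(target|x) / P(source|x)
    return p_tgt / np.clip(1.0 - p_tgt, 1e-6, None)

def weighted_source_classifier(X_src, y_src, X_tgt):
    w = importance_weights(X_src, X_tgt)
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_src, y_src, sample_weight=w)  # emphasize source points that resemble the target
    return clf
```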


Classification of Big Data with Application to Imaging Genetics

arXiv.org Machine Learning

Big data applications, such as medical imaging and genetics, typically generate datasets that consist of few observations n on many more variables p, a scenario that we denote as p >> n. Traditional data processing methods are often insufficient for extracting information from big data. This calls for the development of new algorithms that can deal with the size, complexity, and special structure of such datasets. In this paper, we consider the problem of classifying p >> n data and propose a classification method based on linear discriminant analysis (LDA). Traditional LDA depends on the covariance estimate of the data, but when p >> n the sample covariance estimate is singular. The proposed method estimates the covariance by using a sparse version of noisy principal component analysis (nPCA). The use of sparsity in this setting aims at automatically selecting variables that are relevant for classification. In experiments, the new method is compared to state-of-the-art methods for big data problems using both simulated datasets and imaging genetics datasets.
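
As a rough illustration of the overall recipe (not the authors' exact estimator), the sketch below performs LDA with a low-rank-plus-noise covariance estimate, using scikit-learn's SparsePCA as a stand-in for the paper's sparse nPCA; the component count and noise variance are placeholder assumptions.

```python
# LDA sketch for p >> n: replace the singular sample covariance with a
# structured estimate built from sparse principal components plus a noise term.
import numpy as np
from sklearn.decomposition import SparsePCA

def sparse_lda_direction(X0, X1, n_components=5, noise_var=1e-2):
    """Discriminant direction w = Sigma^{-1}(mu1 - mu0) with a low-rank-plus-noise Sigma."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    Xc = np.vstack([X0 - mu0, X1 - mu1])        # pooled, class-centered data
    spca = SparsePCA(n_components=n_components).fit(Xc)
    V = spca.components_                        # sparse loadings, shape (k, p)
    scores = spca.transform(Xc)                 # sample scores on the sparse components
    Lambda = np.diag(scores.var(axis=0))        # per-component variances
    Sigma = V.T @ Lambda @ V + noise_var * np.eye(X0.shape[1])
    w = np.linalg.solve(Sigma, mu1 - mu0)
    threshold = 0.5 * w @ (mu0 + mu1)           # midpoint rule, assuming equal priors
    return w, threshold

# Usage: classify a new sample x as class 1 if x @ w > threshold.
```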