An Overview of Melanoma Detection in Dermoscopy Images Using Image Processing and Machine Learning Machine Learning

The incidence of malignant melanoma continues to increase worldwide. This cancer can strike at any age; it is one of the leading causes of loss of life in young persons. Since this cancer is visible on the skin, it is potentially detectable at a very early stage, when it is curable. New developments have converged to make fully automatic early melanoma detection a real possibility. First, the advent of dermoscopy has dramatically improved clinical diagnostic ability, to the point that melanoma can be detected in the clinic at the very earliest stages. Second, the global adoption of this technology has allowed the accumulation of large collections of dermoscopy images of melanomas and benign lesions validated by histopathology. Third, advances in image processing and machine learning now make it possible to distinguish malignant melanoma from the many benign mimics that require no biopsy. These new technologies should allow not only earlier detection of melanoma, but also a reduction in the large number of needless and costly biopsy procedures. Although some of the new systems reported for these technologies have shown promise in preliminary trials, widespread implementation must await further technical progress in accuracy and reproducibility. In this paper, we provide an overview of computerized detection of melanoma in dermoscopy images. First, we discuss the various aspects of lesion segmentation. Then, we provide a brief overview of clinical feature segmentation. Finally, we discuss the classification stage, where machine learning algorithms are applied to the attributes generated from the segmented features to predict the existence of melanoma.
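The lesion segmentation stage can be illustrated with a classical baseline. The sketch below uses global Otsu thresholding, which is our own choice of example and not necessarily one of the methods surveyed in the paper. It assumes a grayscale image scaled to [0, 1] in which the lesion is darker than the surrounding skin; the function names `otsu_threshold` and `segment_lesion` are hypothetical.

```python
import numpy as np


def otsu_threshold(gray: np.ndarray, bins: int = 256) -> float:
    """Return the Otsu threshold of a grayscale image with values in [0, 1]."""
    hist, edges = np.histogram(gray, bins=bins, range=(0.0, 1.0))
    p = hist.astype(float) / hist.sum()          # normalized histogram
    centers = (edges[:-1] + edges[1:]) / 2.0     # bin centers
    w0 = np.cumsum(p)                            # weight of the class <= t
    w1 = 1.0 - w0                                # weight of the class > t
    mu = np.cumsum(p * centers)                  # cumulative first moment
    mu_total = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        # between-class variance for every candidate threshold
        between = (mu_total * w0 - mu) ** 2 / (w0 * w1)
    between = np.nan_to_num(between)             # empty classes give 0/0 -> 0
    return float(centers[np.argmax(between)])


def segment_lesion(gray: np.ndarray) -> np.ndarray:
    """Boolean lesion mask, assuming the lesion is darker than the skin."""
    return gray <= otsu_threshold(gray)
```

On real dermoscopy images a single global threshold is often not enough; treat this only as an illustration of what the segmentation step produces (a binary lesion mask) for the later feature and classification stages.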

Statistical Inference, Learning and Models in Big Data Machine Learning

The need for new methods to deal with big data is a common theme in most scientific fields, although the definition of big data tends to vary with context. Statistical ideas are an essential part of this, and as a partial response, a thematic program on statistical inference, learning, and models in big data was held in 2015 in Canada, under the general direction of the Canadian Statistical Sciences Institute, with major funding from, and most activities located at, the Fields Institute for Research in Mathematical Sciences. This paper gives an overview of the topics covered, describing challenges and strategies that seem common to many different areas of application, and including some examples of applications to make these challenges and strategies more concrete.

Matrix Completion from Fewer Entries: Spectral Detectability and Rank Estimation Machine Learning

The completion of low-rank matrices from few entries is a task with many practical applications. We consider here two aspects of this problem: detectability, i.e., the ability to estimate the rank $r$ reliably from the fewest possible random entries, and performance in achieving small reconstruction error. We propose a spectral algorithm for these two tasks called MaCBetH (for Matrix Completion with the Bethe Hessian). The rank is estimated as the number of negative eigenvalues of the Bethe Hessian matrix, and the corresponding eigenvectors are used as the initial condition for the minimization of the discrepancy between the estimated matrix and the revealed entries. We analyze the performance in a random matrix setting using results from the statistical mechanics of the Hopfield neural network, and show in particular that MaCBetH efficiently detects the rank $r$ of a large $n\times m$ matrix from $C(r)r\sqrt{nm}$ entries, where $C(r)$ is a constant close to $1$. We also evaluate the corresponding root-mean-square error empirically and show that MaCBetH compares favorably to other existing approaches.
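A minimal sketch of the rank-estimation step follows. It assumes the weighted Bethe Hessian is built on the bipartite graph whose nodes are the n rows and m columns and whose edges are the revealed entries, with the diagonal and off-diagonal forms stated in the docstring. This form is our reading of the method and should be checked against the paper; in particular, β is simply passed in here, whereas the paper specifies how to choose it. The names `bethe_hessian` and `estimate_rank` are hypothetical, and the subsequent minimization initialized from the eigenvectors is not shown.

```python
import numpy as np


def bethe_hessian(n, m, entries, beta):
    """Weighted Bethe Hessian on the bipartite graph of revealed entries.

    Rows of the n x m matrix are nodes 0..n-1, columns are nodes n..n+m-1.
    entries: iterable of (i, j, value) for each revealed entry.
    Assumed form: H_aa = 1 + sum_k sinh^2(beta * T_ak) on the diagonal,
    H_ab = -sinh(2 * beta * T_ab) / 2 on revealed edges, 0 elsewhere.
    """
    N = n + m
    H = np.eye(N)
    for i, j, t in entries:
        a, b = i, n + j                  # row node and column node
        s2 = np.sinh(beta * t) ** 2
        H[a, a] += s2
        H[b, b] += s2
        H[a, b] = H[b, a] = -0.5 * np.sinh(2.0 * beta * t)
    return H


def estimate_rank(n, m, entries, beta):
    """Rank estimate = number of negative eigenvalues of the Bethe Hessian."""
    eigvals = np.linalg.eigvalsh(bethe_hessian(n, m, entries, beta))
    return int(np.sum(eigvals < 0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m, r = 200, 150, 3
    M = rng.standard_normal((n, r)) @ rng.standard_normal((r, m)) / np.sqrt(r)
    rows, cols = np.nonzero(rng.random((n, m)) < 0.1)   # reveal ~10% of entries
    entries = zip(rows, cols, M[rows, cols])
    print(estimate_rank(n, m, entries, beta=0.3))       # beta chosen arbitrarily
```

With β = 0 the matrix reduces to the identity and no negative eigenvalues appear, so the count only becomes informative for a suitably chosen β.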

Robustness Analysis of Preconditioned Successive Projection Algorithm for General Form of Separable NMF Problem Machine Learning

The successive projection algorithm (SPA) has been known to work well for separable nonnegative matrix factorization (NMF) problems arising in applications, such as topic extraction from documents and endmember detection in hyperspectral images. One reason is that the algorithm is robust to noise. Gillis and Vavasis showed in [SIAM J. Optim., 25(1), pp. 677-698, 2015] that a preconditioner can further enhance its noise robustness. The proof rested on the condition that the dimension $d$ and the factorization rank $r$ in the separable NMF problem coincide. However, it may be unrealistic to expect this condition to hold in separable NMF problems appearing in actual applications; in such problems, $d$ is usually greater than $r$. This paper shows, without the condition $d=r$, that the preconditioned SPA is robust to noise.
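For reference, plain SPA (without the preconditioner studied in the paper) repeatedly picks the column of the residual matrix with the largest norm and then projects all columns onto the orthogonal complement of the chosen one. The sketch below assumes a d × n data matrix `M` whose columns are data points; the function name `spa` is ours.

```python
import numpy as np


def spa(M: np.ndarray, r: int) -> list[int]:
    """Successive projection algorithm: return r column indices of M."""
    R = M.astype(float).copy()
    selected = []
    for _ in range(r):
        norms = np.sum(R * R, axis=0)          # squared column norms of the residual
        j = int(np.argmax(norms))
        selected.append(j)
        u = R[:, j] / np.linalg.norm(R[:, j])  # unit vector along the chosen column
        R = R - np.outer(u, u @ R)             # project out the direction u
    return selected
```

In the noiseless separable case M = W[H', I], with W of full column rank and the columns of H' nonnegative and summing to at most 1, SPA returns the indices of the identity columns; the robustness results above concern how far this guarantee survives once noise is added.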

Learning Model-Based Sparsity via Projected Gradient Descent Machine Learning

Several convex formulations have been proposed for statistical estimation with structured sparsity as the prior. These methods often require a carefully tuned regularization parameter, and the tuning is often a cumbersome or heuristic exercise. Furthermore, the estimate these methods produce might not belong to the desired sparsity model, even when it accurately approximates the true parameter. Therefore, greedy-type algorithms can often be more desirable for estimating structured-sparse parameters. So far, these greedy methods have mostly focused on linear statistical models. In this paper, we study projected gradient descent with a non-convex structured-sparse parameter model as the constraint set. If the cost function has a Stable Model-Restricted Hessian, the algorithm produces an approximation of the desired minimizer. As an example, we elaborate on the application of the main results to estimation in Generalized Linear Models.
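The iteration being analyzed alternates a gradient step with a projection onto the sparsity model. The sketch below is a minimal illustration under assumptions of our own: a least-squares loss instead of a GLM likelihood, a block-sparsity model with `k` active blocks of fixed length as the structured-sparse set, and a step size of 1/L, where L is the largest eigenvalue of A^T A / n. The helper names are hypothetical.

```python
import numpy as np


def project_block_sparse(x: np.ndarray, block: int, k: int) -> np.ndarray:
    """Keep the k blocks of length `block` with the largest energy; zero the rest.

    Assumes len(x) is a multiple of `block`.
    """
    blocks = x.reshape(-1, block)
    energy = np.sum(blocks ** 2, axis=1)
    keep = np.argsort(energy)[-k:]               # indices of the k strongest blocks
    out = np.zeros_like(blocks)
    out[keep] = blocks[keep]
    return out.reshape(-1)


def projected_gradient_descent(A, y, block, k, iters=500):
    """Minimize (1/2n)||Ax - y||^2 subject to x being k-block-sparse."""
    n, p = A.shape
    L = np.linalg.eigvalsh(A.T @ A / n)[-1]      # Lipschitz constant of the gradient
    x = np.zeros(p)
    for _ in range(iters):
        grad = A.T @ (A @ x - y) / n
        x = project_block_sparse(x - grad / L, block, k)
    return x
```

Swapping `project_block_sparse` for the projection onto a different structured-sparsity model leaves the outer loop unchanged, and there is no regularization parameter to tune, only the model itself.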

Sparse Generalized Principal Component Analysis for Large-scale Applications beyond Gaussianity Machine Learning

Principal Component Analysis (PCA) is a dimension reduction technique. It produces inconsistent estimators when the dimensionality is moderate to high, which is often the case in modern large-scale applications, where algorithm scalability and model interpretability are difficult to achieve, not to mention the prevalence of missing values. While existing sparse PCA methods alleviate inconsistency, they are constrained to the Gaussian assumption of classical PCA and fail to address algorithm scalability issues. We generalize sparse PCA to the broad class of exponential family distributions in the high-dimensional setting, with built-in treatment of missing values. We also propose a family of iterative sparse generalized PCA (SG-PCA) algorithms such that, despite the non-convexity and non-smoothness of the optimization task, the loss function decreases in every iteration. In terms of ease and intuitiveness of parameter tuning, our sparsity-inducing regularization is far superior to the popular Lasso. Furthermore, to promote overall scalability, accelerated gradient is integrated for fast convergence, while a progressive screening technique gradually squeezes out nuisance dimensions of a large-scale problem for feasible optimization. High-dimensional simulation and real data experiments demonstrate the efficiency and efficacy of SG-PCA.
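SG-PCA itself (exponential-family losses, missing values, accelerated gradient, progressive screening) is not reproduced here. As a much simpler point of comparison, the sketch below shows the Gaussian-case idea of sparse PCA through a truncated power iteration: multiply by the covariance, keep the k largest entries, renormalize. This is a different, well-known technique, not the paper's algorithm; the function name and the fixed sparsity level `k` are assumptions.

```python
import numpy as np


def truncated_power_method(S: np.ndarray, k: int, iters: int = 100) -> np.ndarray:
    """Estimate a k-sparse leading unit eigenvector of a symmetric matrix S."""
    p = S.shape[0]
    x = np.ones(p) / np.sqrt(p)                  # deterministic starting vector
    for _ in range(iters):
        z = S @ x                                # power step
        keep = np.argsort(np.abs(z))[-k:]        # k largest entries in magnitude
        x = np.zeros(p)
        x[keep] = z[keep]                        # hard truncation to k entries
        x /= np.linalg.norm(x)
    return x
```

The hard truncation plays the role that a sparsity-inducing regularizer plays in penalized formulations; the Gaussian assumption enters through the use of a covariance matrix `S`.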

Hierarchical Vector Autoregression Machine Learning

Vector autoregression (VAR) is a fundamental tool for modeling the joint dynamics of multivariate time series. However, as the number of component series is increased, the VAR model quickly becomes overparameterized, making reliable estimation difficult and impeding its adoption as a forecasting tool in high-dimensional settings. A number of authors have sought to address this issue by incorporating regularized approaches, such as the lasso, that impose sparse or low-rank structures on the estimated coefficient parameters of the VAR. More traditional approaches attempt to address overparameterization by selecting a low lag order, based on the assumption that dynamic dependence among components is short-range. However, these methods typically assume a single, universal lag order that applies across all components, unnecessarily constraining the dynamic relationship between the components and impeding forecast performance. The lasso-based approaches are more flexible but do not incorporate the notion of lag order selection. We propose a new class of regularized VAR models, called hierarchical vector autoregression (HVAR), that embed the notion of lag selection into a convex regularizer. The key convex modeling tool is a group lasso with nested groups that ensure the sparsity pattern of autoregressive lag coefficients honors the ordered structure inherent to VAR. We provide computationally efficient algorithms for solving HVAR problems that can be parallelized across the components. A simulation study shows improved performance in forecasting and lag order selection over previous approaches, and a macroeconomic application further highlights forecasting improvements as well as the convenient, interpretable output of an HVAR model.
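A key computational primitive for a nested-group penalty is its proximal operator. For nested (tree-structured) groups, this operator can be computed by applying group soft-thresholding to each group in turn, from the innermost group to the outermost. The sketch below applies this to one component's lag coefficients, assuming groups {l, l+1, ..., p} for each lag l with a common, unweighted penalty `lam`; the paper's exact weighting and algorithm may differ, and the function name is hypothetical.

```python
import numpy as np


def prox_nested_group_lasso(beta: np.ndarray, lam: float) -> np.ndarray:
    """Proximal operator of sum_l lam * ||beta[l:]||_F for nested lag groups.

    beta has shape (p, k): row l holds the coefficients at lag l + 1.
    Groups are rows {l, ..., p-1} for l = 0..p-1, so the highest-lag group
    is the innermost one.
    """
    out = beta.astype(float).copy()
    p = out.shape[0]
    for l in range(p - 1, -1, -1):               # innermost (highest lags) first
        norm = np.linalg.norm(out[l:])           # Frobenius norm of the group
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        out[l:] *= scale                         # group soft-thresholding
    return out
```

Because a lag's group contains every higher lag, zeroing lag l forces all longer lags to zero, which is the ordered sparsity pattern described above; inside a proximal gradient loop this function would be applied after each gradient step.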

Quantum machine learning with glow for episodic tasks and decision games Artificial Intelligence

We consider a general class of models, where a reinforcement learning (RL) agent learns from cyclic interactions with an external environment via classical signals. Perceptual inputs are encoded as quantum states, which are subsequently transformed by a quantum channel representing the agent's memory, while the outcomes of measurements performed at the channel's output determine the agent's actions. The learning takes place via stepwise modifications of the channel properties. They are described by an update rule that is inspired by the projective simulation (PS) model and equipped with a glow mechanism that allows for a backpropagation of policy changes, analogous to the eligibility traces in RL and edge glow in PS. In this way, the model combines features of PS with the ability for generalization, offered by its physical embodiment as a quantum system. We apply the agent to various setups of an invasion game and a grid world, which serve as elementary model tasks allowing a direct comparison with a basic classical PS agent.
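The abstract compares against a basic classical PS agent. The sketch below is our own minimal version of such a two-layer agent on a two-percept invasion game, using the PS update with damping γ and glow damping η as we understand it: h ← h - γ(h - 1) + λg, where the traversed edge's glow is reset to 1 and all glow values decay by the factor (1 - η) at each step. It does not model the quantum channel of the paper, and the function name and parameter values are hypothetical.

```python
import numpy as np


def train_ps_agent(steps=2000, gamma=0.001, eta=1.0, reward=1.0, seed=0):
    """Two-layer classical PS agent with glow on a two-percept invasion game.

    Percept s in {0, 1} (the attacker signals left or right); the agent is
    rewarded when its action a matches s. Returns the h-matrix of edge weights.
    """
    rng = np.random.default_rng(seed)
    h = np.ones((2, 2))                          # h-values, percept x action
    g = np.zeros((2, 2))                         # glow values
    for _ in range(steps):
        s = rng.integers(2)
        probs = h[s] / h[s].sum()                # hopping probabilities
        a = rng.choice(2, p=probs)
        g *= 1.0 - eta                           # glow decays...
        g[s, a] = 1.0                            # ...and the traversed edge lights up
        lam = reward if a == s else 0.0
        h = h - gamma * (h - 1.0) + lam * g      # damping plus glow-weighted reward
    return h
```

With η = 1 only the most recent edge is rewarded, which suits this one-step game; in multi-step tasks such as a grid world, η < 1 lets a delayed reward also reach earlier edges, which is the role of the glow mechanism described in the abstract.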

Font Identification in Historical Documents Using Active Learning Machine Learning

Identifying the type of font (e.g., Roman, Blackletter) used in historical documents can help optical character recognition (OCR) systems produce more accurate text transcriptions. Towards this end, we present an active-learning strategy that can significantly reduce the number of labeled samples needed to train a font classifier. Our approach extracts image-based features that exploit geometric differences between fonts at the word level, and combines them into a bag-of-words representation for each page in a document. We evaluate six sampling strategies based on uncertainty, dissimilarity and diversity criteria, and test them on a database containing over 3,000 historical documents with Blackletter, Roman and Mixed fonts. Our results show that a combination of uncertainty and diversity achieves the highest predictive accuracy (89% of test cases correctly classified) while requiring only a small fraction of the data (17%) to be labeled. We discuss the implications of this result for mass digitization projects of historical documents.
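One simple way to combine the uncertainty and diversity criteria is to shortlist the most uncertain unlabeled pages and then pick from the shortlist greedily, always taking the page farthest from those already chosen. The sketch below is a hypothetical illustration of that idea, assuming margin-based uncertainty and Euclidean distances between page feature vectors (such as bag-of-words histograms); the paper's six strategies may define both criteria differently, and `select_batch` is our own name.

```python
import numpy as np


def select_batch(probs: np.ndarray, feats: np.ndarray, k: int, pool: int) -> list[int]:
    """Pick k samples: shortlist the `pool` most uncertain, then pick diverse ones.

    probs: (N, C) predicted class probabilities for the unlabeled samples.
    feats: (N, D) feature vectors, e.g. bag-of-words histograms.
    """
    top2 = np.sort(probs, axis=1)[:, -2:]
    margin = top2[:, 1] - top2[:, 0]             # small margin = uncertain
    candidates = list(np.argsort(margin)[:pool])
    chosen = [candidates.pop(0)]                 # start with the most uncertain
    while len(chosen) < k and candidates:
        # distance from each candidate to its nearest already-chosen sample
        d = [min(np.linalg.norm(feats[c] - feats[s]) for s in chosen) for c in candidates]
        chosen.append(candidates.pop(int(np.argmax(d))))
    return [int(i) for i in chosen]
```

The diversity step prevents the batch from filling up with near-duplicate pages that the classifier is uncertain about for the same reason.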

Supersparse Linear Integer Models for Optimized Medical Scoring Systems Machine Learning

Scoring systems are linear classification models that only require users to add, subtract and multiply a few small numbers in order to make a prediction. These models are in widespread use by the medical community, but are difficult to learn from data because they need to be accurate and sparse, have coprime integer coefficients, and satisfy multiple operational constraints. We present a new method for creating data-driven scoring systems called a Supersparse Linear Integer Model (SLIM). SLIM scoring systems are built by solving an integer program that directly encodes measures of accuracy (the 0-1 loss) and sparsity (the $\ell_0$-seminorm) while restricting coefficients to coprime integers. SLIM can seamlessly incorporate a wide range of operational constraints related to accuracy and sparsity, and can produce highly tailored models without parameter tuning. We provide bounds on the testing and training accuracy of SLIM scoring systems, and present a new data reduction technique that can improve scalability by eliminating a portion of the training data beforehand. Our paper includes results from a collaboration with the Massachusetts General Hospital Sleep Laboratory, where SLIM was used to create a highly tailored scoring system for sleep apnea screening.
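SLIM solves an integer program; for a toy problem with two features, a similar objective can be minimized by exhaustive search, which makes the trade-off between accuracy and sparsity easy to see. The objective below (0-1 loss, plus C0 times the number of nonzero feature coefficients, plus a small $\ell_1$ tie-breaker) is a simplified assumption of ours, and the coefficient ranges, the prediction convention, and the function name are hypothetical. Coprimality is not enforced explicitly; the $\ell_1$ term merely favors, for example, (1, -1) over (2, -2).

```python
import itertools

import numpy as np


def fit_tiny_scoring_system(X, y, coef_range=range(-2, 3), intercept_range=range(-3, 4),
                            C0=0.01, eps=1e-4):
    """Exhaustively search small integer scoring systems.

    Assumed SLIM-style objective: 0-1 loss + C0 * (# nonzero feature coefficients)
    + eps * (L1 norm of intercept and coefficients), the last term as a tie-breaker.
    Predicts +1 when the score is positive, else -1.
    Returns (intercept, w_1, ..., w_d).
    """
    best, best_obj = None, np.inf
    d = X.shape[1]
    for b in intercept_range:
        for w in itertools.product(coef_range, repeat=d):
            w = np.array(w)
            pred = np.where(X @ w + b > 0, 1, -1)
            loss = np.mean(pred != y)            # 0-1 loss (training error rate)
            obj = loss + C0 * np.count_nonzero(w) + eps * (abs(b) + np.abs(w).sum())
            if obj < best_obj:
                best, best_obj = (b, *w.tolist()), obj
    return best
```

The search space grows exponentially with the number of features, which is why an integer programming formulation is needed for realistic problems.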