Generative Locally Linear Embedding
Ghojogh, Benyamin, Ghodsi, Ali, Karray, Fakhri, Crowley, Mark
Locally Linear Embedding (LLE) is a nonlinear spectral dimensionality reduction and manifold learning method. It has two main steps: linear reconstruction of points in the input space and linear embedding of points in the embedding space. In this work, we propose two novel generative versions of LLE, named Generative LLE (GLLE), whose linear reconstruction steps are stochastic rather than deterministic. GLLE assumes that every data point is caused by its linear reconstruction weights as latent factors. The proposed GLLE algorithms can generate various LLE embeddings stochastically, while all the generated embeddings relate to the original LLE embedding. We propose two versions of stochastic linear reconstruction: one uses expectation maximization, and the other samples directly from a distribution derived by optimization. The proposed GLLE methods are closely related to, and inspired by, variational inference, factor analysis, and probabilistic principal component analysis. Our simulations show that the proposed GLLE methods work effectively in unfolding and generating submanifolds of data.
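As a point of reference for the reconstruction step that GLLE makes stochastic, the following is a minimal sketch of the deterministic LLE weight computation (solving a small regularized linear system per point); the function name, neighborhood size, and regularization value are illustrative choices, not taken from the paper's code.

```python
# Sketch of LLE's deterministic linear reconstruction step; GLLE instead places
# a distribution over these reconstruction weights and samples from it.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def reconstruction_weights(X, k=10, reg=1e-3):
    n, _ = X.shape
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    W = np.zeros((n, n))
    for i in range(n):
        neighbors = idx[i, 1:]               # drop the point itself
        Z = X[neighbors] - X[i]              # center neighbors around x_i
        G = Z @ Z.T                          # local Gram matrix (k x k)
        G += reg * np.trace(G) * np.eye(k)   # regularize for numerical stability
        w = np.linalg.solve(G, np.ones(k))
        W[i, neighbors] = w / w.sum()        # weights constrained to sum to one
    return W
```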
Magnification Generalization for Histopathology Image Embedding
Sikaroudi, Milad, Ghojogh, Benyamin, Karray, Fakhri, Crowley, Mark, Tizhoosh, H. R.
Histopathology image embedding is an active research area in computer vision. Most embedding models concentrate exclusively on a specific magnification level. However, a useful task in histopathology embedding is to train an embedding space regardless of the magnification level. Two main approaches for tackling this goal are domain adaptation and domain generalization, in which the target magnification levels are or are not introduced to the model during training, respectively. Although magnification adaptation is a well-studied topic in the literature, this paper is, to the best of our knowledge, the first work on magnification generalization for histopathology image embedding. We use an episodically trained domain generalization technique for magnification generalization, namely Model Agnostic Learning of Semantic Features (MASF), which builds on the Model Agnostic Meta-Learning (MAML) concept. Our experimental results on a breast cancer histopathology dataset with four different magnification levels show the proposed method's effectiveness for magnification generalization.
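To make the episodic meta-learning idea concrete, below is a minimal MAML-style inner/outer loop over magnification "domains". It is only a sketch of the meta-learning backbone that MASF builds on (MASF adds semantic-alignment and metric losses), and the synthetic data, tiny MLP, and hyperparameters are placeholders rather than the paper's setup.

```python
# Minimal MAML-style episodic loop over magnification domains (illustrative).
import random
import torch

def forward(params, x):
    w1, b1, w2, b2 = params
    return torch.relu(x @ w1 + b1) @ w2 + b2

# synthetic features and labels for three magnification "domains" (placeholders)
domains = {m: (torch.randn(64, 32), torch.randint(0, 4, (64,)))
           for m in ("40x", "100x", "400x")}
params = [(0.1 * torch.randn(32, 16)).requires_grad_(),
          torch.zeros(16, requires_grad=True),
          (0.1 * torch.randn(16, 4)).requires_grad_(),
          torch.zeros(4, requires_grad=True)]
opt = torch.optim.Adam(params, lr=1e-3)
loss_fn = torch.nn.CrossEntropyLoss()
inner_lr = 0.01

for episode in range(100):
    mags = list(domains)
    random.shuffle(mags)
    meta_train, meta_test = mags[:-1], mags[-1]
    # inner step: adapt the weights on the meta-train magnifications
    x_tr = torch.cat([domains[m][0] for m in meta_train])
    y_tr = torch.cat([domains[m][1] for m in meta_train])
    inner_loss = loss_fn(forward(params, x_tr), y_tr)
    grads = torch.autograd.grad(inner_loss, params, create_graph=True)
    fast = [p - inner_lr * g for p, g in zip(params, grads)]
    # outer step: the adapted weights must also work on the held-out magnification
    x_te, y_te = domains[meta_test]
    meta_loss = inner_loss + loss_fn(forward(fast, x_te), y_te)
    opt.zero_grad()
    meta_loss.backward()
    opt.step()
```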
Factor Analysis, Probabilistic Principal Component Analysis, Variational Inference, and Variational Autoencoder: Tutorial and Survey
Ghojogh, Benyamin, Ghodsi, Ali, Karray, Fakhri, Crowley, Mark
This is a tutorial and survey paper on factor analysis, probabilistic Principal Component Analysis (PCA), variational inference, and Variational Autoencoder (VAE). These methods, which are tightly related, are dimensionality reduction and generative models. They assume that every data point is generated from, or caused by, a low-dimensional latent factor. By learning the parameters of the latent-space distribution, the corresponding low-dimensional factors are found for the sake of dimensionality reduction. Because of their stochastic and generative behaviour, these models can also be used to generate new data points in the data space. In this paper, we first start with variational inference, where we derive the Evidence Lower Bound (ELBO) and Expectation Maximization (EM) for learning the parameters. Then, we introduce factor analysis, derive its joint and marginal distributions, and work out its EM steps. Probabilistic PCA is then explained as a special case of factor analysis, and its closed-form solutions are derived. Finally, VAE is explained, where the encoder, decoder, and sampling from the latent space are introduced. Training VAE using both EM and backpropagation is explained.
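As one concrete instance of the closed-form solutions discussed for probabilistic PCA, the following numpy sketch computes the maximum-likelihood noise variance and loading matrix (up to an arbitrary rotation) from the sample covariance; the function name and interface are illustrative.

```python
# Closed-form probabilistic PCA sketch: sigma^2 is the average discarded
# eigenvalue and W = U_q (L_q - sigma^2 I)^{1/2}, up to a rotation.
import numpy as np

def ppca_closed_form(X, q):
    n, d = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n                        # sample covariance (d x d)
    eigvals, eigvecs = np.linalg.eigh(S)     # ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
    sigma2 = eigvals[q:].mean()              # noise variance from discarded directions
    W = eigvecs[:, :q] @ np.diag(np.sqrt(eigvals[:q] - sigma2))
    return W, sigma2
```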
Locally Linear Embedding and its Variants: Tutorial and Survey
Ghojogh, Benyamin, Ghodsi, Ali, Karray, Fakhri, Crowley, Mark
This is a tutorial and survey paper for Locally Linear Embedding (LLE) and its variants. The idea of LLE is to fit the local structure of the manifold in the embedding space. In this paper, we first cover LLE, kernel LLE, inverse LLE, and feature fusion with LLE. Then, we cover out-of-sample embedding using linear reconstruction, eigenfunctions, and kernel mapping. Incremental LLE is explained for embedding streaming data. Landmark LLE methods, using the Nystrom approximation and locally linear landmarks, are explained for big data embedding. We introduce methods for selecting the number of neighbors, using residual variance, Procrustes statistics, preservation neighborhood error, and local neighborhood selection. Afterwards, Supervised LLE (SLLE), enhanced SLLE, SLLE projection, probabilistic SLLE, supervised guided LLE (using the Hilbert-Schmidt independence criterion), and semi-supervised LLE are explained for supervised and semi-supervised embedding. Robust LLE methods, using the least squares problem and penalty functions, are also introduced for embedding in the presence of outliers and noise. Then, we introduce fusion of LLE with other manifold learning methods, including Isomap (i.e., ISOLLE), principal component analysis, Fisher discriminant analysis, discriminant LLE, and Isotop. Finally, we explain weighted LLE, in which the distances, reconstruction weights, or embeddings are adjusted for better embedding; we cover weighted LLE for deformed distributed data, weighted LLE using probability of occurrence, SLLE by adjusting weights, modified LLE, and iterative LLE.
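For readers who want to try LLE directly, a minimal usage sketch with scikit-learn is given below; the Swiss-roll dataset and parameter values are illustrative, and only the standard and modified variants surveyed above are shown.

```python
# Quick LLE usage sketch with scikit-learn (illustrative parameters).
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, _ = make_swiss_roll(n_samples=1500, noise=0.05)
for method in ("standard", "modified"):
    lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2, method=method)
    Y = lle.fit_transform(X)                 # 2-D embedding of the swiss roll
    print(method, Y.shape)
```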
Sampling Algorithms, from Survey Sampling to Monte Carlo Methods: Tutorial and Literature Review
Ghojogh, Benyamin, Nekoei, Hadi, Ghojogh, Aydin, Karray, Fakhri, Crowley, Mark
This paper is a tutorial and literature review on sampling algorithms. There are two main types of sampling in statistics. The first is survey sampling, which draws samples from a set or population. The second is sampling from a probability distribution, where a probability density or mass function is available. In this paper, we cover both types of sampling. First, we review some required background on mean squared error, variance, bias, maximum likelihood estimation, the Bernoulli, Binomial, and Hypergeometric distributions, the Horvitz-Thompson estimator, and the Markov property. Then, we explain the theory of simple random sampling, bootstrapping, stratified sampling, and cluster sampling. We also briefly introduce multistage sampling, network sampling, and snowball sampling. Afterwards, we switch to sampling from distributions. We explain sampling from the cumulative distribution function, Monte Carlo approximation, simple Monte Carlo methods, and Markov Chain Monte Carlo (MCMC) methods. For simple Monte Carlo methods, whose iterations are independent, we cover importance sampling and rejection sampling. For MCMC methods, we cover the Metropolis algorithm, the Metropolis-Hastings algorithm, Gibbs sampling, and slice sampling. Then, we explain the random walk behaviour of Monte Carlo methods and more efficient Monte Carlo methods, including Hamiltonian (or hybrid) Monte Carlo, Adler's overrelaxation, and ordered overrelaxation. Finally, we summarize the characteristics, pros, and cons of the sampling methods compared to each other. This paper can be useful in different fields, including statistics, machine learning, reinforcement learning, and computational physics.
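As a small illustration of the MCMC family covered here, the following is a minimal random-walk Metropolis sketch that targets an unnormalized log-density; the target, proposal scale, and chain length are illustrative choices.

```python
# Random-walk Metropolis sketch: accept/reject moves based only on the
# unnormalized log-density of the target.
import numpy as np

def metropolis(log_p, x0, n_steps=10000, step=0.5, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    samples, x = [], x0
    for _ in range(n_steps):
        x_new = x + step * rng.standard_normal()             # symmetric proposal
        if np.log(rng.random()) < log_p(x_new) - log_p(x):   # Metropolis acceptance rule
            x = x_new
        samples.append(x)
    return np.array(samples)

# example: sample a standard normal from its unnormalized log-density
draws = metropolis(lambda x: -0.5 * x**2, x0=0.0)
print(draws.mean(), draws.std())
```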
Batch-Incremental Triplet Sampling for Training Triplet Networks Using Bayesian Updating Theorem
Sikaroudi, Milad, Ghojogh, Benyamin, Karray, Fakhri, Crowley, Mark, Tizhoosh, H. R.
Variants of triplet networks are robust tools for learning a discriminative embedding subspace. There exist different triplet mining approaches for selecting the most suitable training triplets. Some of these mining methods rely on the extreme distances between instances, while others make use of sampling. However, sampling triplets from stochastic distributions of data, rather than merely from the existing embedding instances, can provide more discriminative information. In this work, we sample triplets from distributions of data rather than from existing instances. We consider a multivariate normal distribution for the embedding of each class. Using Bayesian updating and conjugate priors, we update the distributions of the classes dynamically as new mini-batches of training data arrive. The proposed triplet mining with Bayesian updating can be used with any triplet-based loss function, e.g., the triplet loss or the Neighborhood Component Analysis (NCA) loss. Accordingly, our triplet mining approaches are called Bayesian Updating Triplet (BUT) and Bayesian Updating NCA (BUNCA), depending on which loss function is used. Experimental results on two public datasets, namely MNIST and histopathology colorectal cancer (CRC), substantiate the effectiveness of the proposed triplet mining method.
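To illustrate the flavor of the conjugate updating, the sketch below maintains a per-class Gaussian and updates it from each mini-batch; for simplicity it updates only the mean with a normal-normal conjugate prior (the paper updates the full class distribution), and all names and hyperparameters are illustrative.

```python
# Simplified per-class conjugate Bayesian updating of an embedding distribution.
import numpy as np

class ClassGaussian:
    def __init__(self, dim, prior_var=10.0, obs_var=1.0):
        self.mu = np.zeros(dim)          # prior mean of the class embedding
        self.var = prior_var             # prior variance (isotropic)
        self.obs_var = obs_var           # assumed embedding noise variance

    def update(self, batch_embeddings):
        """Normal-normal conjugate update from one mini-batch of embeddings."""
        n = len(batch_embeddings)
        post_var = 1.0 / (1.0 / self.var + n / self.obs_var)
        self.mu = post_var * (self.mu / self.var + batch_embeddings.sum(0) / self.obs_var)
        self.var = post_var

    def sample(self, size):
        """Draw synthetic embeddings to serve as positives or negatives in triplets."""
        scale = np.sqrt(self.var + self.obs_var)   # posterior predictive std
        return np.random.default_rng().normal(self.mu, scale, (size, len(self.mu)))
```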
Acceleration of Large Margin Metric Learning for Nearest Neighbor Classification Using Triplet Mining and Stratified Sampling
Poorheravi, Parisa Abdolrahim, Ghojogh, Benyamin, Gaudet, Vincent, Karray, Fakhri, Crowley, Mark
Metric learning is a manifold learning technique whose goal is to find a projection subspace that increases the inter-class variance and decreases the intra-class variance. Some metric learning methods are based on triplet learning with anchor-positive-negative triplets. Large margin metric learning for nearest neighbor classification is one of the fundamental methods of this kind. Recently, Siamese networks have been introduced with the triplet loss. Many triplet mining methods have been developed for Siamese networks; however, these techniques have not been applied to the triplets of large margin metric learning for nearest neighbor classification. In this work, inspired by the mining methods for Siamese networks, we propose several triplet mining techniques for large margin metric learning. Moreover, for acceleration and scalability of optimization, a hierarchical approach is proposed in which triplets are selected by stratified sampling in hierarchical hyper-spheres. We analyze the proposed methods on three publicly available datasets: Fisher Iris, ORL faces, and MNIST.
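The sketch below shows one of the simplest mining strategies of this kind, hardest-positive/hardest-negative selection on precomputed embeddings; it is only an illustration of triplet mining in general, not the specific miners or the stratified-sampling hierarchy proposed in the paper.

```python
# Hardest-positive / hardest-negative triplet mining on embeddings Z with labels y.
import numpy as np

def mine_hard_triplets(Z, y):
    """For each anchor, pick the farthest positive and the nearest negative."""
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)  # pairwise distances
    triplets = []
    for a in range(len(Z)):
        pos = np.where((y == y[a]) & (np.arange(len(Z)) != a))[0]
        neg = np.where(y != y[a])[0]
        if len(pos) == 0 or len(neg) == 0:
            continue
        triplets.append((a, pos[np.argmax(D[a, pos])], neg[np.argmin(D[a, neg])]))
    return triplets
```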
Design of Efficient Deep Learning models for Determining Road Surface Condition from Roadside Camera Images and Weather Data
Carrillo, Juan, Crowley, Mark, Pan, Guangyuan, Fu, Liping
Road maintenance during the winter season is a safety-critical and resource-demanding operation. One of its key activities is determining road surface condition (RSC) in order to prioritize roads and allocate cleaning efforts such as plowing or salting. Two conventional approaches for determining RSC are visual examination of roadside camera images by trained personnel and patrolling the roads to perform on-site inspections. However, with more than 500 cameras collecting images across Ontario, visual examination becomes a resource-intensive activity that is difficult to scale, especially during snowstorms. This paper presents the results of a study focused on improving the efficiency of road maintenance operations. We use multiple Deep Learning models to automatically determine RSC from roadside camera images and weather variables, extending previous research where similar methods have been used to tackle the problem. The dataset we use was collected during the 2017-2018 winter season from 40 stations connected to the Ontario Road Weather Information System (RWIS); it includes 14,000 labeled images and 70,000 weather measurements. We train and evaluate the performance of seven state-of-the-art models from the computer vision literature, including the recent DenseNet, NASNet, and MobileNet. Moreover, through systematic ablation experiments we adapt previously published Deep Learning models, reducing their parameter count to about 1.3% of the original, and by integrating weather variables the models are better able to ascertain RSC under poor visibility conditions.
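One plausible way to combine image and weather inputs in a single classifier is sketched below with a MobileNet backbone and a small fusion head; the input sizes, number of weather variables, number of RSC classes, and layer widths are assumptions for illustration, not the architectures evaluated in the paper.

```python
# Hypothetical fusion of camera images and weather measurements (illustrative).
import tensorflow as tf

image_in = tf.keras.Input(shape=(224, 224, 3))
weather_in = tf.keras.Input(shape=(6,))                   # e.g., temperature, wind, precipitation
backbone = tf.keras.applications.MobileNet(include_top=False, pooling="avg",
                                            weights=None)  # use "imagenet" for pretraining
x = backbone(image_in)
x = tf.keras.layers.Concatenate()([x, weather_in])         # fuse image and weather features
x = tf.keras.layers.Dense(64, activation="relu")(x)
out = tf.keras.layers.Dense(4, activation="softmax")(x)    # RSC classes (illustrative count)
model = tf.keras.Model([image_in, weather_in], out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```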
Stochastic Neighbor Embedding with Gaussian and Student-t Distributions: Tutorial and Survey
Ghojogh, Benyamin, Ghodsi, Ali, Karray, Fakhri, Crowley, Mark
Stochastic Neighbor Embedding (SNE) is a manifold learning and dimensionality reduction method with a probabilistic approach. In SNE, every point is considered to be a neighbor of all other points with some probability, and this probability is preserved as well as possible in the embedding space. SNE uses the Gaussian distribution for these probabilities in both the input and embedding spaces. t-SNE, however, uses the Gaussian distribution in the input space and the Student-t distribution in the embedding space. In this tutorial and survey paper, we explain SNE, symmetric SNE, t-SNE (or Cauchy-SNE), and t-SNE with general degrees of freedom. We also cover the out-of-sample extension and acceleration for these methods. Some simulations to visualize the embeddings are also provided.
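The contrast between the two affinity choices can be written down in a few lines; the sketch below computes Gaussian input-space affinities with a fixed bandwidth (rather than a perplexity-tuned one) and Student-t (one degree of freedom) embedding-space affinities. Function names are illustrative.

```python
# Gaussian affinities in the input space vs. Student-t affinities in the embedding space.
import numpy as np

def gaussian_affinities(X, sigma=1.0):
    D2 = np.square(X[:, None, :] - X[None, :, :]).sum(-1)
    P = np.exp(-D2 / (2 * sigma**2))
    np.fill_diagonal(P, 0.0)
    return P / P.sum(axis=1, keepdims=True)      # row-wise conditional p_{j|i}

def student_t_affinities(Y):
    D2 = np.square(Y[:, None, :] - Y[None, :, :]).sum(-1)
    Q = 1.0 / (1.0 + D2)                          # Student-t with one degree of freedom (Cauchy)
    np.fill_diagonal(Q, 0.0)
    return Q / Q.sum()                            # joint q_{ij}
```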
Multidimensional Scaling, Sammon Mapping, and Isomap: Tutorial and Survey
Ghojogh, Benyamin, Ghodsi, Ali, Karray, Fakhri, Crowley, Mark
Multidimensional Scaling (MDS) is one of the first fundamental manifold learning methods. It can be categorized into several methods, namely classical MDS, kernel classical MDS, metric MDS, and non-metric MDS. Sammon mapping and Isomap can be considered special cases of metric MDS and kernel classical MDS, respectively. In this tutorial and survey paper, we review the theory of MDS, Sammon mapping, and Isomap in detail. We explain all the mentioned categories of MDS. Then, Sammon mapping, Isomap, and kernel Isomap are explained. Out-of-sample embedding for MDS and Isomap, using eigenfunctions and kernel mapping, is introduced. Then, the Nystrom approximation and its use in landmark MDS and landmark Isomap are introduced for big data embedding. We also provide some simulations illustrating the embeddings obtained by these methods.
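A minimal classical MDS sketch is given below: double-center the squared distance matrix and embed with the top eigenvectors; Isomap follows the same recipe after replacing Euclidean distances with graph geodesic distances. The function name and interface are illustrative.

```python
# Classical MDS: double-center the squared distances, then eigendecompose.
import numpy as np

def classical_mds(D, n_components=2):
    n = D.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n               # centering matrix
    B = -0.5 * H @ (D ** 2) @ H                        # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]   # largest eigenvalues first
    return eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0.0))
```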