Gallaugher, Michael P. B., McNicholas, Paul D.

Clustering is the process of finding and analyzing underlying group structures in data. In recent years, data as become increasingly higher dimensional and therefore an increased need for dimension reduction techniques for use in clustering. Although such techniques are firmly established in the literature for multivariate data, there is a relative paucity in the area of matrix variate or three way data. Furthermore, these few methods all assume matrix variate normality which is not always sensible if skewness is present. We propose a mixture of bilinear factor analyzers model using four skewed matrix variate distributions, namely the matrix variate skew-t, generalized hyperbolic, variance gamma and normal inverse Gaussian distributions.

Gallaugher, Michael P. B., McNicholas, Paul D.

Over the years data has become increasingly higher dimensional, which has prompted an increased need for dimension reduction techniques. This is perhaps especially true for clustering (unsupervised classification) as well as semi-supervised and supervised classification. Although dimension reduction in the area of clustering for multivariate data has been quite thoroughly discussed in the literature, there is relatively little work in the area of three way, or matrix variate, data. Herein, we develop a mixture of matrix variate bilinear factor analyzers (MMVBFA) model for use in clustering high-dimensional matrix variate data. This work can be considered both the first matrix variate bilinear factor analyzers model as well as the first MMVBFA model. Parameter estimation is discussed, and the MMVBFA model is illustrated using simulated and real data.

Bierling, Vanessa S. E., McNicholas, Paul D.

Finite mixture models have become a popular tool for clustering. Amongst other uses, they have been applied for clustering longitudinal data and clustering high-dimensional data. In the latter case, a latent Gaussian mixture model is sometimes used. Although there has been much work on clustering using latent variables and on clustering longitudinal data, respectively, there has been a paucity of work that combines these features. An approach is developed for clustering longitudinal data with many time points based on an extension of the mixture of common factor analyzers model. A variation of the expectation-maximization algorithm is used for parameter estimation and the Bayesian information criterion is used for model selection. The approach is illustrated using real and simulated data.

Murray, Paula M., McNicholas, Paul D., Browne, Ryan P.

A mixture of common skew-t factor analyzers model is introduced for model-based clustering of high-dimensional data. By assuming common component factor loadings, this model allows clustering to be performed in the presence of a large number of mixture components or when the number of dimensions is too large to be well-modelled by the mixtures of factor analyzers model or a variant thereof. Furthermore, assuming that the component densities follow a skew-t distribution allows robust clustering of skewed data. The alternating expectation-conditional maximization algorithm is employed for parameter estimation. We demonstrate excellent clustering performance when our model is applied to real and simulated data.This paper marks the first time that skewed common factors have been used.

Tortora, Cristina, McNicholas, Paul D., Browne, Ryan P.

Model-based clustering imposes a finite mixture modelling structure on data for clustering. Finite mixture models assume that the population is a convex combination of a finite number of densities, the distribution within each population is a basic assumption of each particular model. Among all distributions that have been tried, the generalized hyperbolic distribution has the advantage that is a generalization of several other methods, such as the Gaussian distribution, the skew t-distribution, etc. With specific parameters, it can represent either a symmetric or a skewed distribution. While its inherent flexibility is an advantage in many ways, it means the estimation of more parameters than its special and limiting cases. The aim of this work is to propose a mixture of generalized hyperbolic factor analyzers to introduce parsimony and extend the method to high dimensional data. This work can be seen as an extension of the mixture of factor analyzers model to generalized hyperbolic mixtures. The performance of our generalized hyperbolic factor analyzers is illustrated on real data, where it performs favourably compared to its Gaussian analogue.