AITopics | mcnichola

Collaborating Authors

mcnichola

Turtle shell clustering: A mixture approach to discriminative clustering with applications to flow cytometry and other data

Neal, Mackenzie R., McNicholas, Paul D., White, Arthur

arXiv.org Machine Learning, Apr-28-2026

Generative approaches to clustering provide information on geometric properties of clusters, whereas discriminative approaches provide boundaries between clusters. Ideas from both approaches are incorporated to present a fully unsupervised, probabilistic, and discriminative clustering method via a regularized mutual information objective function, wherein a mixture of mixtures of Gaussian and uniform distributions is used for formulation of the conditional model. Automatic selection of the number of components is established with the introduction of the regularizing term and a merge step, similar to those applied in reversible jump Markov chain Monte Carlo methods used in Bayesian clustering. Consequently, the turtle shell method -- a fully unsupervised clustering method capable of estimating non-linear boundary lines, automatically selecting the number of components, and capturing intuitive clusters in the presence of data abnormalities such as noise and/or irregular cluster shapes -- is introduced. We test this method on various simulated and real datasets commonly explored in clustering research, and extend the analysis to datasets arising from flow cytometry experiments.
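The mutual information objective mentioned in the abstract can be illustrated with a generic empirical estimator: the entropy of the average cluster assignment minus the average entropy of the individual assignments. This is a minimal sketch of that standard quantity only. It omits the regularizing term and the merge step, and the function names and the toy probability tables are assumptions for illustration, not the authors' implementation.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution; 0*log(0) is treated as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def empirical_mutual_information(cond_probs):
    """cond_probs[i][k] is a hypothetical model's p(cluster k | observation i)."""
    n = len(cond_probs)
    k = len(cond_probs[0])
    # Marginal cluster proportions: the average of the conditional distributions.
    marginal = [sum(row[j] for row in cond_probs) / n for j in range(k)]
    # Average uncertainty of the individual assignments.
    mean_conditional_entropy = sum(entropy(row) for row in cond_probs) / n
    return entropy(marginal) - mean_conditional_entropy

# Toy inputs (made up): confident, balanced assignments versus uninformative ones.
confident = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
uninformative = [[0.5, 0.5]] * 4
mi_confident = empirical_mutual_information(confident)
mi_uninformative = empirical_mutual_information(uninformative)
```

Confident and balanced assignments maximize this quantity, while assignments that ignore the data drive it to zero, which is why it can serve as a clustering objective without labels.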

artificial intelligence, machine learning, section 3, (18 more...)

2604.23083

Country:

North America > Canada (0.28)
Europe > Austria (0.28)

Genre: Research Report (0.50)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.66)

Spatial Covariance Constraints for Gaussian Mixture Models

Lu, Hanzhang, Malott, Keiran, Bitra, Venkat Suprabath, Milligan, Kirsty, Subedi, Sanjeena, Cassol, Edana, Chauhan, Vinita, McNairn, Connor, Muir, Bryan, Pasricha, Prarthana, Murugkar, Sangeeta, Thomson, Rowan, Jirasek, Andrew, Andrews, Jeffrey L.

arXiv.org Machine Learning, Jan-14-2026

Although extensive research exists in spatial modeling, few studies have addressed finite mixture model-based clustering methods for spatial data. Finite mixture models, especially Gaussian mixture models, suffer in high dimensions because of the number of free covariance parameters. This study introduces a spatial covariance constraint for Gaussian mixture models that requires only four free parameters for each component, independent of dimensionality. Using a coordinate system, the spatially constrained Gaussian mixture model enables clustering of multi-way spatial data and inference of spatial patterns. Parameter estimation is conducted by combining the expectation-maximization (EM) algorithm with the generalized least squares (GLS) estimator. Simulation studies and applications to Raman spectroscopy data are provided to demonstrate the proposed model.
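To make the dimensionality point concrete, the short sketch below counts the free parameters in one unconstrained symmetric p x p covariance matrix, which is p(p+1)/2, and prints it beside the four per-component parameters quoted in the abstract. The function name and the chosen values of p are illustrative assumptions. The sketch does not reproduce the constraint itself.

```python
def unconstrained_covariance_params(p):
    """Free parameters in one symmetric p x p covariance matrix: p(p+1)/2."""
    return p * (p + 1) // 2

# Per-component figure quoted in the abstract for the spatial constraint.
SPATIAL_PARAMS_PER_COMPONENT = 4

# Illustrative dimensions (assumed, not from the paper).
for p in (2, 10, 100, 1000):
    print(p, unconstrained_covariance_params(p), SPATIAL_PARAMS_PER_COMPONENT)
```

The unconstrained count grows quadratically in p, whereas the constrained count stays fixed.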

artificial intelligence, machine learning, matrix, (18 more...)

2601.07979

Country: North America > Canada (0.68)

Genre: Research Report (0.64)

Industry: Health & Medicine > Nuclear Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.87)

funOCLUST: Clustering Functional Data with Outliers

Clark, Katharine M., McNicholas, Paul D.

arXiv.org Machine Learning, Aug-6-2025

An extension of the OCLUST algorithm to the functional setting is proposed to address these issues. The approach leverages the OCLUST framework, creating a robust method to cluster curves and trim outliers. The methodology is evaluated on both simulated and real-world functional datasets, demonstrating strong performance in clustering and outlier identification.

artificial intelligence, data mining, machine learning, (19 more...)

2508.0011

Country:

Europe > Austria > Vienna (0.14)
North America > Canada > Ontario (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(3 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.96)
Information Technology > Data Science > Data Mining (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)

Cluster weighted models with multivariate skewed distributions for functional data

Anton, Cristina, Shreshtth, Roy Shivam Ram

arXiv.org Machine Learning, Apr-17-2025

Affiliations: Cristina Anton, Department of Mathematics and Statistics, MacEwan University, 103C, 10700-104 Ave., Edmonton, AB T5J 4S2, Canada (email: popescuc@macewan.ca); Roy Shivam Ram Shreshtth, Department of Mathematics and Statistics, Indian Institute of Technology Kanpur.

Abstract: We propose a clustering method, funWeightClustSkew, based on mixtures of functional linear regression models and three skewed multivariate distributions: the variance-gamma distribution, the skew-t distribution, and the normal-inverse Gaussian distribution. Our approach follows the framework of the functional high dimensional data clustering (funHDDC) method, and we extend to functional data the cluster weighted models based on skewed distributions used for finite dimensional multivariate data. We consider several parsimonious models, and to estimate the parameters we construct an expectation maximization (EM) algorithm. We illustrate the performance of funWeightClustSkew for simulated data and for the Air Quality dataset.

Keywords: Cluster weighted models, Functional linear regression, EM algorithm, Skewed distributions, Multivariate functional principal component analysis

1 Introduction: Smart devices and other modern technologies record huge amounts of data measured continuously in time. These data are better represented as curves instead of finite-dimensional vectors, and they are analyzed using statistical methods specific to functional data (Ramsay and Silverman, 2006; Ferraty and Vieu, 2006; Horváth and Kokoszka, 2012). Many times more than one curve is collected for one individual, e.g.

artificial intelligence, kw 1 2, machine learning, (18 more...)

2504.12683

Country:

North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.24)
North America > United States > New York (0.04)
Europe > Italy (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.86)

Finite Mixtures of Multivariate Poisson-Log Normal Factor Analyzers for Clustering Count Data

Payne, Andrea, Silva, Anjali, Rothstein, Steven J., McNicholas, Paul D., Subedi, Sanjeena

arXiv.org Machine Learning, Nov-13-2023

A mixture of multivariate Poisson-log normal factor analyzers is introduced by imposing constraints on the covariance matrix, resulting in flexible models for clustering. In particular, a class of eight parsimonious mixture models based on the mixtures of factor analyzers model is introduced. Variational Gaussian approximation is used for parameter estimation, and information criteria are used for model selection. The proposed models are explored in the context of clustering discrete data arising from RNA sequencing studies. Using real and simulated data, the models are shown to give favourable clustering performance. The GitHub R package for this work is available at https://github.com/anjalisilva/mixMPLNFA and is released under the open-source MIT license.

artificial intelligence, expression, machine learning, (13 more...)

2311.07762

Country:

North America > Canada > Ontario > Toronto (0.28)
North America > Canada > Ontario > National Capital Region > Ottawa (0.14)
North America > Canada > Ontario > Hamilton (0.14)
(4 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.68)

Technology:

Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)

Clustering Three-Way Data with Outliers

Clark, Katharine M., McNicholas, Paul D.

arXiv.org Machine Learning, Oct-11-2023

Matrix-variate normal mixture models are a powerful statistical tool used to represent complex data structures that involve matrices, such as multivariate time series, spatial data, and image data. Detecting outliers in matrix-variate normal mixture models is crucial for identifying anomalous observations that deviate significantly from the underlying distribution. Outliers can provide valuable insights into data quality issues, anomalies, or unexpected patterns. The treatment of outliers is a long-studied topic in applied statistics. The problem of handling outliers in multivariate clustering has been studied in several contexts, including work by García-Escudero et al. (2008), Punzo and McNicholas (2016), Punzo et al. (2020), and Clark and McNicholas (2023).

artificial intelligence, data mining, machine learning, (17 more...)

2310.05288

Country:

Oceania > New Zealand (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
North America > Canada > Ontario > Hamilton (0.04)
Europe > Italy (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.94)

Scalable Model-Based Gaussian Process Clustering

Chakraborty, Anirban, Chakraborty, Abhisek

arXiv.org Machine Learning, Sep-14-2023

The Gaussian process is an indispensable tool in clustering functional data, owing to its flexibility and inherent uncertainty quantification. However, when the functional data are observed over a large grid (say, of length $p$), Gaussian process clustering quickly becomes infeasible, incurring $O(p^2)$ space complexity and $O(p^3)$ time complexity per iteration, which prohibits its natural adaptation to large environmental applications. To ensure scalability of Gaussian process clustering in such applications, we propose to embed the popular Vecchia approximation for Gaussian processes at the heart of the clustering task, provide crucial theoretical insights towards algorithmic design, and finally develop a computationally efficient expectation maximization (EM) algorithm. Empirical evidence of the utility of our proposal is provided via simulations and analysis of polar temperature anomaly datasets (noaa.gov: https://www.ncei.noaa.gov/access/monitoring/climate-at-a-glance/global/time-series).
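The cost contrast in the abstract can be sketched in a toy setting. The code below evaluates a zero-mean Gaussian process log-likelihood twice: once exactly, with a dense Cholesky factor, and once with a Vecchia-style factorization that conditions each grid point only on its immediate predecessor. The exponential kernel, the lengthscale, the grid, and all function names are assumptions chosen for illustration. This is not the authors' EM algorithm, and a general Vecchia approximation conditions on a small set of neighbours rather than a single point.

```python
import math
import random

def exp_kernel(s, t, lengthscale=0.3):
    """Exponential covariance; an illustrative assumption, not taken from the paper."""
    return math.exp(-abs(s - t) / lengthscale)

def exact_loglik(x, y):
    """Zero-mean Gaussian log-density via a dense Cholesky factor: O(p^2) memory, O(p^3) time."""
    p = len(x)
    K = [[exp_kernel(x[i], x[j]) for j in range(p)] for i in range(p)]
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    # Forward substitution solves L z = y.
    z = []
    for i in range(p):
        z.append((y[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(p))
    return -0.5 * (sum(v * v for v in z) + log_det + p * math.log(2.0 * math.pi))

def vecchia_loglik_m1(x, y):
    """Vecchia-style factorization conditioning each point on its single predecessor: O(p) time."""
    total = 0.0
    for i in range(len(x)):
        if i == 0:
            mean, var = 0.0, exp_kernel(x[0], x[0])
        else:
            k_prev = exp_kernel(x[i - 1], x[i - 1])
            k_cross = exp_kernel(x[i], x[i - 1])
            mean = k_cross / k_prev * y[i - 1]
            var = exp_kernel(x[i], x[i]) - k_cross ** 2 / k_prev
        total += -0.5 * (math.log(2.0 * math.pi * var) + (y[i] - mean) ** 2 / var)
    return total

random.seed(0)
x = [i / 19 for i in range(20)]          # sorted 1-D grid (assumed)
y = [random.gauss(0.0, 1.0) for _ in x]  # arbitrary values; only the density is evaluated
exact = exact_loglik(x, y)
approx = vecchia_loglik_m1(x, y)
```

In this toy case the two values agree up to rounding, because a process with an exponential kernel is Markov on an ordered one-dimensional grid. For other kernels or higher-dimensional inputs the factorization is a genuine approximation whose accuracy depends on the ordering and the size of the conditioning sets.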

approximation, artificial intelligence, machine learning, (15 more...)

2309.07882

Country:

North America > United States > Texas > Brazos County > College Station (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe (0.04)
Asia (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Clustering and Semi-Supervised Classification for Clickstream Data via Mixture Models

Gallaugher, Michael P. B., McNicholas, Paul D.

arXiv.org Machine Learning, Dec-16-2020

Finite mixture models have been used for unsupervised learning for some time, and their use within the semi-supervised paradigm is becoming more commonplace. Clickstream data is one of the various emerging data types that demands particular attention because there is a notable paucity of statistical learning approaches currently available. A mixture of first-order continuous time Markov models is introduced for unsupervised and semi-supervised learning of clickstream data. This approach assumes continuous time, which distinguishes it from existing mixture model-based approaches; practically, this allows account to be taken of the amount of time each user spends on each webpage. The approach is evaluated, and compared to the discrete time approach, using simulated and real data.
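To show how time spent on each page enters a continuous-time model, the sketch below computes the log-likelihood of a single browsing path under a first-order continuous-time Markov chain, using the standard form in which each jump from page i to page j after time t contributes log q_ij minus t times the total exit rate of i. The rate table, the toy path, and the function name are hypothetical. The initial-state probability and the time on the final page are ignored, and the paper's own parametrization may differ.

```python
import math

def ctmc_path_loglik(states, holding_times, rates):
    """Log-likelihood of one observed path under a first-order continuous-time Markov chain.

    states        : visited pages in order, e.g. [0, 1, 0]
    holding_times : time spent on each page before the next click (one fewer than states)
    rates[i][j]   : hypothetical transition rate from page i to page j (i != j)
    """
    loglik = 0.0
    for k, t in enumerate(holding_times):
        i, j = states[k], states[k + 1]
        exit_rate = sum(r for dest, r in enumerate(rates[i]) if dest != i)
        # Density of leaving i after time t and jumping to j: q_ij * exp(-exit_rate * t).
        loglik += math.log(rates[i][j]) - exit_rate * t
    return loglik

# Toy two-page site (made-up rates): 0 -> 1 at rate 2, 1 -> 0 at rate 1.
rates = [[0.0, 2.0], [1.0, 0.0]]
value = ctmc_path_loglik([0, 1, 0], [0.5, 1.0], rates)
```

In a mixture model, each component would carry its own rate table, and per-user path likelihoods like this one would be weighted by the mixing proportions.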

classification, continuous time model, time model, (16 more...)

1802.04849

Country:

North America > United States > Texas (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
North America > United States > California > Alameda County > Hayward (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)

Clustering Higher Order Data: Finite Mixtures of Multidimensional Arrays

Tait, Peter A., McNicholas, Paul D.

arXiv.org Machine Learning, Jul-19-2019

There have been many examples of clustering multivariate (i.e., two-way) data using finite mixture models (see, e.g., reviews by Fraley and Raftery, 2002; Bouveyron and Brunet-Saumard, 2014; McNicholas, 2016b). More recently, there have been some notable examples of clustering three-way data using finite mixtures of matrix-variate distributions (e.g., Viroli, 2011; Anderlucci et al., 2015; Gallaugher and McNicholas, 2018a). This work on clustering three-way data is timely in the sense that the variety of data that require clustering continues to increase. Furthermore, there is no reason to believe that this need ends with three-way data. An approach for clustering multi-way data is introduced based on a finite mixture of multidimensional arrays. While some might refer to such structures as 'tensors', and so write about clustering tensor-variate data, we prefer the nomenclature multidimensional array to avoid confusion with the term 'tensor' as used in engineering and physics, e.g., tensor fields.

artificial intelligence, machine learning, matrix, (17 more...)

1907.08566

Country:

North America > United States (0.46)
North America > Canada (0.28)
Europe > Austria (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

Flexible Clustering with a Sparse Mixture of Generalized Hyperbolic Distributions

Gallaugher, Michael P. B., Tang, Yang, McNicholas, Paul D.

arXiv.org Machine Learning, Mar-12-2019

Robust clustering of high-dimensional data is an important topic because, in many practical situations, real data sets are heavy-tailed and/or asymmetric. Moreover, traditional model-based clustering often fails for high-dimensional data due to the number of free covariance parameters. A parametrization of the component scale matrices for the mixture of generalized hyperbolic distributions is proposed by including a penalty term in the likelihood that constrains the parameters, resulting in a flexible model for high-dimensional data and a meaningful interpretation. An analytically feasible EM algorithm is developed by placing a gamma-Lasso penalty that constrains the concentration matrix. The proposed methodology is investigated through simulation studies and two real data sets.

artificial intelligence, machine learning, mcnichola, (16 more...)

1903.05054

Country:

North America > United States > California (0.28)
North America > Canada > Ontario (0.28)

Genre: Research Report (0.64)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.90)
