Gap statistics
Adaptively Robust and Sparse K-means Clustering
Li, Hao, Sugasawa, Shonosuke, Katayama, Shota
While K-means is a standard clustering algorithm, it can be compromised by the presence of outliers and high-dimensional noisy variables. This paper proposes adaptively robust and sparse K-means clustering (ARSK) to address these practical limitations of the standard K-means algorithm. For robustness, we introduce a redundant error component for each observation and penalize this additional parameter with a group-sparse penalty. To accommodate the impact of high-dimensional noisy variables, the objective function is modified by incorporating weights and a penalty controlling the sparsity of the weight vector. The tuning parameters controlling robustness and sparsity are selected by Gap statistics. Through simulation experiments and real data analysis, we demonstrate that the proposed method outperforms existing algorithms at simultaneously identifying clusters free of outliers and the informative variables.
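The Gap-statistic tuning mentioned in the abstract can be illustrated with the classical gap statistic for choosing the number of clusters k: compare the within-cluster dispersion of the data against that of uniform reference data. The sketch below is a simplified analogue in plain NumPy, assuming a uniform-box reference distribution and a basic Lloyd's k-means of my own writing; it is not the ARSK selection procedure itself.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns labels and within-cluster dispersion W."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    W = sum(((X[labels == j] - centers[j]) ** 2).sum() for j in range(k))
    return labels, W

def gap_statistic(X, k_max=6, B=10, seed=0):
    """Gap(k) = E[log W_k(reference)] - log W_k(data), reference = uniform box."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(0), X.max(0)
    gaps = []
    for k in range(1, k_max + 1):
        _, W = kmeans(X, k)
        ref = [np.log(kmeans(rng.uniform(lo, hi, X.shape), k)[1]) for _ in range(B)]
        gaps.append(np.mean(ref) - np.log(W))
    return np.array(gaps)  # one value per k = 1..k_max
```

In practice one picks the k maximizing the gap (or the smallest k within one standard error of the maximum); ARSK applies the same idea to its robustness and sparsity tuning parameters rather than to k.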
Asymptotics for the $k$-means
Clustering is one of the most important unsupervised learning techniques for understanding underlying data structures. The goal is to partition a data set into subsets, called clusters, such that observations within a subset are maximally homogeneous and observations between subsets are maximally heterogeneous. Clustering is usually carried out by specifying a similarity or dissimilarity measure between observations. Examples include the k-means [17, 19, 29, 37], the k-medians [3], the k-modes [5], and the generalized k-means [2, 31, 45], as well as many of their modifications [21, 24, 42]. Among these, the k-means has been considered one of the most straightforward and popular methods since it was proposed sixty years ago [23, 36]. Although it is well known, the investigation of its theoretical properties still lags far behind, leading to difficulties in developing more precise k-means methods in practice. The goal of the present research is to propose a new concept, called clustering consistency, for the asymptotics of the k-means, together with a clustering method that improves on the existing k-means methods adopted by many software packages, including those used in R and Python.
Data-Driven Learning of the Number of States in Multi-State Autoregressive Models
Ding, Jie, Noshad, Mohammad, Tarokh, Vahid
In this work, we consider the class of multi-state autoregressive processes that can be used to model non-stationary time series of interest. To capture the different autoregressive (AR) states underlying an observed time series, it is crucial to select the appropriate number of states. We propose a new model selection technique based on the Gap statistics, which uses a null reference distribution on the stable AR filters to check whether adding a new AR state significantly improves the performance of the model. To that end, we define a new distance measure between AR filters based on the mean squared prediction error (MSPE) and propose an efficient method to generate random stable filters that are uniformly distributed in the coefficient space. Numerical results are provided to evaluate the performance of the proposed approach.
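An MSPE-based distance between AR filters, as described in the abstract, can be sketched by Monte Carlo: simulate a process under one filter and measure the excess one-step prediction error incurred by predicting with the other. This is one plausible reading of such a distance, with helper names of my own choosing; the paper's exact definition may differ.

```python
import numpy as np

def simulate_ar(coeffs, n=5000, burn=500, seed=0):
    """Simulate x_t = sum_i coeffs[i] * x_{t-1-i} + e_t with e_t ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    p = len(coeffs)
    x = np.zeros(n + burn)
    e = rng.standard_normal(n + burn)
    for t in range(p, n + burn):
        x[t] = np.dot(coeffs, x[t - p:t][::-1]) + e[t]
    return x[burn:]  # drop burn-in so the sample is near-stationary

def mspe(x, coeffs):
    """Empirical one-step mean squared prediction error of `coeffs` on series x."""
    p = len(coeffs)
    preds = np.array([np.dot(coeffs, x[t - p:t][::-1]) for t in range(p, len(x))])
    return np.mean((x[p:] - preds) ** 2)

def ar_distance(a, b, n=5000):
    """Excess MSPE of using filter b to predict a process generated by filter a."""
    x = simulate_ar(a, n)
    return mspe(x, b) - mspe(x, a)
```

By construction the distance is zero when the two filters coincide and grows as the candidate filter predicts the generating process more poorly, which is the property the model selection step needs.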
Learning the Number of Autoregressive Mixtures in Time Series Using the Gap Statistics
Ding, Jie, Noshad, Mohammad, Tarokh, Vahid
Using a proper model to characterize a time series is crucial for making accurate predictions. In this work we use a time-varying autoregressive (TVAR) process to describe a non-stationary time series, modeling it as a mixture of multiple stable autoregressive (AR) processes. We introduce a new model selection technique based on Gap statistics to learn the appropriate number of AR filters needed to model a time series. We define a new distance measure between stable AR filters and draw a reference curve used to measure how much adding a new AR filter improves the performance of the model; we then choose the number of AR filters whose gap from the reference curve is largest. To that end, we propose a new method to generate uniform random stable AR filters in the root domain. Numerical results are provided demonstrating the performance of the proposed approach.
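Generating random stable AR filters in the root domain can be sketched as follows: sample the characteristic roots inside the unit disk (complex roots in conjugate pairs so coefficients stay real) and expand the resulting polynomial. This is an illustrative scheme assuming roots drawn uniformly on the disk, not necessarily the authors' exact sampling law.

```python
import numpy as np

def random_stable_ar(p, rng):
    """Draw a stable AR(p) filter by sampling its characteristic roots in the
    open unit disk; complex roots come in conjugate pairs for real coefficients.
    (Sketch under assumed sampling choices, not the paper's exact method.)"""
    roots = []
    while len(roots) < p:
        if p - len(roots) >= 2 and rng.random() < 0.5:
            # conjugate pair, uniform over the unit disk (sqrt for area-uniformity)
            z = np.sqrt(rng.random()) * np.exp(2j * np.pi * rng.random())
            roots += [z, np.conj(z)]
        else:
            roots.append(rng.uniform(-1.0, 1.0))  # real root
    c = np.poly(roots)       # monic: z^p + c[1] z^{p-1} + ... + c[p]
    return -np.real(c[1:])   # AR coefficients a_1..a_p of z^p - a_1 z^{p-1} - ...

rng = np.random.default_rng(0)
a = random_stable_ar(3, rng)
# stability check: all roots of z^p - a_1 z^{p-1} - ... - a_p lie inside the unit circle
assert np.all(np.abs(np.roots(np.r_[1.0, -a])) < 1)
```

Sampling in the root domain makes stability automatic, whereas sampling coefficients directly and rejecting unstable draws becomes wasteful as the order p grows.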