AITopics | outlier cluster




Distributionally Robust K-Means Clustering

Malik, Vikrant, Kargin, Taylan, Hassibi, Babak

arXiv.org Machine Learning, Apr-14-2026

In recent years, the widespread availability of large-scale, high-dimensional datasets has driven significant interest in clustering algorithms that are both computationally efficient and robust to distributional shifts and outliers. The classical clustering method, K-means, can be seen as an application of the Lloyd-Max quantization algorithm, in which the distribution being quantized is the empirical distribution of the points to be clustered. This empirical distribution generally differs from the true underlying distribution, especially when the number of points to be clustered is small. This induces a distributional shift, which can also arise in many real-world settings, such as image segmentation, biological data analysis, and sensor networks, due to noise variations, sensor inaccuracies, or environmental changes. Distributional shifts can severely impact the performance of clustering algorithms, leading to degraded cluster assignments and unreliable downstream analysis. The field of clustering has a rich history. One of the most popular algorithms in this field is the K-means (KM) algorithm, introduced by [1], which computes centroids by iteratively updating the conditional mean of the data in the Voronoi regions induced by the centroids. However, standard K-means is sensitive to initialization and, in general, converges only to a local minimum.
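The iteration described above (assign each point to the Voronoi region of its nearest centroid, then replace each centroid with the mean of its region) can be sketched in a few lines of plain Python. This is a minimal illustration of standard K-means only, not the distributionally robust variant the paper proposes; the function names `sq_dist` and `kmeans`, the random initialization, and the `seed` parameter are assumptions made for the example.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd iteration (hypothetical helper, standard K-means only)."""
    rng = random.Random(seed)
    # Random initialization: the result can depend on this choice,
    # which is the sensitivity the abstract mentions.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid defines the Voronoi region.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid becomes the mean of its region.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(c) / len(members)
                                     for c in zip(*members)))
            else:
                updated.append(centroids[j])  # keep an empty cluster's centroid
        if updated == centroids:  # assignments are stable: a local minimum
            break
        centroids = updated
    return centroids, labels


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(kmeans(data, k=2))
```

Because the loop stops as soon as the centroids stop moving, it only guarantees a local minimum of the within-cluster sum of squares; rerunning with several seeds is a common workaround.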

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv: 2604.11118

Country:

North America > United States > California > Los Angeles County > Pasadena (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Alameda County > Oakland (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)


WildVis: Open Source Visualizer for Million-Scale Chat Logs in the Wild

Deng, Yuntian, Zhao, Wenting, Hessel, Jack, Ren, Xiang, Cardie, Claire, Choi, Yejin

arXiv.org Artificial Intelligence, Sep-9-2024

The increasing availability of real-world conversation data offers exciting opportunities for researchers to study user-chatbot interactions. However, the sheer volume of this data makes manually examining individual conversations impractical. To overcome this challenge, we introduce WildVis, an interactive tool that enables fast, versatile, and large-scale conversation analysis. WildVis provides search and visualization capabilities in the text and embedding spaces based on a list of criteria. To manage million-scale datasets, we implemented optimizations including search index construction, embedding precomputation and compression, and caching to ensure responsive user interactions within seconds. We demonstrate WildVis' utility through three case studies: facilitating chatbot misuse research, visualizing and comparing topic distributions across datasets, and characterizing user-specific conversation patterns. WildVis is open-source and designed to be extendable, supporting additional datasets and customized search and visualization functionalities.

dataset, visualization, wildvisualizer, (16 more...)

arXiv: 2409.03753

Country:

North America > United States > California (0.14)
South America > Argentina (0.05)
North America > Dominican Republic (0.04)

Genre: Research Report (0.50)

Industry: Media > News (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.72)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.47)


Improving Spectral Clustering using the Asymptotic Value of the Normalised Cut

Hofmeyr, David

arXiv.org Machine Learning, Mar-29-2017

Spectral clustering is a popular and versatile clustering method based on a relaxation of the normalised graph cut objective. Despite its popularity, however, there is no single agreed-upon method for tuning the important scaling parameter, nor for automatically determining the number of clusters to extract. Popular heuristics exist, but corresponding theoretical results are scarce. In this paper we investigate the asymptotic value of the normalised cut for an increasing sample assumed to arise from an underlying probability distribution, and based on this result provide recommendations for improving spectral clustering methodology. A corresponding algorithm is proposed with strong empirical performance.
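To make the objective concrete, the sketch below evaluates the normalised cut of a given partition, i.e. the quantity that spectral clustering relaxes. It is an illustration under stated assumptions, not the algorithm proposed in the paper: the Gaussian affinity and its bandwidth `sigma` stand in for the "scaling parameter" the abstract refers to, and the function names `gaussian_affinity` and `normalised_cut` are hypothetical.

```python
import math


def gaussian_affinity(points, sigma):
    """Pairwise similarities exp(-||x - y||^2 / (2 sigma^2)), zero diagonal.

    The Gaussian kernel is an assumed choice of similarity for this sketch.
    """
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                W[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    return W


def normalised_cut(W, labels):
    """Sum over clusters C of cut(C, complement of C) / vol(C)."""
    n = len(W)
    total = 0.0
    for c in set(labels):
        # cut: similarity mass leaving cluster c
        cut = sum(W[i][j] for i in range(n) for j in range(n)
                  if labels[i] == c and labels[j] != c)
        # vol: total degree of the nodes in cluster c
        vol = sum(W[i][j] for i in range(n) for j in range(n)
                  if labels[i] == c)
        total += cut / vol
    return total


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    W = gaussian_affinity(data, sigma=1.0)
    print(normalised_cut(W, [0, 0, 0, 1, 1, 1]))  # partition along the gap
    print(normalised_cut(W, [0, 1, 0, 1, 0, 1]))  # partition mixing both groups
```

A partition that follows the gap between the two groups scores lower than one that mixes them. Changing `sigma` changes every entry of `W`, and with it the value of the cut, which is why the choice of scaling parameter matters.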

algorithm, normalised cut, spectral, (13 more...)

arXiv: 1703.09975

Country: Africa > Mali (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.91)


Ward's Method for clustering in SAS

@machinelearnbot, Mar-23-2016, 12:55:30 GMT

Ward's method treats cluster analysis as an analysis of variance problem. It is an agglomerative clustering algorithm: it starts with n clusters of size 1 and keeps merging until all the observations belong to a single cluster. The method is most appropriate for quantitative variables, not binary ones. You can then set a threshold for outlier clusters, for example flagging any cluster whose size is smaller than n*0.1%.
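The post concerns SAS, but the idea can be sketched in plain Python. The example below is an illustration under assumptions, not the SAS procedure: it merges greedily by the Ward criterion (the increase in within-cluster sum of squares), stops at `k` clusters instead of building the full tree and cutting it, and then flags clusters smaller than a fraction of n. The names `ward_clusters`, `outlier_clusters`, and `min_fraction` are hypothetical.

```python
def centroid(points, members):
    """Mean of the points whose indices are listed in members."""
    cols = zip(*(points[i] for i in members))
    return tuple(sum(c) / len(members) for c in cols)


def merge_cost(points, a, b):
    """Ward criterion: increase in within-cluster sum of squares if a and b merge."""
    ca, cb = centroid(points, a), centroid(points, b)
    d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * d2


def ward_clusters(points, k):
    """Naive agglomeration: start with singletons, merge the cheapest pair until k remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: merge_cost(points,
                                                    clusters[ij[0]],
                                                    clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def outlier_clusters(clusters, n, min_fraction=0.001):
    """Clusters with fewer than n * min_fraction members (0.001 is the post's 0.1%)."""
    return [c for c in clusters if len(c) < n * min_fraction]


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
            (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5),
            (100, 100)]
    found = ward_clusters(data, k=3)
    # With only 11 points, 0.1% flags nothing, so the demo uses 20% instead.
    print(outlier_clusters(found, len(data), min_fraction=0.2))
```

The pairwise search makes this sketch cubic in the number of points, so it is suitable only for small illustrations. The 0.1% threshold from the post is meaningful only when n is in the thousands or more.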

artificial intelligence, machine learning, outlier cluster, (3 more...)


Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.79)
