AITopics | microcluster

Collaborating Authors

microcluster

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Hierarchical Sparse Representation Clustering for High-Dimensional Data Streams

Chen, Jie, Mao, Hua, Gou, Yuanbiao, Peng, Xi

arXiv.org Artificial IntelligenceSep-6-2024

Data stream clustering reveals patterns within continuously arriving, potentially unbounded data sequences. Numerous data stream algorithms have been proposed to cluster data streams. The existing data stream clustering algorithms still face significant challenges when addressing high-dimensional data streams. First, it is intractable to measure the similarities among high-dimensional data objects via Euclidean distances when constructing and merging microclusters. Second, these algorithms are highly sensitive to the noise contained in high-dimensional data streams. In this paper, we propose a hierarchical sparse representation clustering (HSRC) method for clustering high-dimensional data streams. HSRC first employs an $l_1$-minimization technique to learn an affinity matrix for data objects in individual landmark windows with fixed sizes, where the number of neighboring data objects is automatically selected. This approach ensures that highly correlated data samples within clusters are grouped together. Then, HSRC applies a spectral clustering technique to the affinity matrix to generate microclusters. These microclusters are subsequently merged into macroclusters based on their sparse similarity degrees (SSDs). Additionally, HSRC introduces sparsity residual values (SRVs) to adaptively select representative data objects from the current landmark window. These representatives serve as dictionary samples for the next landmark window. Finally, HSRC refines each macrocluster through fine-tuning. In particular, HSRC enables the detection of outliers in high-dimensional data streams via the associated SRVs. The experimental results obtained on several benchmark datasets demonstrate the effectiveness and robustness of HSRC.

algorithm, data stream, microcluster, (13 more...)

arXiv.org Artificial Intelligence

2409.04698

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > China > Sichuan Province > Chengdu (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
(8 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

McCatch: Scalable Microcluster Detection in Dimensional and Nondimensional Datasets

Vinces, Braulio V. Sánchez, Cordeiro, Robson L. F., Faloutsos, Christos

arXiv.org Artificial IntelligenceMar-12-2024

How could we have an outlier detector that works even with nondimensional data, and ranks together both singleton microclusters ('one-off' outliers) and nonsingleton microclusters by their anomaly scores? How to obtain scores that are principled in one scalable and 'hands-off' manner? Microclusters of outliers indicate coalition or repetition in fraud activities, etc.; their identification is thus highly desirable. This paper presents McCatch: a new algorithm that detects microclusters by leveraging our proposed 'Oracle' plot (1NN Distance versus Group 1NN Distance). We study 31 real and synthetic datasets with up to 1M data elements to show that McCatch is the only method that answers both of the questions above; and, it outperforms 11 other methods, especially when the data has nonsingleton microclusters or is nondimensional. We also showcase McCatch's ability to detect meaningful microclusters in graphs, fingerprints, logs of network connections, text data, and satellite imagery. For example, it found a 30-elements microcluster of confirmed 'Denial of Service' attacks in the network logs, taking only ~3 minutes for 222K data elements on a stock desktop.

atch, microcluster, outlier, (17 more...)

arXiv.org Artificial Intelligence

2403.08027

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > China > Shanghai > Shanghai (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(7 more...)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

Minimum Spectral Connectivity Projection Pursuit

Hofmeyr, David P., Pavlidis, Nicos G., Eckley, Idris A.

arXiv.org Machine LearningNov-13-2017

We study the problem of determining the optimal low dimensional projection for maximising the separability of a binary partition of an unlabelled dataset, as measured by spectral graph theory. This is achieved by finding projections which minimise the second eigenvalue of the graph Laplacian of the projected data, which corresponds to a non-convex, non-smooth optimisation problem. We show that the optimal univariate projection based on spectral connectivity converges to the vector normal to the maximum margin hyperplane through the data, as the scaling parameter is reduced to zero. This establishes a connection between connectivity as measured by spectral graph theory and maximal Euclidean separation. The computational cost associated with each eigen-problem is quadratic in the number of data. To mitigate this issue, we propose an approximation method using microclusters with provable approximation error bounds. Combining multiple binary partitions within a divisive hierarchical model allows us to construct clustering solutions admitting clusters with varying scales and lying within different subspaces. We evaluate the performance of the proposed method on a large collection of benchmark datasets and find that it compares favourably with existing methods for projection pursuit and dimension reduction for data clustering.

artificial intelligence, machine learning, projection, (19 more...)

arXiv.org Machine Learning

1509.01546

Country: North America > United States (0.92)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.88)

Add feedback

Robust Ensemble Clustering Using Probability Trajectories

Huang, Dong, Lai, Jian-Huang, Wang, Chang-Dong

arXiv.org Machine LearningJun-3-2016

Note that V Y L Link set of G w ij W eight between two nodes in G G K -elite neighbor graph (K -ENG) V Node set of G . Note that V Y L Link set of G w ij W eight between two nodes in G p ij (1-step) transition probability fromy i to y j P (1-step) transition probability matrix,P { p ij } N N p T ij T -step transition probability fromy i to y j P T T -step transition probability matrix,P T { p T ij } N N p T i: The i -th row ofP T, p T i: { p T i 1,···,p T i N} PT T i Probability trajectory of a random walker starting fromnode y i with lengthT PTS ij Probability trajectory based similarity betweeny i and y j R (0) Set of the initial regions for PTA,R (0) { R (0) 1,···,R (0) R (0) } S (0) Initial similarity matrix for PTA,S (0) { s (0) ij } R (0) R (0) R ( t) Set of thet -step regions for PTA, R ( t) { R ( t) 1,···,R ( t) R ( t) } S ( t) The t -step similarity matrix for PTA,S ( t) { s ( t) ij } R ( t) R ( t) G Microcluster-cluster bipartite graph (MCBG) N Number of nodes in G V Node set of G L Link set of G w ij W eight between two nodes in G A sparse graph termedK -elite neighbor graph (K -ENG) is then constructed with only a small number of probably reliable links. The ENS strategy is a crucial step in our approach. W e argue that using a small number of probably reliable links may lead to significantly better consensus results than using all graph links regardless of their reliability . The random walk process driven by a new transition probability matrix is performed on theK -ENG to explore the global structure information.

artificial intelligence, data mining, machine learning, (16 more...)

arXiv.org Machine Learning

doi: 10.1109/TKDE.2015.2503753

1606.0116

Country: Asia > China > Guangdong Province (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback