AITopics | Clustering

Collaborating Authors

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Improving Multilayer-Perceptron(MLP)-based Network Anomaly Detection with Birch Clustering on CICIDS-2017 Dataset

Yin, Yuhua, Jang-Jaccard, Julian, Sabrina, Fariza, Kwak, Jin

arXiv.org Artificial IntelligenceOct-30-2022

Machine learning algorithms have been widely used in intrusion detection systems, including Multi-layer Perceptron (MLP). In this study, we proposed a two-stage model that combines the Birch clustering algorithm and MLP classifier to improve the performance of network anomaly multi-classification. In our proposed method, we first apply Birch or Kmeans as an unsupervised clustering algorithm to the CICIDS-2017 dataset to pre-group the data. The generated pseudo-label is then added as an additional feature to the training of the MLP-based classifier. The experimental results show that using Birch and K-Means clustering for data pre-grouping can improve intrusion detection system performance. Our method can achieve 99.73% accuracy in multi-classification using Birch clustering, which is better than similar researches using a stand-alone MLP model.

artificial intelligence, dataset, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2208.09711

Country:

Oceania > New Zealand (0.04)
Oceania > Australia > Queensland (0.04)

Genre: Research Report > New Finding (0.87)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (1.00)

Add feedback

Monitoring the Dynamic Networks of Stock Returns

Touli, Elena Farahbakhsh, Nguyen, Hoang, Bodnar, Olha

arXiv.org Artificial IntelligenceOct-29-2022

In this paper, we study the connection between the companies in the Swedish capital market. We consider 28 companies included in the determination of the market index OMX30. The network structure of the market is constructed using different methods to determine the distance between the companies. We use hierarchical clustering methods to find the relation among the companies in each window. Next, we obtain one-dimensional time series of the distances between the clustering trees that reflect the changes in the relationship between the companies in the market over time. The method of statistical process control, namely the Shewhart control chart, is applied to those time series to detect abnormal changes in the financial market.

artificial intelligence, graph, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2210.16679

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry:

Banking & Finance > Trading (0.68)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.47)
Health & Medicine > Therapeutic Area > Immunology (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

How To Overcome Richness Axiom Fallacy

Kłopotek, Mieczysław A., Kłopotek, Robert A.

arXiv.org Artificial IntelligenceOct-27-2022

The paper points at the grieving problems implied by the richness axiom in the Kleinberg's axiomatic system and suggests resolutions. The richness induces learnability problem in general and leads to conflicts with consistency axiom. As a resolution, learnability constraints and usage of centric consistency or restriction of the domain of considered clusterings to super-ball-clusterings is proposed.

artificial intelligence, kleinberg, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2210.15507

Country:

Europe > Poland > Masovia Province > Warsaw (0.04)
North America > United States > Virginia > Arlington County > Arlington (0.04)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)

Add feedback

Conversation Disentanglement with Bi-Level Contrastive Learning

Huang, Chengyu, Zhang, Zheng, Fei, Hao, Liao, Lizi

arXiv.org Artificial IntelligenceOct-27-2022

Conversation disentanglement aims to group utterances into detached sessions, which is a fundamental task in processing multi-party conversations. Existing methods have two main drawbacks. First, they overemphasize pairwise utterance relations but pay inadequate attention to the utterance-to-context relation modeling. Second, huge amount of human annotated data is required for training, which is expensive to obtain in practice. To address these issues, we propose a general disentangle model based on bi-level contrastive learning. It brings closer utterances in the same session while encourages each utterance to be near its clustered session prototypes in the representation space. Unlike existing approaches, our disentangle model works in both supervised setting with labeled data and unsupervised setting when no such data is available. The proposed method achieves new state-of-the-art performance on both settings across several public datasets.

artificial intelligence, machine learning, utterance, (18 more...)

arXiv.org Artificial Intelligence

2210.15265

Country: Asia > Singapore > Central Region > Singapore (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

ProSiT! Latent Variable Discovery with PROgressive SImilarity Thresholds

Fornaciari, Tommaso, Hovy, Dirk, Bianchi, Federico

arXiv.org Artificial IntelligenceOct-26-2022

The most common ways to explore latent document dimensions are topic models and clustering methods. However, topic models have several drawbacks: e.g., they require us to choose the number of latent dimensions a priori, and the results are stochastic. Most clustering methods have the same issues and lack flexibility in various ways, such as not accounting for the influence of different topics on single documents, forcing word-descriptors to belong to a single topic (hard-clustering) or necessarily relying on word representations. We propose PROgressive SImilarity Thresholds - ProSiT, a deterministic and interpretable method, agnostic to the input format, that finds the optimal number of latent dimensions and only has two hyper-parameters, which can be set efficiently via grid search. We compare this method with a wide range of topic models and clustering methods on four benchmark data sets. In most setting, ProSiT matches or outperforms the other methods in terms six metrics of topic coherence and distinctiveness, producing replicable, deterministic results.

artificial intelligence, machine learning, natural language, (12 more...)

arXiv.org Artificial Intelligence

2210.14763

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine (0.93)
Law (0.67)
Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Meta-node: A Concise Approach to Effectively Learn Complex Relationships in Heterogeneous Graphs

Park, Jiwoong, Jeong, Jisu, Kim, Kyungmin, Choi, Jin Young

arXiv.org Artificial IntelligenceOct-26-2022

Existing message passing neural networks for heterogeneous graphs rely on the concepts of meta-paths or meta-graphs due to the intrinsic nature of heterogeneous graphs. However, the meta-paths and meta-graphs need to be pre-configured before learning and are highly dependent on expert knowledge to construct them. To tackle this challenge, we propose a novel concept of meta-node for message passing that can learn enriched relational knowledge from complex heterogeneous graphs without any meta-paths and meta-graphs by explicitly modeling the relations among the same type of nodes. Unlike meta-paths and meta-graphs, meta-nodes do not require any pre-processing steps that require expert knowledge. Going one step further, we propose a meta-node message passing scheme and apply our method to a contrastive learning model. In the experiments on node clustering and classification tasks, the proposed meta-node message passing method outperforms state-of-the-arts that depend on meta-paths. Our results demonstrate that effective heterogeneous graph learning is possible without the need for meta-paths that are frequently used in this field.

artificial intelligence, machine learning, representation, (16 more...)

arXiv.org Artificial Intelligence

2210.1448

Country: Asia > South Korea > Seoul > Seoul (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Media > Film (0.46)
Leisure & Entertainment (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

Add feedback

A new Stack Autoencoder: Neighbouring Sample Envelope Embedded Stack Autoencoder Ensemble Model

Zhou, Chuanyan, Ma, Jie, Li, Fan, Li, Yongming, Wang, Pin, Zhang, Xiaoheng

arXiv.org Artificial IntelligenceOct-25-2022

Stack autoencoder (SAE), as a representative deep network, has unique and excellent performance in feature learning, and has received extensive attention from researchers. However, existing deep SAEs focus on original samples without considering the hierarchical structural information between samples. To address this limitation, this paper proposes a new SAE model-neighbouring envelope embedded stack autoencoder ensemble (NE_ESAE). Firstly, the neighbouring sample envelope learning mechanism (NSELM) is proposed for preprocessing of input of SAE. NSELM constructs sample pairs by combining neighbouring samples. Besides, the NSELM constructs a multilayer sample spaces by multilayer iterative mean clustering, which considers the similar samples and generates layers of envelope samples with hierarchical structural information. Second, an embedded stack autoencoder (ESAE) is proposed and trained in each layer of sample space to consider the original samples during training and in the network structure, thereby better finding the relationship between original feature samples and deep feature samples. Third, feature reduction and base classifiers are conducted on the layers of envelope samples respectively, and output classification results of every layer of samples. Finally, the classification results of the layers of envelope sample space are fused through the ensemble mechanism. In the experimental section, the proposed algorithm is validated with over ten representative public datasets. The results show that our method significantly has better performance than existing traditional feature learning methods and the representative deep autoencoders.

artificial intelligence, autoencoder, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2210.14956

Country:

North America > United States > Wisconsin (0.04)
North America > United States > California > Orange County > Irvine (0.04)
Asia > China > Chongqing Province > Chongqing (0.04)
(4 more...)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Add feedback

Quantum Semi-Supervised Learning with Quantum Supremacy

Shangnan, Zhou

arXiv.org Artificial IntelligenceOct-24-2022

Quantum machine learning promises to efficiently solve important problems. There are two persistent challenges in classical machine learning: the lack of labeled data, and the limit of computational power. We propose a novel framework that resolves both issues: quantum semi-supervised learning. Moreover, we provide a protocol in systematically designing quantum machine learning algorithms with quantum supremacy, which can be extended beyond quantum semi-supervised learning. In the meantime, we show that naive quantum matrix product estimation algorithm outperforms the best known classical matrix multiplication algorithm. We showcase two concrete quantum semi-supervised learning algorithms: a quantum self-training algorithm named the propagating nearest-neighbor classifier, and the quantum semi-supervised K-means clustering algorithm. By doing time complexity analysis, we conclude that they indeed possess quantum supremacy.

artificial intelligence, inductive learning, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2110.02343

Country:

North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.35)

Add feedback

CAST: Concurrent Recognition and Segmentation with Adaptive Segment Tokens

Ke, Tsung-Wei, Yu, Stella X.

arXiv.org Artificial IntelligenceOct-24-2022

Recognizing an image and segmenting it into coherent regions are often treated as separate tasks. Human vision, however, has a general sense of segmentation hierarchy before recognition occurs. We are thus inspired to learn image recognition with hierarchical image segmentation based entirely on unlabeled images. Our insight is to learn fine-to-coarse features concurrently at superpixels, segments, and full image levels, enforcing consistency and goodness of feature induced segmentations while maximizing discrimination among image instances. Our model innovates vision transformers on three aspects. 1) We use adaptive segment tokens instead of fixed-shape patch tokens. 2) We create a token hierarchy by inserting graph pooling between transformer blocks, naturally producing consistent multi-scale segmentations while increasing the segment size and reducing the number of tokens. 3) We produce hierarchical image segmentation for free while training for recognition by maximizing image-wise discrimination. Our work delivers the first concurrent recognition and hierarchical segmentation model without any supervision. Validated on ImageNet and PASCAL VOC, it achieves better recognition and segmentation with higher computational efficiency.

machine learning, natural language, segmentation, (19 more...)

arXiv.org Artificial Intelligence

2210.00314

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Add feedback

Align then Fusion: Generalized Large-scale Multi-view Clustering with Anchor Matching Correspondences

Wang, Siwei, Liu, Xinwang, Liu, Suyuan, Jin, Jiaqi, Tu, Wenxuan, Zhu, Xinzhong, Zhu, En

arXiv.org Artificial IntelligenceOct-24-2022

Multi-view anchor graph clustering selects representative anchors to avoid full pair-wise similarities and therefore reduce the complexity of graph methods. Although widely applied in large-scale applications, existing approaches do not pay sufficient attention to establishing correct correspondences between the anchor sets across views. To be specific, anchor graphs obtained from different views are not aligned column-wisely. Such an \textbf{A}nchor-\textbf{U}naligned \textbf{P}roblem (AUP) would cause inaccurate graph fusion and degrade the clustering performance. Under multi-view scenarios, generating correct correspondences could be extremely difficult since anchors are not consistent in feature dimensions. To solve this challenging issue, we propose the first study of the generalized and flexible anchor graph fusion framework termed \textbf{F}ast \textbf{M}ulti-\textbf{V}iew \textbf{A}nchor-\textbf{C}orrespondence \textbf{C}lustering (FMVACC). Specifically, we show how to find anchor correspondence with both feature and structure information, after which anchor graph fusion is performed column-wisely. Moreover, we theoretically show the connection between FMVACC and existing multi-view late fusion \cite{liu2018late} and partial view-aligned clustering \cite{huang2020partially}, which further demonstrates our generality. Extensive experiments on seven benchmark datasets demonstrate the effectiveness and efficiency of our proposed method. Moreover, the proposed alignment module also shows significant performance improvement applying to existing multi-view anchor graph competitors indicating the importance of anchor alignment. Our code is available at \url{https://github.com/wangsiwei2010/NeurIPS22-FMVACC}.

anchor, artificial intelligence, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2205.15075

Country: Asia > China > Hunan Province > Changsha (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.89)

Add feedback