AITopics | Clustering

Collaborating Authors

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Semantically Consistent Multi-view Representation Learning

Zhou, Yiyang, Zheng, Qinghai, Bai, Shunshun, Zhu, Jihua

arXiv.org Artificial IntelligenceMar-7-2023

In this work, we devote ourselves to the challenging task of Unsupervised Multi-view Representation Learning (UMRL), which requires learning a unified feature representation from multiple views in an unsupervised manner. Existing UMRL methods mainly concentrate on the learning process in the feature space while ignoring the valuable semantic information hidden in different views. To address this issue, we propose a novel Semantically Consistent Multi-view Representation Learning (SCMRL), which makes efforts to excavate underlying multi-view semantic consensus information and utilize the information to guide the unified feature representation learning. Specifically, SCMRL consists of a within-view reconstruction module and a unified feature representation learning module, which are elegantly integrated by the contrastive learning strategy to simultaneously align semantic labels of both view-specific feature representations and the learned unified feature representation. In this way, the consensus information in the semantic space can be effectively exploited to constrain the learning process of unified feature representation. Compared with several state-of-the-art algorithms, extensive experiments demonstrate its superiority.

artificial intelligence, machine learning, representation, (16 more...)

arXiv.org Artificial Intelligence

2303.04366

Country:

North America > United States (0.16)
Asia > China > Shaanxi Province > Xi'an (0.04)
Asia > China > Fujian Province > Fuzhou (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

ElC-OIS: Ellipsoidal Clustering for Open-World Instance Segmentation on LiDAR Data

Deng, Wenbang, Huang, Kaihong, Yu, Qinghua, Lu, Huimin, Zheng, Zhiqiang, Chen, Xieyuanli

arXiv.org Artificial IntelligenceMar-7-2023

Open-world Instance Segmentation (OIS) is a challenging task that aims to accurately segment every object instance appearing in the current observation, regardless of whether these instances have been labeled in the training set. This is important for safety-critical applications such as robust autonomous navigation. In this paper, we present a flexible and effective OIS framework for LiDAR point cloud that can accurately segment both known and unknown instances (i.e., seen and unseen instance categories during training). It first identifies points belonging to known classes and removes the background by leveraging close-set panoptic segmentation networks. Then, we propose a novel ellipsoidal clustering method that is more adapted to the characteristic of LiDAR scans and allows precise segmentation of unknown instances. Furthermore, a diffuse searching method is proposed to handle the common over-segmentation problem presented in the known instances. With the combination of these techniques, we are able to achieve accurate segmentation for both known and unknown instances. We evaluated our method on the SemanticKITTI open-world LiDAR instance segmentation dataset. The experimental results suggest that it outperforms current state-of-the-art methods, especially with a 10.0% improvement in association quality. The source code of our method will be publicly available at https://github.com/nubot-nudt/ElC-OIS.

artificial intelligence, machine learning, segmentation, (14 more...)

arXiv.org Artificial Intelligence

2303.04351

Country: Asia > China > Hunan Province > Changsha (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Deep Clustering with a Constraint for Topological Invariance based on Symmetric InfoNCE

Zhang, Yuhui, Wada, Yuichiro, Waida, Hiroki, Goto, Kaito, Hino, Yusaku, Kanamori, Takafumi

arXiv.org Artificial IntelligenceMar-6-2023

We consider the scenario of deep clustering, in which the available prior knowledge is limited. In this scenario, few existing state-of-the-art deep clustering methods can perform well for both non-complex topology and complex topology datasets. To address the problem, we propose a constraint utilizing symmetric InfoNCE, which helps an objective of deep clustering method in the scenario train the model so as to be efficient for not only non-complex topology but also complex topology datasets. Additionally, we provide several theoretical explanations of the reason why the constraint can enhances performance of deep clustering methods. To confirm the effectiveness of the proposed constraint, we introduce a deep clustering method named MIST, which is a combination of an existing deep clustering method and our constraint. Our numerical experiments via MIST demonstrate that the constraint is effective. In addition, MIST outperforms other state-of-the-art deep clustering methods for most of the commonly used ten benchmark datasets.

artificial intelligence, dataset, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2303.03036

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > California > Alameda County > Oakland (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Behaviour of Gaussian Mixture Models part1(Machine Learning)

#artificialintelligenceMar-5-2023, 18:35:10 GMT

Abstract: Clustering is a widely used technique with a long and rich history in a variety of areas. However, most existing algorithms do not scale well to large datasets, or are missing theoretical guarantees of convergence. This paper introduces a provably robust clustering algorithm based on loss minimization that performs well on Gaussian mixture models with outliers. It provides theoretical guarantees that the algorithm obtains high accuracy with high probability under certain assumptions. Moreover, it can also be used as an initialization strategy for k-means clustering.

algorithm, classifier, gaussian mixture model part1, (13 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.96)

Add feedback

Quantum anomaly detection in the latent space of proton collision events at the LHC

Woźniak, Kinga Anna, Belis, Vasilis, Puljak, Ema, Barkoutsos, Panagiotis, Dissertori, Günther, Grossi, Michele, Pierini, Maurizio, Reiter, Florentin, Tavernelli, Ivano, Vallecorsa, Sofia

arXiv.org Artificial IntelligenceMar-5-2023

We propose a new strategy for anomaly detection at the LHC based on unsupervised quantum machine learning algorithms. To accommodate the constraints on the problem size dictated by the limitations of current quantum hardware we develop a classical convolutional autoencoder. The designed quantum anomaly detection models, namely an unsupervised kernel machine and two clustering algorithms, are trained to find new-physics events in the latent representation of LHC data produced by the autoencoder. The performance of the quantum algorithms is benchmarked against classical counterparts on different new-physics scenarios and its dependence on the dimensionality of the latent space and the size of the training dataset is studied. For kernel-based anomaly detection, we identify a regime where the quantum model significantly outperforms its classical counterpart. An instance of the kernel machine is implemented on a quantum computer to verify its suitability for available hardware. We demonstrate that the observed consistent performance advantage is related to the inherent quantum properties of the circuit used.

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2301.1078

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States (0.14)
Europe > Austria > Vienna (0.14)
(3 more...)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)

Add feedback

Mining both Commonality and Specificity from Multiple Documents for Multi-Document Summarization

Ma, Bing

arXiv.org Artificial IntelligenceMar-5-2023

The multi-document summarization task requires the designed summarizer to generate a short text that covers the important information of original documents and satisfies content diversity. This paper proposes a multi-document summarization approach based on hierarchical clustering of documents. It utilizes the constructed class tree of documents to extract both the sentences reflecting the commonality of all documents and the sentences reflecting the specificity of some subclasses of these documents for generating a summary, so as to satisfy the coverage and diversity requirements of multi-document summarization. Comparative experiments with different variant approaches on DUC'2002-2004 datasets prove the effectiveness of mining both the commonality and specificity of documents for multi-document summarization. Experiments on DUC'2004 and Multi-News datasets show that our approach achieves competitive performance compared to the state-of-the-art unsupervised and supervised approaches.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2303.02677

Country:

Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
Europe > Spain > Valencian Community > Valencia Province > Valencia (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
(3 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

DeepLens: Interactive Out-of-distribution Data Detection in NLP Models

Song, Da, Wang, Zhijie, Huang, Yuheng, Ma, Lei, Zhang, Tianyi

arXiv.org Artificial IntelligenceMar-2-2023

Machine Learning (ML) has been widely used in Natural Language Processing (NLP) applications. A fundamental assumption in ML is that training data and real-world data should follow a similar distribution. However, a deployed ML model may suffer from out-of-distribution (OOD) issues due to distribution shifts in the real-world data. Though many algorithms have been proposed to detect OOD data from text corpora, there is still a lack of interactive tool support for ML developers. In this work, we propose DeepLens, an interactive system that helps users detect and explore OOD issues in massive text corpora. Users can efficiently explore different OOD types in DeepLens with the help of a text clustering method. Users can also dig into a specific text by inspecting salient words highlighted through neuron activation analysis. In a within-subjects user study with 24 participants, participants using DeepLens were able to find nearly twice more types of OOD issues accurately with 22% more confidence compared with a variant of DeepLens that has no interaction or visualization support.

deeplen, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3544548.3580741

2303.01577

Country:

North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Europe > Germany > Hamburg (0.05)
(5 more...)

Genre:

Questionnaire & Opinion Survey (1.00)
Research Report > New Finding (0.87)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.94)
Health & Medicine > Therapeutic Area > Immunology (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.66)

Add feedback

GBC: An Efficient and Adaptive Clustering Algorithm Based on Granular-Ball

Xia, Shuyin, Xie, Jiang, Wang, Guoyin

arXiv.org Artificial IntelligenceMar-2-2023

Existing clustering methods are based on a single granularity of information, such as the distance and density of each data. This most fine-grained based approach is usually inefficient and susceptible to noise. Inspired by adaptive process of granular-ball division and differentiation, we present a novel clustering approach that retains the speed and efficiency of K-means clustering while out-performing time-tested density clustering approaches widely used in industry today. Our simple, robust, adaptive granular-ball clustering method can efficiently recognize clusters with unknown and complex shapes without the use of extra parameters. Moreover, the proposed method provides an efficient, adaptive way to depict the world, and will promote the research and development of adaptive and efficient AI technologies, especially density computing models, and improve the efficiency of many existing clustering methods.

algorithm, artificial intelligence, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2205.14592

Country:

Asia > China > Chongqing Province > Chongqing (0.05)
North America > United States > Texas > Travis County > Austin (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Verifying the Union of Manifolds Hypothesis for Image Data

Brown, Bradley C. A., Caterini, Anthony L., Ross, Brendan Leigh, Cresswell, Jesse C., Loaiza-Ganem, Gabriel

arXiv.org Artificial IntelligenceMar-2-2023

Deep learning has had tremendous success at learning low-dimensional representations of high-dimensional data. This success would be impossible if there was no hidden low-dimensional structure in data of interest; this existence is posited by the manifold hypothesis, which states that the data lies on an unknown manifold of low intrinsic dimension. In this paper, we argue that this hypothesis does not properly capture the low-dimensional structure typically present in image data. Assuming that data lies on a single manifold implies intrinsic dimension is identical across the entire data space, and does not allow for subregions of this space to have a different number of factors of variation. To address this deficiency, we consider the union of manifolds hypothesis, which states that data lies on a disjoint union of manifolds of varying intrinsic dimensions. We empirically verify this hypothesis on commonly-used image datasets, finding that indeed, observed data lies on a disconnected set and that intrinsic dimension is not constant. We also provide insights into the implications of the union of manifolds hypothesis in deep learning, both supervised and unsupervised, showing that designing models with an inductive bias for this structure improves performance across classification and generative modelling tasks. Our code is available at https://github.com/layer6ai-labs/UoMH.

data mining, intrinsic dimension, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2207.02862

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > Experimental Study (0.67)
Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.94)
Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)

Add feedback

Analysing Discrete Self Supervised Speech Representation for Spoken Language Modeling

Sicherman, Amitay, Adi, Yossi

arXiv.org Artificial IntelligenceMar-1-2023

This work profoundly analyzes discrete self-supervised speech representations (units) through the eyes of Generative Spoken Language Modeling (GSLM). Following the findings of such an analysis, we propose practical improvements to the discrete unit for the GSLM. First, we start comprehending these units by analyzing them in three axes: interpretation, visualization, and resynthesis. Our analysis finds a high correlation between the speech units to phonemes and phoneme families, while their correlation with speaker or gender is weaker. Additionally, we found redundancies in the extracted units and claim that one reason may be the units' context. Following this analysis, we propose a new, unsupervised metric to measure unit redundancies. Finally, we use this metric to develop new methods that improve the robustness of units' clustering and show significant improvement considering zero-resource speech metrics such as ABX. Code and analysis tools are available under the following link: https://github.com/slp-rl/SLM-Discrete-Representations

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/ICASSP49357.2023.10097097

2301.00591

Country: Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.70)

Add feedback