AITopics | Clustering

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

Tensor-based Intrinsic Subspace Representation Learning for Multi-view Clustering

Zheng, Qinghai, Zhang, Yu, Zhu, Jihua, Li, Zhongyu, Tang, Haoyu, Ma, Shuangxun

arXiv.org Artificial Intelligence, Nov-7-2022

Multi-view clustering is a hot research topic, and many approaches have been proposed over the past few years. Nevertheless, most existing algorithms take only the consensus information among different views into consideration for clustering. This may hinder multi-view clustering performance in real-life applications, since different views usually have diverse statistical properties. To address this problem, we propose a novel Tensor-based Intrinsic Subspace Representation Learning (TISRL) method for multi-view clustering. Concretely, a rank-preserving decomposition is first proposed to deal effectively with the diverse statistical information contained in different views. Then, to obtain the intrinsic subspace representation, a low-rank tensor constraint based on the tensor singular value decomposition is also used. In this way, the view-specific information is fully investigated by the rank-preserving decomposition, and the high-order correlations of multi-view data are mined by the low-rank tensor constraint. The objective function can be optimized by an alternating direction minimization algorithm based on augmented Lagrangian multipliers. Experimental results on nine commonly used real-world multi-view datasets illustrate the superiority of TISRL.

artificial intelligence, data mining, machine learning

arXiv: 2010.09193

Country:

Asia > China > Shaanxi Province > Xi'an (0.05)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.70)

Significance-Based Categorical Data Clustering

Hu, Lianyu, Jiang, Mudi, Liu, Yan, He, Zengyou

arXiv.org Artificial Intelligence, Nov-7-2022

Although numerous algorithms have been proposed to solve the categorical data clustering problem, how to assess the statistical significance of a set of categorical clusters remains unaddressed. To fill this gap, we employ the likelihood ratio test to derive a test statistic that can serve as a significance-based objective function in categorical data clustering. Consequently, a new clustering algorithm is proposed in which the significance-based objective function is optimized via a Monte Carlo search procedure. As a by-product, we can further calculate an empirical p-value to assess the statistical significance of a set of clusters and develop an improved gap statistic for estimating the number of clusters. Extensive experimental studies suggest that our method achieves performance comparable to state-of-the-art categorical data clustering algorithms. Moreover, the effectiveness of such a significance-based formulation for statistical cluster validation and cluster number estimation is demonstrated through comprehensive empirical results.
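
The abstract does not spell out the test statistic or the null model, so the following is only a minimal Python sketch of the general idea: a likelihood-ratio (G) statistic that compares per-cluster category frequencies with the pooled frequencies, and an empirical p-value estimated from Monte Carlo draws. The function names `g_statistic` and `empirical_p_value`, the label-permutation null, and the default of 199 draws are assumptions made for illustration, not the authors' procedure.

```python
import random
from collections import Counter
from math import log


def g_statistic(rows, labels):
    """Likelihood-ratio (G) statistic, summed over attributes.

    rows   -- list of tuples of categorical values, one tuple per object
    labels -- cluster label of each object
    Larger values mean the clusters deviate more from the pooled frequencies.
    """
    n = len(rows)
    cluster_sizes = Counter(labels)
    total = 0.0
    for j in range(len(rows[0])):
        pooled = Counter(row[j] for row in rows)
        joint = Counter((lab, row[j]) for row, lab in zip(rows, labels))
        for (lab, value), observed in joint.items():
            # Expected count if attribute j were independent of the clustering.
            expected = cluster_sizes[lab] * pooled[value] / n
            total += 2.0 * observed * log(observed / expected)
    return total


def empirical_p_value(rows, labels, n_draws=199, seed=0):
    """Empirical p-value under a label-permutation null (an assumption)."""
    rng = random.Random(seed)
    observed = g_statistic(rows, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_draws):
        rng.shuffle(shuffled)
        if g_statistic(rows, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_draws + 1)
```

A small p-value here only says that the observed partition scores higher than most random relabelings of the same objects; it does not reproduce the paper's search procedure or its gap statistic.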

artificial intelligence, categorical data, machine learning

arXiv: 2211.03956

Country: Asia > China > Liaoning Province > Dalian (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

Improved conformalized quantile regression

Sousa, Martim, Tomé, Ana Maria, Moreira, José

arXiv.org Machine Learning, Nov-6-2022

Conformalized quantile regression is a procedure that inherits the advantages of conformal prediction and quantile regression. That is, we use quantile regression to estimate the true conditional quantile and then apply a conformal step on a calibration set to ensure marginal coverage. In this way, we get adaptive prediction intervals that account for heteroscedasticity. However, this conformal step lacks adaptiveness, as described in Romano et al. (2019). To overcome this limitation, instead of applying a single conformal step after estimating conditional quantiles with quantile regression, we propose to cluster the explanatory variables, weighted by their permutation importance, with an optimized k-means and to apply k conformal steps, one per cluster. To show that this improved version outperforms the classic version of conformalized quantile regression and is more adaptive to heteroscedasticity, we extensively compare the prediction intervals of both on open datasets.
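
As a rough illustration of the per-cluster conformal step, the Python sketch below assumes that the quantile-regression bounds and the cluster assignments have already been computed elsewhere; the importance-weighted k-means itself is not shown. The conformity score max(lower - y, y - upper) and the rank ceil((n + 1)(1 - alpha)) follow the usual conformalized quantile regression recipe. The function names `conformal_offsets` and `conformalize` are hypothetical.

```python
from collections import defaultdict
from math import ceil


def conformal_offsets(lower, upper, y, clusters, alpha=0.1):
    """Compute one conformal correction per cluster from a calibration set.

    lower, upper -- predicted lower/upper quantiles for each calibration point
    y            -- observed targets for each calibration point
    clusters     -- cluster id of each calibration point (assumed precomputed)
    """
    scores = defaultdict(list)
    for lo, hi, target, cluster in zip(lower, upper, y, clusters):
        # Positive when the target falls outside [lo, hi], negative inside.
        scores[cluster].append(max(lo - target, target - hi))

    offsets = {}
    for cluster, values in scores.items():
        values.sort()
        n = len(values)
        rank = ceil((n + 1) * (1 - alpha))
        # Too few calibration points for this level: fall back to an infinite interval.
        offsets[cluster] = values[rank - 1] if rank <= n else float("inf")
    return offsets


def conformalize(lo, hi, cluster, offsets):
    """Widen (or shrink) a predicted interval by its cluster's correction."""
    q = offsets[cluster]
    return lo - q, hi + q
```

With a single cluster id for every point this reduces to the classic one-step procedure, which makes the two variants easy to compare on the same calibration split.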

artificial intelligence, machine learning, quantile regression

doi: 10.1016/j.eswa.2023.122322

arXiv: 2207.02808

Country:

Europe > Portugal > Aveiro > Aveiro (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)

Direct deduction of chemical class from NMR spectra

Kuhn, Stefan, Cobas, Carlos, Barba, Agustin, Colreavy-Donnelly, Simon, Caraffini, Fabio, Borges, Ricardo Moreira

arXiv.org Artificial Intelligence, Nov-6-2022

Nuclear Magnetic Resonance (NMR) spectroscopy is an established technique in analytical chemistry. As a result of its rich structural and dynamic information content, it is particularly suitable for compound identification. However, a full elucidation may not always be possible or even necessary, since some properties might be obtainable directly from the spectra. If this is the case, substances to be closely investigated for compound assignment can be prioritised in the early stages of a study. A previous example of this idea was demonstrated in [1], where the authors showed that the existence of certain substructures can be concluded from profiles in the spectra. Another potentially useful application is chemical classification. Such classifications rely strongly on annotated chemical entities to provide a computable chemical taxonomy based on substructures.

artificial intelligence, machine learning, spectra

doi: 10.1016/j.jmr.2023.107381

arXiv: 2211.03173

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > Spain > Galicia > A Coruña Province > Santiago de Compostela (0.04)
Europe > Estonia > Tartu County > Tartu (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.47)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Forecasting User Interests Through Topic Tag Predictions in Online Health Communities

Adishesha, Amogh Subbakrishna, Jakielaszek, Lily, Azhar, Fariha, Zhang, Peixuan, Honavar, Vasant, Ma, Fenglong, Belani, Chandra, Mitra, Prasenjit, Huang, Sharon Xiaolei

arXiv.org Artificial Intelligence, Nov-4-2022

The increasing reliance of patients and caregivers on online communities for healthcare information has led to an increase in the spread of misinformation and of subjective, anecdotal, inaccurate or non-specific recommendations which, if acted on, could cause serious harm to patients. Hence, there is an urgent need to connect users with accurate and tailored health information in a timely manner to prevent such harm. This paper proposes an innovative approach to suggesting reliable information to participants in online communities as they move through different stages of their disease or treatment. We hypothesize that patients with similar histories of disease progression or courses of treatment would have similar information needs at comparable stages. Specifically, we pose the problem of predicting topic tags or keywords that describe the future information needs of users based on their profiles, the traces of their online interactions within the community (past posts, replies), and the profiles and interaction traces of other users who resemble the target users in both respects. The result is a variant of collaborative information filtering, or a recommendation system, tailored to the needs of users of online health communities. We report results of our experiments on an expert-curated data set which demonstrate the superiority of the proposed approach over state-of-the-art baselines with respect to accurate and timely prediction of topic tags (and hence information sources of interest).

bioinformatics, information, machine learning

arXiv: 2211.02789

Country: Asia > Middle East > Jordan (0.04)

Genre:

Research Report > Promising Solution (0.48)
Research Report > New Finding (0.46)
Overview > Innovation (0.34)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Consumer Health (1.00)
Health & Medicine > Epidemiology (0.88)
Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (0.69)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Social Media (1.00)

PURSUhInT: In Search of Informative Hint Points Based on Layer Clustering for Knowledge Distillation

Keser, Reyhan Kevser, Ayanzadeh, Aydin, Aghdam, Omid Abdollahi, Kilcioglu, Caglar, Toreyin, Behcet Ugur, Ure, Nazim Kemal

arXiv.org Artificial Intelligence, Nov-3-2022

One of the most efficient methods for model compression is hint distillation, where the student model is injected with information (hints) from several different layers of the teacher model. Although the selection of hint points can drastically alter the compression performance, conventional distillation approaches overlook this fact and use the same hint points as in the early studies. Therefore, we propose a clustering-based hint selection methodology, where the layers of the teacher model are clustered with respect to several metrics and the cluster centers are used as the hint points. Once our method has been applied to a chosen teacher network, the selected hint points can be used with any student network. The proposed approach is validated on the CIFAR-100 and ImageNet datasets, using various teacher-student pairs and numerous hint distillation methods. Our results show that hint points selected by our algorithm result in superior compression performance compared to state-of-the-art knowledge distillation algorithms on the same student models and datasets.

artificial intelligence, distillation, machine learning

doi: 10.1016/j.eswa.2022.119040

arXiv: 2103.00053

Country:

Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
North America > United States > Maryland > Baltimore County (0.04)
North America > United States > Maryland > Baltimore (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Education (0.90)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Sensing and Signal Processing > Image Processing (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

A Bayesian Learning, Greedy agglomerative clustering approach and evaluation techniques for Author Name Disambiguation Problem

Sourav, Shashwat

arXiv.org Artificial Intelligence, Nov-1-2022

Author names often suffer from ambiguity, owing to the same author appearing under different names and to multiple authors having similar names. This makes it difficult to associate a scholarly work with the person who wrote it, thereby introducing inaccuracy in credit attribution, bibliometric analysis, search-by-author in a digital library, and expert discovery. A plethora of techniques for disambiguating author names have been proposed in the literature. I focus on the research efforts aimed at disambiguating author names. I first go through the conventional methods, then discuss evaluation techniques and the clustering model, which finally leads to the Bayesian learning and greedy agglomerative approach. I believe this concentrated review will be useful for the research community because it discusses techniques applied to a very large real database that is actively used worldwide. The Bayesian and greedy agglomerative approach will help to tackle author name disambiguation (AND) problems more effectively. Finally, I outline a few directions for future work.
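
To make the greedy agglomerative step concrete, here is a minimal Python sketch under strong simplifying assumptions. Each record bearing the ambiguous name is represented only by a set of features (for example coauthor names), similarity is plain Jaccard overlap, and merging stops at a fixed threshold. The Bayesian scoring mentioned in the abstract is not modelled, and the names `jaccard`, `greedy_agglomerative`, and the threshold value are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_agglomerative(records, threshold=0.3):
    """Greedily merge the most similar pair of clusters until none is similar enough.

    records -- list of feature sets, one per publication record
    Returns a sorted list of clusters, each a sorted list of record indices.
    """
    # Each cluster keeps its member indices and the union of their features.
    clusters = [({i}, set(features)) for i, features in enumerate(records)]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = jaccard(clusters[i][1], clusters[j][1])
                if best is None or score > best[0]:
                    best = (score, i, j)
        score, i, j = best
        if score < threshold:
            break  # no remaining pair is similar enough to merge
        merged = (clusters[i][0] | clusters[j][0], clusters[i][1] | clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return sorted(sorted(members) for members, _ in clusters)
```

For example, `greedy_agglomerative([{"smith", "lee"}, {"lee", "kim"}, {"garcia"}, {"garcia", "ortiz"}])` groups the first two records together and the last two together, since each pair shares a coauthor. The pairwise search is quadratic per merge, so a very large database would need blocking or indexing first.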

artificial intelligence, data mining, machine learning

arXiv: 2211.01303

Country: Asia > India > Madhya Pradesh > Bhopal (0.04)

Genre: Research Report (0.82)

Industry: Information Technology (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.83)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.71)

Unsupervised Learning: Clustering

#artificialintelligence, Oct-31-2022, 22:20:12 GMT

Airline contexts include punctuality, food, comfort, entertainment, etc. An owner can determine which areas of his business need to be focused on by using this analysis. His priority would be the quality of food being served to customers if, for example, the most negative comments are about food. There are situations, however, in which business owners are uncertain. It is also possible to lack training data.

Country:

Asia > India > West Bengal > Kolkata (0.16)
Asia > India > Tamil Nadu > Chennai (0.05)
Asia > India > Maharashtra > Mumbai (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.52)

Seeking Commonness and Inconsistencies: A Jointly Smoothed Approach to Multi-view Subspace Clustering

Cai, Xiaosha, Huang, Dong, Zhang, Guang-Yu, Wang, Chang-Dong

arXiv.org Artificial Intelligence, Oct-31-2022

Multi-view subspace clustering aims to discover the hidden subspace structures from multiple views for robust clustering, and has been attracting considerable attention in recent years. Despite significant progress, most of the previous multi-view subspace clustering algorithms are still faced with two limitations. First, they usually focus on the consistency (or commonness) of multiple views, yet often lack the ability to capture the cross-view inconsistencies in subspace representations. Second, many of them overlook the local structures of multiple views and cannot jointly leverage multiple local structures to enhance the subspace representation learning. To address these two limitations, in this paper, we propose a jointly smoothed multi-view subspace clustering (JSMC) approach. Specifically, we simultaneously incorporate the cross-view commonness and inconsistencies into the subspace representation learning. The view-consensus grouping effect is presented to jointly exploit the local structures of multiple views to regularize the view-commonness representation, which is further associated with the low-rank constraint via the nuclear norm to strengthen its cluster structure. Thus the cross-view commonness and inconsistencies, the view-consensus grouping effect, and the low-rank representation are seamlessly incorporated into a unified objective function, upon which an alternating optimization algorithm is performed to achieve a robust subspace representation for clustering. Experimental results on a variety of real-world multi-view datasets confirm the superiority of our approach. Code available: https://github.com/huangdonghere/JSMC.

artificial intelligence, data mining, machine learning

doi: 10.1016/j.inffus.2022.10.020

arXiv: 2203.0806

Country:

North America > United States > Texas (0.05)
Asia > Middle East > Jordan (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)

Genre: Research Report (0.50)

Industry:

Media > Film (0.46)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Data Science > Data Mining (0.88)

A Compressed Sensing Based Least Squares Approach to Semi-supervised Local Cluster Extraction

Lai, Ming-Jun, Shen, Zhaiming

arXiv.org Artificial Intelligence, Oct-31-2022

A least squares semi-supervised local clustering algorithm based on the idea of compressed sensing is proposed to extract clusters from a graph with a known adjacency matrix. The algorithm is based on a two-stage approach similar to the one in Lai and Mckenzie (2020). However, under a weaker assumption and with lower computational complexity than the approach in Lai and Mckenzie (2020), the algorithm is shown to find a desired cluster with high probability. The "one cluster at a time" feature of our method distinguishes it from global clustering methods. Several numerical experiments are conducted on synthetic data, such as the stochastic block model, and on real data, such as MNIST, a political blogs network, and the AT&T and YaleB human faces data sets, to demonstrate the effectiveness and efficiency of our algorithm.

artificial intelligence, data mining, machine learning

arXiv: 2202.02904

Country:

North America > United States > Georgia > Clarke County > Athens (0.14)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.64)

Industry: Information Technology (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
