AITopics | Clustering

Collaborating Authors

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

DORIC : Domain Robust Fine-Tuning for Open Intent Clustering through Dependency Parsing

Lee, Jihyun, Seo, Seungyeon, Kim, Yunsu, Lee, Gary Geunbae

arXiv.org Artificial IntelligenceMar-17-2023

We present our work on Track 2 in the Dialog System Technology Challenges 11 (DSTC11). DSTC11-Track2 aims to provide a benchmark for zero-shot, cross-domain, intent-set induction. In the absence of in-domain training dataset, robust utterance representation that can be used across domains is necessary to induce users' intentions. To achieve this, we leveraged a multi-domain dialogue dataset to fine-tune the language model and proposed extracting Verb-Object pairs to remove the artifacts of unnecessary information. Furthermore, we devised the method that generates each cluster's name for the explainability of clustered results. Our approach achieved 3rd place in the precision score and showed superior accuracy and normalized mutual information (NMI) score than the baseline model on various domain datasets.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2303.09827

Country: North America > United States > Pennsylvania (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Add feedback

LAVA: Granular Neuron-Level Explainable AI for Alzheimer's Disease Assessment from Fundus Images

Yousefzadeh, Nooshin, Tran, Charlie, Ramirez-Zamora, Adolfo, Chen, Jinghua, Fang, Ruogu, Thai, My T.

arXiv.org Artificial IntelligenceMar-16-2023

Alzheimer's Disease (AD) is a progressive neurodegenerative disease and the leading cause of dementia. Early diagnosis is critical for patients to benefit from potential intervention and treatment. The retina has been hypothesized as a diagnostic site for AD detection owing to its anatomical connection with the brain. Developed AI models for this purpose have yet to provide a rational explanation about the decision and neither infer the stage of disease's progression. Along this direction, we propose a novel model-agnostic explainable-AI framework, called Granular Neuron-level Explainer (LAVA), an interpretation prototype that probes into intermediate layers of the Convolutional Neural Network (CNN) models to assess the AD continuum directly from the retinal imaging without longitudinal or clinical evaluation. This method is applied to validate the retinal vasculature as a biomarker and diagnostic modality for Alzheimer's Disease (AD) evaluation. UK Biobank cognitive tests and vascular morphological features suggest LAVA shows strong promise and effectiveness in identifying AD stages across the progression continuum.

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2302.03008

Country:

North America > United States > Florida > Alachua County > Gainesville (0.15)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.68)

Industry: Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Add feedback

Agglomeration of Polygonal Grids using Graph Neural Networks with applications to Multigrid solvers

Antonietti, P. F., Farenga, N., Manuzzi, E., Martinelli, G., Saverio, L.

arXiv.org Artificial IntelligenceMar-16-2023

Agglomeration-based strategies are important both within adaptive refinement algorithms and to construct scalable multilevel algebraic solvers. In order to automatically perform agglomeration of polygonal grids, we propose the use of Machine Learning (ML) strategies, that can naturally exploit geometrical information about the mesh in order to preserve the grid quality, enhancing performance of numerical methods and reducing the overall computational cost. In particular, we employ the k-means clustering algorithm and Graph Neural Networks (GNNs) to partition the connectivity graph of a computational mesh. Moreover, GNNs have high online inference speed and the advantage to process naturally and simultaneously both the graph structure of mesh and the geometrical information, such as the areas of the elements or their barycentric coordinates. These techniques are compared with METIS, a standard algorithm for graph partitioning, which is meant to process only the graph information of the mesh. We demonstrate that performance in terms of quality metrics is enhanced for ML strategies. Such models also show a good degree of generalization when applied to more complex geometries, such as brain MRI scans, and the capability of preserving the quality of the grid. The effectiveness of these strategies is demonstrated also when applied to MultiGrid (MG) solvers in a Polygonal Discontinuous Galerkin (PolyDG) framework. In the considered experiments, GNNs show overall the best performance in terms of inference speed, accuracy and flexibility of the approach.

artificial intelligence, graph neural network, machine learning, (4 more...)

arXiv.org Artificial Intelligence

2210.17457

Genre: Research Report (0.40)

Industry: Health & Medicine > Health Care Technology (0.53)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.53)

Add feedback

Machine Learning for Flow Cytometry Data Analysis

Xu, Yanhua

arXiv.org Artificial IntelligenceMar-15-2023

Flow cytometry mainly used for detecting the characteristics of a number of biochemical substances based on the expression of specific markers in cells. It is particularly useful for detecting membrane surface receptors, antigens, ions, or during DNA/RNA expression. Not only can it be employed as a biomedical research tool for recognising distinctive types of cells in mixed populations, but it can also be used as a diagnostic tool for classifying abnormal cell populations connected with disease. Modern flow cytometers can rapidly analyse tens of thousands of cells at the same time while also measuring multiple parameters from a single cell. However, the rapid development of flow cytometers makes it challenging for conventional analysis methods to interpret flow cytometry data. Researchers need to be able to distinguish interesting-looking cell populations manually in multi-dimensional data collected from millions of cells. Thus, it is essential to find a robust approach for analysing flow cytometry data automatically, specifically in identifying cell populations automatically. This thesis mainly concerns discover the potential shortcoming of current automated-gating algorithms in both real datasets and synthetic datasets. Three representative automated clustering algorithms are selected to be applied, compared and evaluated by completely and partially automated gating. A subspace clustering ProClus also implemented in this thesis. The performance of ProClus in flow cytometry is not well, but it is still a useful algorithm to detect noise.

artificial intelligence, data mining, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2303.09007

Country:

North America > United States (0.14)
Europe > Finland > North Karelia > Joensuu (0.04)
North America > Canada > Ontario > Hamilton (0.04)
Asia > China (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

Wireless Sensor Networks anomaly detection using Machine Learning: A Survey

Haque, Ahsnaul, Chowdhury, Md Naseef-Ur-Rahman, Soliman, Hamdy, Hossen, Mohammad Sahinur, Fatima, Tanjim, Ahmed, Imtiaz

arXiv.org Artificial IntelligenceMar-15-2023

Wireless Sensor Networks (WSNs) have become increasingly valuable in various civil/military applications like industrial process con trol, civil engineering applications such as buildings' structural strength monitoring, environmental monitoring, border intrusion, IoT (Internet of Things), and healthcare. However, the sensed data generated by WSNs is often noisy and unreliable, making it a challenge to detect and diagnose anomalies. Machine learning (ML) techniques have been widely used to address this problem by detecting and identifying unusual patterns in the sensed data. This survey paper provides an overview of the state of-the-art applications of ML techniques for data anomaly detection in WSN domains. We first introduce the characteristics of WSNs and the challenges of anomaly detection in WSNs. Then, we review various ML techniques such as supervised, unsupervised, and semi-supervised learn ing that have been applied to WSN data anomaly detection. We also compare different ML-based approaches and their performance evalu ation metrics. Finally, we discuss open research challenges and future directions for applying ML techniques in WSNs sensed data anomaly detection.

artificial intelligence, data mining, machine learning, (13 more...)

arXiv.org Artificial Intelligence

2303.08823

Country:

North America > United States > Missouri > Jackson County > Kansas City (0.14)
Asia > Singapore (0.04)
Europe > Italy > Tuscany > Florence (0.04)
(10 more...)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.68)
Research Report > New Finding (0.47)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

High-dimensional multi-view clustering methods

Zahir, Alaeddine, Jbilou, Khalide, Ratnani, Ahmed

arXiv.org Artificial IntelligenceMar-14-2023

Multi-view clustering has been widely used in recent years in comparison to single-view clustering, for clear reasons, as it offers more insights into the data, which has brought with it some challenges, such as how to combine these views or features. Most of recent work in this field focuses mainly on tensor representation instead of treating the data as simple matrices. This permits to deal with the high-order correlation between the data which the based matrix approach struggles to capture. Accordingly, we will examine and compare these approaches, particularly in two categories, namely graph-based clustering and subspace-based clustering. We will conduct and report experiments of the main clustering methods over a benchmark datasets.

artificial intelligence, machine learning, matrix, (18 more...)

arXiv.org Artificial Intelligence

2303.08582

Country:

Africa > Middle East > Morocco (0.04)
Europe > France (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Clustering with Simplicial Complexes

Reddy, Thummaluru Siddartha, Chepuri, Sundeep Prabhakar, Borgnat, Pierre

arXiv.org Artificial IntelligenceMar-14-2023

In this work, we propose a new clustering algorithm to group nodes in networks based on second-order simplices (aka filled triangles) to leverage higher-order network interactions. We define a simplicial conductance function, which on minimizing, yields an optimal partition with a higher density of filled triangles within the set while the density of filled triangles is smaller across the sets. To this end, we propose a simplicial adjacency operator that captures the relation between the nodes through second-order simplices. This allows us to extend the well-known Cheeger inequality to cluster a simplicial complex. Then, leveraging the Cheeger inequality, we propose the simplicial spectral clustering algorithm. We report results from numerical experiments on synthetic and real-world network data to demonstrate the efficacy of the proposed approach.

data mining, machine learning, spectral, (17 more...)

arXiv.org Artificial Intelligence

2303.07646

Country:

Europe > France > Auvergne-Rhône-Alpes > Lyon > Lyon (0.04)
Asia > Middle East > Jordan (0.04)
Asia > India > Karnataka > Bengaluru (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Multiway clustering of 3-order tensor via affinity matrix

Andriantsiory, Dina Faneva, Geloun, Joseph Ben, Lebbah, Mustapha

arXiv.org Artificial IntelligenceMar-14-2023

We propose a new method of multiway clustering for 3-order tensors via affinity matrix (MCAM). Based on a notion of similarity between the tensor slices and the spread of information of each slice, our model builds an affinity/similarity matrix on which we apply advanced clustering methods. The combination of all clusters of the three modes delivers the desired multiway clustering. Finally, MCAM achieves competitive results compared with other known algorithms on synthetics and real datasets.

algorithm, artificial intelligence, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2303.07757

Country:

Europe > France > Île-de-France > Yvelines > Versailles (0.04)
Asia > Middle East > Jordan (0.04)
Africa > Senegal > Kolda Region > Kolda (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.92)

Add feedback

Upcycling Models under Domain and Category Shift

Qu, Sanqing, Zou, Tianpei, Roehrbein, Florian, Lu, Cewu, Chen, Guang, Tao, Dacheng, Jiang, Changjun

arXiv.org Artificial IntelligenceMar-13-2023

Deep neural networks (DNNs) often perform poorly in the presence of domain shift and category shift. How to upcycle DNNs and adapt them to the target task remains an important open problem. Unsupervised Domain Adaptation (UDA), especially recently proposed Source-free Domain Adaptation (SFDA), has become a promising technology to address this issue. Nevertheless, existing SFDA methods require that the source domain and target domain share the same label space, consequently being only applicable to the vanilla closed-set setting. In this paper, we take one step further and explore the Source-free Universal Domain Adaptation (SF-UniDA). The goal is to identify "known" data samples under both domain and category shift, and reject those "unknown" data samples (not present in source classes), with only the knowledge from standard pre-trained source model. To this end, we introduce an innovative global and local clustering learning technique (GLC). Specifically, we design a novel, adaptive one-vs-all global clustering algorithm to achieve the distinction across different target classes and introduce a local k-NN clustering strategy to alleviate negative transfer. We examine the superiority of our GLC on multiple benchmarks with different category shift scenarios, including partial-set, open-set, and open-partial-set DA. Remarkably, in the most challenging open-partial-set DA scenario, GLC outperforms UMAD by 14.8\% on the VisDA benchmark. The code is available at https://github.com/ispc-lab/GLC.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2303.0711

Country:

Asia > China > Shanghai > Shanghai (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > California > Alameda County > Oakland (0.04)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Traffic Prediction with Transfer Learning: A Mutual Information-based Approach

Huang, Yunjie, Song, Xiaozhuang, Zhu, Yuanshao, Zhang, Shiyao, Yu, James J. Q.

arXiv.org Artificial IntelligenceMar-13-2023

In modern traffic management, one of the most essential yet challenging tasks is accurately and timely predicting traffic. It has been well investigated and examined that deep learning-based Spatio-temporal models have an edge when exploiting Spatio-temporal relationships in traffic data. Typically, data-driven models require vast volumes of data, but gathering data in small cities can be difficult owing to constraints such as equipment deployment and maintenance costs. To resolve this problem, we propose TrafficTL, a cross-city traffic prediction approach that uses big data from other cities to aid data-scarce cities in traffic prediction. Utilizing a periodicity-based transfer paradigm, it identifies data similarity and reduces negative transfer caused by the disparity between two data distributions from distant cities. In addition, the suggested method employs graph reconstruction techniques to rectify defects in data from small data cities. TrafficTL is evaluated by comprehensive case studies on three real-world datasets and outperforms the state-of-the-art baseline by around 8 to 25 percent.

data mining, machine learning, prediction, (19 more...)

arXiv.org Artificial Intelligence

2303.07184

Country:

Asia > China > Zhejiang Province > Hangzhou (0.04)
North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
Asia > China > Shanghai > Shanghai (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Transportation > Ground > Road (1.00)
Transportation > Infrastructure & Services (0.93)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback