AITopics | Clustering

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

Region2Vec: Community Detection on Spatial Networks Using Graph Embedding with Node Attributes and Spatial Interactions

Liang, Yunlei, Zhu, Jiawei, Ye, Wen, Gao, Song

arXiv.org Artificial Intelligence, Oct-9-2022

Community detection algorithms are used to detect densely connected components in complex networks and reveal underlying relationships among components. As a special type of network, spatial networks are usually generated by the connections among geographic regions. Identifying the spatial network communities can help reveal the spatial interaction patterns, understand the hidden regional structures and support regional development decision-making. Given the recent development of Graph Convolutional Networks (GCNs) and their strong performance in identifying multi-scale spatial interactions, we propose an unsupervised GCN-based community detection method, "region2vec", for spatial networks. Our method first generates node embeddings for regions that share common attributes and have intense spatial interactions, and then applies clustering algorithms to detect communities based on their embedding similarity and spatial adjacency. Experimental results show that while existing methods trade off either attribute similarities or spatial interactions for one another, "region2vec" maintains a good balance between both and performs the best when one wants to maximize both attribute similarities and spatial interactions within communities.
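The second stage described in this abstract, clustering regions by embedding similarity while respecting spatial adjacency, can be illustrated with a small sketch. This is not the authors' implementation: the merge rule (greedy agglomeration by cosine similarity of cluster centroids), the function names, and the toy embeddings are all assumptions made for illustration, and the GCN that would produce the embeddings is omitted.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid(members, emb):
    """Mean embedding of a set of regions."""
    dim = len(next(iter(emb.values())))
    return [sum(emb[m][d] for m in members) / len(members) for d in range(dim)]

def constrained_clusters(emb, edges, k):
    """Greedily merge spatially adjacent clusters with the most similar centroids.

    Hypothetical helper: stops when k clusters remain or no adjacent pair is left.
    """
    clusters = {node: {node} for node in emb}      # cluster id -> member regions
    adjacent = {frozenset(e) for e in edges}       # undirected adjacency pairs
    while len(clusters) > k:
        best = None
        for a, b in combinations(sorted(clusters), 2):
            touching = any(frozenset((x, y)) in adjacent
                           for x in clusters[a] for y in clusters[b])
            if not touching:
                continue                           # only adjacent clusters may merge
            sim = cosine(centroid(clusters[a], emb), centroid(clusters[b], emb))
            if best is None or sim > best[0]:
                best = (sim, a, b)
        if best is None:
            break
        _, a, b = best
        clusters[a] |= clusters.pop(b)
    return sorted(sorted(c) for c in clusters.values())

# Toy data (assumed): four regions on a line A - B - C - D.
embeddings = {"A": [1.0, 0.0], "B": [0.9, 0.1], "C": [0.0, 1.0], "D": [0.2, 0.8]}
adjacency = [("A", "B"), ("B", "C"), ("C", "D")]
communities = constrained_clusters(embeddings, adjacency, k=2)
print(communities)
```

The adjacency check is what keeps each community spatially contiguous; dropping it would reduce the sketch to ordinary agglomerative clustering on the embeddings.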

data mining, machine learning, node, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3557915.3560974

arXiv: 2210.08041

Country:

North America > United States > Wisconsin > Dane County > Madison (0.15)
North America > United States > Washington > King County > Seattle (0.05)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Research Report > New Finding (0.48)

Industry:

Health & Medicine (0.47)
Government (0.47)
Education (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.70)

Modeling and Mining Multi-Aspect Graphs With Scalable Streaming Tensor Decomposition

Gujral, Ekta

arXiv.org Artificial Intelligence, Oct-9-2022

Graphs emerge in almost every real-world application domain, ranging from online social networks all the way to health data and movie viewership patterns. Typically, such real-world graphs are big and dynamic, in the sense that they evolve over time. Furthermore, graphs usually contain multi-aspect information; e.g., in a social network, we can have the "means of communication" between nodes, such as who messages whom, who calls whom, and who comments on whose timeline. How can we model and mine useful patterns, such as communities of nodes in that graph, from such multi-aspect graphs? How can we identify dynamic patterns in those graphs, and how can we deal with streaming data, when the volume of data to be processed is very large? In order to answer those questions, in this thesis, we propose novel tensor-based methods for mining static and dynamic multi-aspect graphs. In general, a tensor is a higher-order generalization of a matrix that can represent high-dimensional multi-aspect data such as time-evolving networks, collaboration networks, and spatio-temporal data like electroencephalography (EEG) brain measurements. The thesis is organized in two synergistic thrusts: First, we focus on static multi-aspect graphs, where the goal is to identify coherent communities and patterns between nodes by leveraging the tensor structure in the data. Second, as our graphs evolve dynamically, we focus on handling such streaming updates by incrementally updating the existing results rather than recomputing the decomposition.

knowledge management, machine learning, simultaneous symmetric non-negative matrix trifactorization, (22 more...)

arXiv.org Artificial Intelligence

arXiv: 2210.04404

Country:

Africa > Senegal > Kolda Region > Kolda (0.04)
North America > United States > Oregon (0.04)
North America > United States > California > Riverside County > Riverside (0.04)
(17 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Research Report > Promising Solution (0.67)

Industry:

Transportation > Passenger (1.00)
Transportation > Air (1.00)
Media > Film (1.00)
(12 more...)

Technology:

Information Technology > Mathematics of Computing (1.00)
Information Technology > Knowledge Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
(9 more...)

An Instance Selection Algorithm for Big Data in High imbalanced datasets based on LSH

Melo-Acosta, Germán E., Duitama-Muñoz, Freddy, Arias-Londoño, Julián D.

arXiv.org Artificial Intelligence, Oct-9-2022

Training Machine Learning (ML) models in real contexts often involves big data sets with high class imbalance, where the class of interest is underrepresented (the minority class). Practical solutions using classical ML models address the problem of large data sets using parallel/distributed implementations of training algorithms, approximate model-based solutions, or instance selection (IS) algorithms that eliminate redundant information. However, the combined problem of big and highly imbalanced datasets has been less addressed. This work proposes three new IS methods that can deal with large and imbalanced data sets. The proposed methods use Locality Sensitive Hashing (LSH) as a base clustering technique, and then three different sampling methods are applied on top of the clusters (or buckets) generated by LSH. The algorithms were developed in the Apache Spark framework, guaranteeing their scalability. The experiments carried out on three different datasets suggest that the proposed IS methods can improve the performance of a base ML model by between 5% and 19% in terms of the geometric mean.
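The general pattern in this abstract, hashing samples into LSH buckets and then sampling within each bucket, can be sketched in a few lines. The abstract does not say which hash family or which three sampling rules the authors use, so everything below is an assumption: random-hyperplane (sign) hashing, a rule that keeps every minority sample and at most `per_bucket` majority samples per bucket, and a single-machine implementation rather than Apache Spark. The function names are hypothetical.

```python
import random
from collections import defaultdict

def lsh_signature(x, planes):
    """Random-hyperplane LSH: one bit per hyperplane, set when x lies on its positive side."""
    return tuple(int(sum(a * b for a, b in zip(x, p)) >= 0.0) for p in planes)

def lsh_undersample(X, y, minority_label, n_planes=4, per_bucket=1, seed=0):
    """Return sorted indices of the retained samples.

    Assumed rule: keep all minority samples, and at most `per_bucket`
    randomly chosen majority samples from each LSH bucket.
    """
    rng = random.Random(seed)
    dim = len(X[0])
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]
    buckets = defaultdict(list)
    keep = []
    for i, (x, label) in enumerate(zip(X, y)):
        if label == minority_label:
            keep.append(i)                       # minority samples are never discarded
        else:
            buckets[lsh_signature(x, planes)].append(i)
    for members in buckets.values():
        keep.extend(rng.sample(members, min(per_bucket, len(members))))
    return sorted(keep)

# Toy data (assumed): 200 majority points around the origin, 5 minority points.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)] + [[5.0, 5.0]] * 5
y = [0] * 200 + [1] * 5
kept = lsh_undersample(X, y, minority_label=1)
print(len(X), "->", len(kept), "samples")
```

Because nearby points tend to share a signature, each bucket acts as a rough cluster, and thinning buckets removes redundant majority samples while leaving the minority class intact.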

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Artificial Intelligence

arXiv: 2210.0431

Country:

Europe > Spain > Galicia > Madrid (0.04)
South America > Colombia > Antioquia Department > Medellín (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)

Deep Clustering: A Comprehensive Survey

Ren, Yazhou, Pu, Jingyu, Yang, Zhimeng, Xu, Jie, Li, Guofeng, Pu, Xiaorong, Yu, Philip S., He, Lifang

arXiv.org Artificial Intelligence, Oct-8-2022

Cluster analysis plays an indispensable role in machine learning and data mining. Learning a good data representation is crucial for clustering algorithms. Recently, deep clustering, which can learn clustering-friendly representations using deep neural networks, has been applied in a wide range of clustering tasks. Existing surveys of deep clustering mainly focus on single-view fields and network architectures, ignoring the complex application scenarios of clustering. To address this issue, in this paper we provide a comprehensive survey of deep clustering from the perspective of data sources. With different data sources and initial conditions, we systematically distinguish the clustering methods in terms of methodology, prior knowledge, and architecture. Concretely, deep clustering methods are introduced according to four categories, i.e., traditional single-view deep clustering, semi-supervised deep clustering, deep multi-view clustering, and deep transfer clustering. Finally, we discuss the open challenges and potential future opportunities in different fields of deep clustering.

artificial intelligence, machine learning, representation, (17 more...)

arXiv.org Artificial Intelligence

arXiv: 2210.04142

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > New York (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(2 more...)

Genre: Overview (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Simplex Clustering via sBeta with Applications to Online Adjustment of Black-Box Predictions

Chiaroni, Florent, Boudiaf, Malik, Mitiche, Amar, Ayed, Ismail Ben

arXiv.org Artificial Intelligence, Oct-8-2022

We explore clustering the softmax predictions of deep neural networks and introduce a novel probabilistic clustering method, referred to as k-sBetas. In the general context of clustering discrete distributions, existing methods have focused on exploring distortion measures tailored to simplex data, such as the KL divergence, as alternatives to the standard Euclidean distance. We provide a general maximum a posteriori (MAP) perspective on clustering distributions, which emphasizes that the statistical models underlying the existing distortion-based methods may not be descriptive enough. Instead, we optimize a mixed-variable objective measuring the conformity of data within each cluster to the introduced sBeta density function, whose parameters are constrained and estimated jointly with binary assignment variables. Our versatile formulation approximates a variety of parametric densities for modeling simplex data, and enables control of the cluster-balance bias. This yields highly competitive performance for unsupervised adjustments of black-box model predictions in a variety of scenarios. Our code and comparisons with the existing simplex-clustering approaches, along with our introduced softmax-prediction benchmarks, are publicly available: https://github.com/fchiaroni/Clustering_Softmax_Predictions.

artificial intelligence, machine learning, prediction, (19 more...)

arXiv.org Artificial Intelligence

arXiv: 2208.00287

Country:

North America > Canada > Quebec > Montreal (0.05)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
(5 more...)

Genre: Research Report (1.00)

Industry: Transportation > Air (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Unsupervised Behaviour Analysis of News Consumption in Turkish Media

Makaroglu, Didem, Cakir, Altan, Toreyin, Behcet Ugur

arXiv.org Artificial Intelligence, Oct-8-2022

Clickstream data, which come with a massive volume generated by human activities on websites, have become a prominent feature for identifying readers' characteristics by newsrooms after the digitization of news outlets. Although the nature of clickstream data has a similar logic within websites, it has inherent limitations in recognizing human behaviours when looking from a broad perspective, which brings the need to limit the problem in niche areas. This study investigates the anonymized readers' click activities on the organizations' websites to identify news consumption patterns following referrals from Twitter, who incidentally reach but propensity is mainly routed news content. Methodologies for ensemble cluster analysis with mixed-type embedding strategies are applied and compared to find similar reader groups and interests independent of time. Various internal validation perspectives are used to determine the optimality of the quality of clusters, where the Calinski-Harabasz Index (CHI) is found to give a generalizable result. Our findings demonstrate that clustering a mixed-type dataset approaches the optimal internal validation scores, which we define to discriminate the clusters and algorithms considering applied strategies, when embedded by Uniform Manifold Approximation and Projection (UMAP) and using a consensus function as a key to access the most applicable hyperparameter configurations in the given ensemble rather than using consensus function results directly. Evaluation of the resulting clusters highlights specific clusters repeatedly present in the separated monthly samples by Adjusted Mutual Information scores greater than 0.5, which provide insights to the news organizations and overcome the degradation of the modeling behaviours due to the change in the interest over time.
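For readers unfamiliar with the Calinski-Harabasz Index named in this abstract, it is the ratio of between-cluster dispersion to within-cluster dispersion, each scaled by its degrees of freedom: CH = [B / (k - 1)] / [W / (n - k)], where B sums the squared distances from each cluster centre to the overall mean weighted by cluster size, W sums the squared distances from each point to its own cluster centre, n is the number of points and k the number of clusters. Higher values indicate more compact, better-separated clusters. The sketch below is a plain-Python illustration of that standard definition on toy data; it is not taken from the study, and the function name is hypothetical.

```python
def calinski_harabasz(points, labels):
    """Calinski-Harabasz index for a hard clustering of numeric vectors."""
    n = len(points)
    dim = len(points[0])
    groups = {}
    for p, label in zip(points, labels):
        groups.setdefault(label, []).append(p)
    k = len(groups)
    if k < 2 or k >= n:
        raise ValueError("need at least 2 clusters and fewer clusters than points")

    def mean(rows):
        return [sum(r[d] for r in rows) / len(rows) for d in range(dim)]

    def sqdist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    overall = mean(points)
    centers = {label: mean(rows) for label, rows in groups.items()}
    # Between-cluster dispersion: size-weighted spread of the cluster centres.
    between = sum(len(rows) * sqdist(centers[label], overall)
                  for label, rows in groups.items())
    # Within-cluster dispersion: spread of points around their own centre.
    within = sum(sqdist(p, centers[label])
                 for label, rows in groups.items() for p in rows)
    if within == 0.0:
        raise ValueError("within-cluster dispersion is zero; index is undefined")
    return (between / (k - 1)) / (within / (n - k))

# Toy data (assumed): two tight groups far apart along the first axis.
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = calinski_harabasz(pts, [0, 0, 1, 1])   # labels follow the real groups
bad = calinski_harabasz(pts, [0, 1, 0, 1])    # labels cut across the groups
print(good, bad)
```

In practice a library routine such as scikit-learn's `calinski_harabasz_score` would normally be used instead of a hand-rolled version.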

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Artificial Intelligence

arXiv: 2202.02056

Country:

Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.05)
Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.05)
North America > United States > Hawaii (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Media > News (1.00)
Information Technology > Services (0.93)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

HAL-X: A Novel Clustering Algorithm for Rapid Single-Cell Data Analysis for Drug Discovery

#artificialintelligence, Oct-7-2022, 07:01:04 GMT

The novel clustering algorithm HAL-x provides insights into single-cell clustering, bridging the biomedicine sciences for healthcare purposes. With its specificity and sensitivity, it allows the generation of multiple-clustering for in-depth analysis of high-dimensional data. The future directions of algorithms are extremely promising in understanding disease, disease progression, and drug discovery. Throughout the years, clustering has been studied both extensively and intensively in statistics and machine learning. Apart from classical methods, clustering for high-dimensional data is primarily done using non-linear embeddings, which largely lacked visualization and identification of accurate clustering.

algorithm, dataset, rapid single-cell data analysis, (9 more...)

#artificialintelligence

Country: North America > United States > Maryland (0.05)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.32)
Health & Medicine > Therapeutic Area > Immunology (0.31)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.90)

Elastic Step DQN: A novel multi-step algorithm to alleviate overestimation in Deep QNetworks

Ly, Adrian, Dazeley, Richard, Vamplew, Peter, Cruz, Francisco, Aryal, Sunil

arXiv.org Artificial Intelligence, Oct-7-2022

The Deep Q-Networks (DQN) algorithm was the first reinforcement learning algorithm using a deep neural network to successfully surpass human-level performance in a number of Atari learning environments. However, divergent and unstable behaviour have been long-standing issues in DQNs. The unstable behaviour is often characterised by overestimation in the $Q$-values, commonly referred to as the overestimation bias. To address the overestimation bias and the divergent behaviour, a number of heuristic extensions have been proposed. Notably, multi-step updates have been shown to drastically reduce unstable behaviour while improving the agent's training performance. However, agents are often highly sensitive to the selection of the multi-step update horizon ($n$), and our empirical experiments show that a poorly chosen static value for $n$ can in many cases lead to worse performance than single-step DQN. Inspired by the success of $n$-step DQN and the effects that multi-step updates have on overestimation bias, this paper proposes a new algorithm that we call 'Elastic Step DQN' (ES-DQN). It dynamically varies the step size horizon in multi-step updates based on the similarity of states visited. Our empirical evaluation shows that ES-DQN outperforms $n$-step with fixed $n$ updates, Double DQN and Average DQN in several OpenAI Gym environments while at the same time alleviating the overestimation bias.
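The multi-step update this abstract builds on replaces the one-step target with the standard $n$-step return: the discounted sum of the next $n$ rewards plus a discounted bootstrap from the value estimate $n$ steps ahead. The sketch below shows only that fixed-$n$ target. It does not implement ES-DQN's rule for varying the horizon, which the abstract does not specify in detail; the function name and the toy numbers are assumptions.

```python
def n_step_target(rewards, bootstrap_q_values, gamma, done):
    """n-step Q-learning target for a transition starting at time t.

    rewards:            [r_t, ..., r_{t+n-1}], so n = len(rewards)
    bootstrap_q_values: Q(s_{t+n}, a) for every action a (ignored if done)
    gamma:              discount factor in [0, 1]
    done:               True if the episode ended within these n steps
    """
    n = len(rewards)
    discounted = sum((gamma ** i) * r for i, r in enumerate(rewards))
    if done:
        return discounted                  # terminal: nothing to bootstrap from
    return discounted + (gamma ** n) * max(bootstrap_q_values)

# Toy numbers (assumed): three rewards of 1, two actions at the bootstrap state.
three_step = n_step_target([1.0, 1.0, 1.0], [2.0, 5.0], gamma=0.9, done=False)
one_step = n_step_target([1.0], [2.0, 5.0], gamma=0.9, done=False)
print(three_step, one_step)
```

With a longer horizon, more of the target comes from observed rewards and less from the bootstrapped maximum (weighted by gamma ** n), which is one intuition for why multi-step updates can reduce overestimation, and also why the choice of $n$ matters.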

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

arXiv: 2210.03325

Country: Asia > Middle East > Jordan (0.04)

Genre:

Research Report > Experimental Study (0.68)
Research Report > New Finding (0.46)

Industry:

Leisure & Entertainment (0.46)
Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Data-driven Approach to Differentiating between Depression and Dementia from Noisy Speech and Language Data

Ehghaghi, Malikeh, Rudzicz, Frank, Novikova, Jekaterina

arXiv.org Artificial Intelligence, Oct-6-2022

A significant number of studies apply acoustic and linguistic characteristics of human speech as prominent markers of dementia and depression. However, studies on discriminating depression from dementia are rare. Co-morbid depression is frequent in dementia and these clinical conditions share many overlapping symptoms, but the ability to distinguish between depression and dementia is essential as depression is often curable. In this work, we investigate the ability of clustering approaches to distinguish between depression and dementia from human speech. We introduce a novel aggregated dataset, which combines narrative speech data from multiple conditions, i.e., Alzheimer's disease, mild cognitive impairment, healthy control, and depression. We compare linear and non-linear clustering approaches and show that non-linear clustering techniques distinguish better between distinct disease clusters. Our interpretability analysis shows that the main differentiating symptoms between dementia and depression are acoustic abnormality, repetitiveness (or circularity) of speech, word-finding difficulty, coherence impairment, and differences in lexical complexity and richness.

category, data mining, machine learning, (18 more...)

arXiv.org Artificial Intelligence

arXiv: 2210.03303

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > Canada > Quebec > Montreal (0.04)
Europe > France (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.68)

Industry: Health & Medicine > Therapeutic Area > Neurology > Dementia (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Data Science > Data Mining (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.89)

Reservoir Computing Approach for Gray Images Segmentation

Koprinkova-Hristova, Petia

arXiv.org Artificial Intelligence, Oct-6-2022

The paper proposes a novel approach for grayscale image segmentation. It is based on extracting multiple features from a single feature per image pixel, namely its intensity value, using an echo state network. The newly extracted features, the reservoir equilibrium states, reveal hidden image characteristics that improve its segmentation via a clustering algorithm. Moreover, it is demonstrated that intrinsic plasticity tuning of the reservoir fits its equilibrium states to the original image intensity distribution, thus allowing for better segmentation. The proposed approach is tested on the benchmark image Lena.

artificial intelligence, machine learning, segmentation, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/INISTA55318.2022.9894221

arXiv: 2107.11077

Country:

Europe > Ireland > Munster > County Kerry > Killarney (0.04)
Europe > Bulgaria > Sofia City Province > Sofia (0.04)
Europe > Belgium (0.04)

Genre:

Research Report > Promising Solution (0.35)
Overview > Innovation (0.35)

Industry: Health & Medicine > Therapeutic Area (0.69)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.50)
