AITopics | Clustering

Collaborating Authors

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Robust Detection of Lead-Lag Relationships in Lagged Multi-Factor Models

Zhang, Yichi, Cucuringu, Mihai, Shestopaloff, Alexander Y., Zohren, Stefan

arXiv.org Machine LearningSep-18-2023

In multivariate time series systems, key insights can be obtained by discovering lead-lag relationships inherent in the data, which refer to the dependence between two time series shifted in time relative to one another, and which can be leveraged for the purposes of control, forecasting or clustering. We develop a clustering-driven methodology for robust detection of lead-lag relationships in lagged multi-factor models. Within our framework, the envisioned pipeline takes as input a set of time series, and creates an enlarged universe of extracted subsequence time series from each input time series, via a sliding window approach. This is then followed by an application of various clustering techniques, (such as k-means++ and spectral clustering), employing a variety of pairwise similarity measures, including nonlinear ones. Once the clusters have been extracted, lead-lag estimates across clusters are robustly aggregated to enhance the identification of the consistent relationships in the original universe. We establish connections to the multireference alignment problem for both the homogeneous and heterogeneous settings. Since multivariate time series are ubiquitous in a wide range of domains, we demonstrate that our method is not only able to robustly detect lead-lag relationships in financial markets, but can also yield insightful results when applied to an environmental data set.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Machine Learning

2305.06704

Country:

North America > United States (0.14)
Europe > Sweden (0.14)
Europe > Germany (0.14)
(2 more...)

Genre:

Research Report > Experimental Study (0.68)
Research Report > New Finding (0.46)

Industry:

Government (1.00)
Energy > Oil & Gas (1.00)
Banking & Finance > Trading (1.00)
(2 more...)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

A Modular Spatial Clustering Algorithm with Noise Specification

K, Akhil, R, Srikanth H

arXiv.org Artificial IntelligenceSep-18-2023

Clustering techniques have been the key drivers of data mining, machine learning and pattern recognition for decades. One of the most popular clustering algorithms is DBSCAN due to its high accuracy and noise tolerance. Many superior algorithms such as DBSCAN have input parameters that are hard to estimate. Therefore, finding those parameters is a time consuming process. In this paper, we propose a novel clustering algorithm Bacteria-Farm, which balances the performance and ease of finding the optimal parameters for clustering. Bacteria- Farm algorithm is inspired by the growth of bacteria in closed experimental farms - their ability to consume food and grow - which closely represents the ideal cluster growth desired in clustering algorithms. In addition, the algorithm features a modular design to allow the creation of versions of the algorithm for specific tasks / distributions of data. In contrast with other clustering algorithms, our algorithm also has a provision to specify the amount of noise to be excluded during clustering.

algorithm, centroid, front-runner, (14 more...)

arXiv.org Artificial Intelligence

2309.10047

Country: Asia > India > Karnataka > Bengaluru (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Clustering of Urban Traffic Patterns by K-Means and Dynamic Time Warping: Case Study

Etemad, Sadegh, Mosayebi, Raziyeh, Khodavirdian, Tadeh Alexani, Dastan, Elahe, Telmadarreh, Amir Salari, Jafari, Mohammadreza, Rafiei, Sepehr

arXiv.org Artificial IntelligenceSep-18-2023

Clustering of urban traffic patterns is an essential task in many different areas of traffic management and planning. In this paper, two significant applications in the clustering of urban traffic patterns are described. The first application estimates the missing speed values using the speed of road segments with similar traffic patterns to colorify map tiles. The second one is the estimation of essential road segments for generating addresses for a local point on the map, using the similarity patterns of different road segments. The speed time series extracts the traffic pattern in different road segments. In this paper, we proposed the time series clustering algorithm based on K-Means and Dynamic Time Warping. The case study of our proposed algorithm is based on the Snapp application's driver speed time series data. The results of the two applications illustrate that the proposed method can extract similar urban traffic patterns.

road segment, time sery, traffic pattern, (12 more...)

arXiv.org Artificial Intelligence

2309.0983

Country:

Asia > Middle East > Iran > Tehran Province > Tehran (0.07)
Asia > Middle East > Iran > Kurdistan Province (0.05)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(6 more...)

Genre: Research Report > New Finding (0.69)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Multi-turn Dialogue Comprehension from a Topic-aware Perspective

Ma, Xinbei, Xu, Yi, Zhao, Hai, Zhang, Zhuosheng

arXiv.org Artificial IntelligenceSep-18-2023

Dialogue related Machine Reading Comprehension requires language models to effectively decouple and model multi-turn dialogue passages. As a dialogue development goes after the intentions of participants, its topic may not keep constant through the whole passage. Hence, it is non-trivial to detect and leverage the topic shift in dialogue modeling. Topic modeling, although has been widely studied in plain text, deserves far more utilization in dialogue reading comprehension. This paper proposes to model multi-turn dialogues from a topic-aware perspective. We start with a dialogue segmentation algorithm to split a dialogue passage into topic-concentrated fragments in an unsupervised way. Then we use these fragments as topic-aware language processing units in further dialogue comprehension. On one hand, the split segments indict specific topics rather than mixed intentions, thus showing convenient on in-domain topic detection and location. For this task, we design a clustering system with a self-training auto-encoder, and we build two constructed datasets for evaluation. On the other hand, the split segments are an appropriate element of multi-turn dialogue response selection. For this purpose, we further present a novel model, Topic-Aware Dual-Attention Matching (TADAM) Network, which takes topic segments as processing elements and matches response candidates with a dual cross-attention. Empirical studies on three public benchmarks show great improvements over baselines. Our work continues the previous studies on document topic, and brings the dialogue modeling to a novel topic-aware perspective with exhaustive experiments and analyses.

dataset, dialogue, proceedings, (13 more...)

arXiv.org Artificial Intelligence

2309.09666

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > China > Shanghai > Shanghai (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
(2 more...)

Genre: Research Report > Promising Solution (0.48)

Industry: Information Technology (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

A Novel Method of Fuzzy Topic Modeling based on Transformer Processing

Tseng, Ching-Hsun, Lee, Shin-Jye, Cheng, Po-Wei, Lee, Chien, Hung, Chih-Chieh

arXiv.org Artificial IntelligenceSep-18-2023

Topic modeling is admittedly a convenient way to monitor markets trend. Conventionally, Latent Dirichlet Allocation, LDA, is considered a must-do model to gain this type of information. By given the merit of deducing keyword with token conditional probability in LDA, we can know the most possible or essential topic. However, the results are not intuitive because the given topics cannot wholly fit human knowledge. LDA offers the first possible relevant keywords, which also brings out another problem of whether the connection is reliable based on the statistic possibility. It is also hard to decide the topic number manually in advance. As the booming trend of using fuzzy membership to cluster and using transformers to embed words, this work presents the fuzzy topic modeling based on soft clustering and document embedding from state-of-the-art transformer-based model. In our practical application in a press release monitoring, the fuzzy topic modeling gives a more natural result than the traditional output from LDA.

dimension, topic number, transformer, (13 more...)

arXiv.org Artificial Intelligence

2309.09658

Country:

Asia > Taiwan > Taiwan Province > Taipei (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Promising Solution (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)

Add feedback

VULNERLIZER: Cross-analysis Between Vulnerabilities and Software Libraries

Pekaric, Irdin, Felderer, Michael, Steinmüller, Philipp

arXiv.org Artificial IntelligenceSep-18-2023

The identification of vulnerabilities is a continuous challenge in software projects. This is due to the evolution of methods that attackers employ as well as the constant updates to the software, which reveal additional issues. As a result, new and innovative approaches for the identification of vulnerable software are needed. In this paper, we present VULNERLIZER, which is a novel framework for cross-analysis between vulnerabilities and software libraries. It uses CVE and software library data together with clustering algorithms to generate links between vulnerabilities and libraries. In addition, the training of the model is conducted in order to reevaluate the generated associations. This is achieved by updating the assigned weights. Finally, the approach is then evaluated by making the predictions using the CVE data from the test set. The results show that the VULNERLIZER has a great potential in being able to predict future vulnerable libraries based on an initial input CVE entry or a software library. The trained model reaches a prediction accuracy of 75% or higher.

library, permutation, prediction, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.24251/HICSS.2021.843

2309.09649

Country:

Europe > Austria > Tyrol > Innsbruck (0.04)
Asia > Nepal (0.04)

Genre:

Research Report > New Finding (0.35)
Research Report > Promising Solution (0.34)
Overview > Innovation (0.34)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

Add feedback

Improving and Evaluating the Detection of Fragmentation in News Recommendations with the Clustering of News Story Chains

Polimeno, Alessandra, Reuver, Myrthe, Vrijenhoek, Sanne, Fokkens, Antske

arXiv.org Artificial IntelligenceSep-18-2023

News recommender systems play an increasingly influential role in shaping information access within democratic societies. However, tailoring recommendations to users' specific interests can result in the divergence of information streams. Fragmented access to information poses challenges to the integrity of the public sphere, thereby influencing democracy and public discourse. The Fragmentation metric quantifies the degree of fragmentation of information streams in news recommendations. Accurate measurement of this metric requires the application of Natural Language Processing (NLP) to identify distinct news events, stories, or timelines. This paper presents an extensive investigation of various approaches for quantifying Fragmentation in news recommendations. These approaches are evaluated both intrinsically, by measuring performance on news story clustering, and extrinsically, by assessing the Fragmentation scores of different simulated news recommender scenarios. Our findings demonstrate that agglomerative hierarchical clustering coupled with SentenceBERT text representation is substantially better at detecting Fragmentation than earlier implementations. Additionally, the analysis of simulated scenarios yields valuable insights and recommendations for stakeholders concerning the measurement and interpretation of Fragmentation.

evaluation, fragmentation, scenario, (17 more...)

arXiv.org Artificial Intelligence

2309.06192

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
South America > Brazil (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
(10 more...)

Genre: Research Report > New Finding (1.00)

Industry: Media > News (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Fairness in Visual Clustering: A Novel Transformer Clustering Approach

Nguyen, Xuan-Bac, Duong, Chi Nhan, Savvides, Marios, Roy, Kaushik, Churchill, Hugh, Luu, Khoa

arXiv.org Artificial IntelligenceSep-18-2023

Promoting fairness for deep clustering models in unsupervised clustering settings to reduce demographic bias is a challenging goal. This is because of the limitation of large-scale balanced data with well-annotated labels for sensitive or protected attributes. In this paper, we first evaluate demographic bias in deep clustering models from the perspective of cluster purity, which is measured by the ratio of positive samples within a cluster to their correlation degree. This measurement is adopted as an indication of demographic bias. Then, a novel loss function is introduced to encourage a purity consistency for all clusters to maintain the fairness aspect of the learned clustering model. Moreover, we present a novel attention mechanism, Cross-attention, to measure correlations between multiple clusters, strengthening faraway positive samples and improving the purity of clusters during the learning process. Experimental results on a large-scale dataset with numerous attribute settings have demonstrated the effectiveness of the proposed approach on both clustering accuracy and fairness enhancement on several sensitive attributes.

bupt-in-the-wild, computer vision, proceedings, (13 more...)

arXiv.org Artificial Intelligence

2304.07408

Country:

North America > United States > Arkansas > Washington County > Fayetteville (0.04)
North America > United States > North Carolina (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.95)
(3 more...)

Add feedback

DeepCluE: Enhanced Image Clustering via Multi-layer Ensembles in Deep Neural Networks

Huang, Dong, Chen, Ding-Hua, Chen, Xiangji, Wang, Chang-Dong, Lai, Jian-Huang

arXiv.org Artificial IntelligenceSep-17-2023

Deep clustering has recently emerged as a promising technique for complex data clustering. Despite the considerable progress, previous deep clustering works mostly build or learn the final clustering by only utilizing a single layer of representation, e.g., by performing the K-means clustering on the last fully-connected layer or by associating some clustering loss to a specific layer, which neglect the possibilities of jointly leveraging multi-layer representations for enhancing the deep clustering performance. In view of this, this paper presents a Deep Clustering via Ensembles (DeepCluE) approach, which bridges the gap between deep clustering and ensemble clustering by harnessing the power of multiple layers in deep neural networks. In particular, we utilize a weight-sharing convolutional neural network as the backbone, which is trained with both the instance-level contrastive learning (via an instance projector) and the cluster-level contrastive learning (via a cluster projector) in an unsupervised manner. Thereafter, multiple layers of feature representations are extracted from the trained network, upon which the ensemble clustering process is further conducted. Specifically, a set of diversified base clusterings are generated from the multi-layer representations via a highly efficient clusterer. Then the reliability of clusters in multiple base clusterings is automatically estimated by exploiting an entropy-based criterion, based on which the set of base clusterings are re-formulated into a weighted-cluster bipartite graph. By partitioning this bipartite graph via transfer cut, the final consensus clustering can be obtained. Experimental results on six image datasets confirm the advantages of DeepCluE over the state-of-the-art deep clustering approaches.

ensemble, projector, representation, (16 more...)

arXiv.org Artificial Intelligence

2206.00359

Country:

Asia > China > Guangdong Province > Guangzhou (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Reinforcing POD-based model reduction techniques in reaction-diffusion complex networks using stochastic filtering and pattern recognition

Ajayakumar, Abhishek, Raha, Soumyendu

arXiv.org Artificial IntelligenceSep-16-2023

Complex networks are used to model many real-world systems. However, the dimensionality of these systems can make them challenging to analyze. Dimensionality reduction techniques like POD can be used in such cases. However, these models are susceptible to perturbations in the input data. We propose an algorithmic framework that combines techniques from pattern recognition (PR) and stochastic filtering theory to enhance the output of such models. The results of our study show that our method can improve the accuracy of the surrogate model under perturbed inputs. Deep Neural Networks (DNNs) are susceptible to adversarial attacks. However, recent research has revealed that Neural Ordinary Differential Equations (neural ODEs) exhibit robustness in specific applications. We benchmark our algorithmic framework with the neural ODE-based approach as a reference.

denote, differential equation, graph, (13 more...)

arXiv.org Artificial Intelligence

2307.09762

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Switzerland > Basel-City > Basel (0.04)
Asia > South Korea > Seoul > Seoul (0.04)
Asia > India > Karnataka > Bengaluru (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Add feedback