AITopics | Clustering

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

Pattern Analysis of Money Flow in the Bitcoin Blockchain

Tovanich, Natkamon, Cazabet, Rémy

arXiv.org Artificial Intelligence, Jul-15-2022

Bitcoin is the first and highest valued cryptocurrency that stores transactions in a publicly distributed ledger called the blockchain. Understanding the activity and behavior of Bitcoin actors is a crucial research topic as they are pseudonymous in the transaction network. In this article, we propose a method based on taint analysis to extract taint flows: dynamic networks representing the sequence of Bitcoins transferred from an initial source to other actors until dissolution. Then, we apply graph embedding methods to characterize taint flows. We evaluate our embedding method with taint flows from top mining pools and show that it can classify mining pools with high accuracy. We also found that taint flows from the same period show high similarity. Our work proves that tracing the money flows can be a promising approach to classifying source actors and characterizing different money flow patterns.

actor, taint flow, transaction, (15 more...)

arXiv.org Artificial Intelligence

2207.07315

Country:

Europe > France (0.04)
North America > United States > Hawaii (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > e-Commerce > Financial Technology (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Strongly Augmented Contrastive Clustering

Deng, Xiaozhi, Huang, Dong, Chen, Ding-Hua, Wang, Chang-Dong, Lai, Jian-Huang

arXiv.org Artificial Intelligence, Jul-14-2022

Deep clustering has attracted increasing attention in recent years due to its capability of joint representation learning and clustering via deep neural networks. In its latest developments, contrastive learning has emerged as an effective technique to substantially enhance deep clustering performance. However, existing contrastive-learning-based deep clustering algorithms mostly focus on some carefully designed augmentations (often with limited transformations to preserve the structure), referred to as weak augmentations, and cannot go beyond the weak augmentations to explore the further opportunities offered by stronger augmentations (with more aggressive transformations or even severe distortions). In this paper, we present an end-to-end deep clustering approach termed Strongly Augmented Contrastive Clustering (SACC), which extends the conventional two-augmentation-view paradigm to multiple views and jointly leverages strong and weak augmentations for strengthened deep clustering. Particularly, we utilize a backbone network with triply-shared weights, where a strongly augmented view and two weakly augmented views are incorporated. Based on the representations produced by the backbone, the weak-weak view pair and the strong-weak view pairs are simultaneously exploited for the instance-level contrastive learning (via an instance projector) and the cluster-level contrastive learning (via a cluster projector), which, together with the backbone, can be jointly optimized in a purely unsupervised manner. Experimental results on five challenging image datasets have shown the superiority of our SACC approach over the state-of-the-art. The code is available at https://github.com/dengxiaozhi/SACC.

artificial intelligence, augmentation, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2206.0038

Country:

Asia > China > Guangdong Province > Guangzhou (0.05)
North America > United States > Illinois > Cook County > Chicago (0.04)
Asia > Singapore (0.04)

Genre: Research Report (0.50)

Industry: Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Insurgency as Complex Network: Image Co-Appearance and Hierarchy in the PKK

Ballinger, Ollie

arXiv.org Artificial Intelligence, Jul-14-2022

Despite a growing recognition of the importance of insurgent group structure on conflict outcomes, there is very little empirical research thereon. Though this problem is rooted in the inaccessibility of data on militant group structure, insurgents frequently publish large volumes of image data on the internet. In this paper, I develop a new methodology that leverages this abundant but underutilized source of data by automating the creation of a social network graph based on co-appearance in photographs using deep learning. Using a trove of 19,115 obituary images published online by the PKK, a Kurdish militant group in Turkey, I demonstrate that an individual's centrality in the resulting co-appearance network is closely correlated with their rank in the insurgent group.
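
The co-appearance idea in this abstract can be illustrated with a minimal sketch. It assumes the deep-learning step has already produced, for each photograph, a list of identity labels (the `photos` data below is hypothetical), and it uses degree centrality as one possible centrality measure; the abstract does not say which measure the author uses.

```python
from collections import defaultdict
from itertools import combinations


def co_appearance_graph(photos):
    """Weighted undirected graph: edge weight = number of photos two people share."""
    weights = defaultdict(int)
    for people in photos:
        # sorted(set(...)) gives each unordered pair one canonical key
        for a, b in combinations(sorted(set(people)), 2):
            weights[(a, b)] += 1
    return dict(weights)


def degree_centrality(weights):
    """Fraction of the other individuals each person co-appears with at least once."""
    neighbours = defaultdict(set)
    for a, b in weights:
        neighbours[a].add(b)
        neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {p: 0.0 for p in neighbours}
    return {p: len(nb) / (n - 1) for p, nb in neighbours.items()}


# Hypothetical input: identities detected in each photograph (assumption).
photos = [["A", "B"], ["A", "C"], ["A", "D"], ["B", "C"]]
graph = co_appearance_graph(photos)
centrality = degree_centrality(graph)
most_central = max(centrality, key=centrality.get)
```

In this toy data, person "A" appears with everyone else and so has the highest degree centrality; comparing such scores against known ranks is the kind of correlation the abstract describes.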

artificial intelligence, machine learning, node, (17 more...)

arXiv.org Artificial Intelligence

2207.06946

Country:

Asia > Middle East > Republic of Türkiye (1.00)
Asia > Middle East > Syria (0.04)
Asia > Middle East > Iran (0.04)
(15 more...)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.68)

Industry:

Government > Regional Government > Asia Government > Middle East Government > Republic of Türkiye Government (1.00)
Government > Military (1.00)
Government > Regional Government > North America Government > United States Government (0.67)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
(2 more...)

Explainable Intrusion Detection Systems (X-IDS): A Survey of Current Methods, Challenges, and Opportunities

Neupane, Subash, Ables, Jesse, Anderson, William, Mittal, Sudip, Rahimi, Shahram, Banicescu, Ioana, Seale, Maria

arXiv.org Artificial Intelligence, Jul-13-2022

The application of Artificial Intelligence (AI) and Machine Learning (ML) to cybersecurity challenges has gained traction in industry and academia, partially as a result of widespread malware attacks on critical systems such as cloud infrastructures and government institutions. Intrusion Detection Systems (IDS), using some forms of AI, have received widespread adoption due to their ability to handle vast amounts of data with a high prediction accuracy. These systems are hosted in the organizational Cyber Security Operation Center (CSoC) as a defense tool to monitor and detect malicious network flow that would otherwise impact the Confidentiality, Integrity, and Availability (CIA). CSoC analysts rely on these systems to make decisions about the detected threats. However, IDSs designed using Deep Learning (DL) techniques are often treated as black box models and do not provide a justification for their predictions. This creates a barrier for CSoC analysts, as they are unable to improve their decisions based on the model's predictions. One solution to this problem is to design explainable IDS (X-IDS). This survey reviews the state-of-the-art in explainable AI (XAI) for IDS, its current challenges, and discusses how these challenges span to the design of an X-IDS. In particular, we discuss black box and white box approaches comprehensively. We also present the tradeoff between these approaches in terms of their performance and ability to produce explanations. Furthermore, we propose a generic architecture that considers human-in-the-loop which can be used as a guideline when designing an X-IDS. Research recommendations are given from three critical viewpoints: the need to define explainability for IDS, the need to create explanations tailored to various stakeholders, and the need to design metrics to evaluate explanations.

data mining, explanation, machine learning, (22 more...)

arXiv.org Artificial Intelligence

2207.06236

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Mississippi > Warren County > Vicksburg (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(5 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Research Report > Promising Solution (0.92)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Networks (1.00)
(9 more...)

Multiple Kernel Clustering with Dual Noise Minimization

Zhang, Junpu, Li, Liang, Wang, Siwei, Liu, Jiyuan, Liu, Yue, Liu, Xinwang, Zhu, En

arXiv.org Artificial Intelligence, Jul-13-2022

Clustering is a representative unsupervised method widely applied in multi-modal and multi-view scenarios. Multiple kernel clustering (MKC) aims to group data by integrating complementary information from base kernels. As a representative, late fusion MKC first decomposes the kernels into orthogonal partition matrices, then learns a consensus one from them, and has recently achieved promising performance. However, these methods fail to consider the noise inside the partition matrix, preventing further improvement of clustering performance. We discover that the noise can be decomposed into two separable parts, i.e., N-noise and C-noise (null space noise and column space noise). In this paper, we rigorously define dual noise and propose a novel parameter-free MKC algorithm by minimizing them. To solve the resultant optimization problem, we design an efficient two-step iterative strategy. To the best of our knowledge, this is the first investigation of dual noise within the partition in the kernel space. We observe that dual noise pollutes the block diagonal structures and degrades clustering performance, and that C-noise is more destructive than N-noise. Owing to our efficient mechanism to minimize dual noise, the proposed algorithm surpasses the recent methods by large margins.

artificial intelligence, data mining, machine learning, (13 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3503161.3548334

2207.06041

Country:

Europe > Portugal > Lisbon > Lisbon (0.05)
Asia > China > Hunan Province > Changsha (0.05)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Synthesis of Parametric Hybrid Automata from Time Series

Soto, Miriam García, Henzinger, Thomas A., Schilling, Christian

arXiv.org Artificial Intelligence, Jul-13-2022

We propose an algorithmic approach for synthesizing linear hybrid automata from time-series data. Unlike existing approaches, our approach provides a whole family of models. Each model in the family is guaranteed to capture the input data up to a precision error ε, in the following sense: for each time series, the model contains an execution that is ε-close to the data points. Our construction allows one to effectively choose a model from this family with minimal precision error ε. We demonstrate the algorithm's efficiency and its ability to find precise models in two case studies.

artificial intelligence, machine learning, time series, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/978-3-031-19992-9_22

2208.06383

Country:

Europe > Denmark > North Jutland > Aalborg (0.04)
Europe > Austria (0.04)
Europe > Spain > Galicia > Madrid (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Understanding Mean Shift Clustering (Artificial Intelligence)

#artificialintelligence, Jul-10-2022, 16:15:33 GMT

Abstract: In this study, a novel method for the construction of a driving cycle based on Mean Shift clustering is proposed to solve the problems existing in the traditional micro-trips method. Firstly, 1701 kinematic segments are obtained by processing and dividing the driving data in real road conditions. Secondly, 12 kinematic parameters are calculated for each segment, and the dimensionality of parameters is reduced through principal component analysis (PCA). Three principal components are chosen to classify all cycles into three types by the Mean Shift algorithm. Finally, according to the principle of minimum deviation, representative micro-trips are selected from each type of cycle to complete the construction of the final driving cycle.
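
The clustering step described above can be sketched as follows. This is a minimal flat-kernel Mean Shift in plain Python, not the study's implementation; it assumes the segments have already been reduced to three principal components, and the `segments` data, the `bandwidth` value, and the mode-merging rule (modes within half a bandwidth are treated as one cluster) are all illustrative assumptions.

```python
import math


def mean_shift(points, bandwidth, max_iter=100, tol=1e-6):
    """Flat-kernel mean shift: move each point to the mean of its neighbours until it settles."""
    modes = []
    for p in points:
        x = list(p)
        for _ in range(max_iter):
            near = [q for q in points if math.dist(x, q) <= bandwidth]
            if not near:  # defensive guard against rounding at the window boundary
                break
            new_x = [sum(coord) / len(near) for coord in zip(*near)]
            shift = math.dist(x, new_x)
            x = new_x
            if shift < tol:
                break
        modes.append(x)

    # Merge converged modes that lie within bandwidth / 2 of an existing centre (assumption).
    centers, labels = [], []
    for m in modes:
        for i, c in enumerate(centers):
            if math.dist(m, c) <= bandwidth / 2:
                labels.append(i)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return centers, labels


# Hypothetical PCA-reduced segments (three principal components each).
segments = [
    (0.0, 0.0, 0.0), (0.2, 0.0, 0.0), (0.0, 0.2, 0.0),
    (5.0, 5.0, 5.0), (5.2, 5.0, 5.0), (5.0, 5.2, 5.0),
    (10.0, 0.0, 0.0), (10.2, 0.0, 0.0), (10.0, 0.2, 0.0),
]
centers, labels = mean_shift(segments, bandwidth=1.0)
```

Unlike k-means, Mean Shift does not take the number of clusters as input; the bandwidth determines how many modes emerge, which is why the toy data above yields three groups without being told to.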

algorithm, artificial intelligence, construction method, (9 more...)

#artificialintelligence

Genre: Research Report > New Finding (0.36)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.39)

Fuzzy Clustering by Hyperbolic Smoothing

Masis, David, Segura, Esteban, Trejos, Javier, Xavier, Adilson

arXiv.org Machine Learning, Jul-9-2022

We propose a novel method for building fuzzy clusters of large data sets, using a smoothing numerical approach. The usual sum-of-squares criterion is relaxed so the search for good fuzzy partitions is made on a continuous space, rather than a combinatorial space as in classical methods (Hartigan). The smoothing allows a conversion from a strongly non-differentiable problem into low-dimensional, unconstrained, differentiable optimization subproblems, by using an infinitely differentiable function. For the implementation of the algorithm we used the statistical software R, and the results obtained were compared to the traditional fuzzy C-means method proposed by Bezdek.
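
For orientation, the baseline that the abstract compares against, the traditional fuzzy C-means scheme, can be sketched in a few lines. This is not the hyperbolic smoothing method itself, and it is written in Python rather than R; the data, the initial centres, and the fuzzifier value `m = 2` are illustrative assumptions.

```python
import math


def fuzzy_c_means(points, centers, m=2.0, n_iter=50, eps=1e-12):
    """Alternate the standard fuzzy C-means membership and centre updates."""
    power = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(n_iter):
        # Membership of each point in each cluster, from relative distances to all centres.
        memberships = []
        for x in points:
            d = [max(math.dist(x, c), eps) for c in centers]  # eps avoids division by zero
            memberships.append([1.0 / sum((di / dj) ** power for dj in d) for di in d])
        # Each centre becomes the mean of all points weighted by membership ** m.
        new_centers = []
        for i in range(len(centers)):
            w = [u[i] ** m for u in memberships]
            total = sum(w)
            new_centers.append(tuple(
                sum(wk * x[dim] for wk, x in zip(w, points)) / total
                for dim in range(len(points[0]))
            ))
        centers = new_centers
    # The returned memberships are those used for the final centre update.
    return centers, memberships


# Hypothetical two-dimensional data with two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, memberships = fuzzy_c_means(points, centers=[(0.0, 0.0), (10.0, 10.0)])
```

Each row of `memberships` sums to one, so a point belongs to every cluster to some degree; this soft assignment is what distinguishes fuzzy clustering from a hard partition.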

artificial intelligence, fuzzy clustering, machine learning, (16 more...)

arXiv.org Machine Learning

2207.04261

Country:

Europe > Austria > Vienna (0.14)
South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.05)
North America > United States > Illinois > Cook County > Chicago (0.05)
(6 more...)

Genre: Research Report (0.85)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.95)

A Comparative Study of Self-supervised Speech Representation Based Voice Conversion

Huang, Wen-Chin, Yang, Shu-Wen, Hayashi, Tomoki, Toda, Tomoki

arXiv.org Artificial Intelligence, Jul-9-2022

We present a large-scale comparative study of self-supervised speech representation (S3R)-based voice conversion (VC). In the context of recognition-synthesis VC, S3Rs are attractive owing to their potential to replace expensive supervised representations such as phonetic posteriorgrams (PPGs), which are commonly adopted by state-of-the-art VC systems. Using S3PRL-VC, an open-source VC software we previously developed, we provide a series of in-depth objective and subjective analyses under three VC settings: intra-/cross-lingual any-to-one (A2O) and any-to-any (A2A) VC, using the voice conversion challenge 2020 (VCC2020) dataset. We investigated S3R-based VC in various aspects, including model type, multilinguality, and supervision. We also studied the effect of a post-discretization process with k-means clustering and showed how it improves in the A2A setting. Finally, the comparison with state-of-the-art VC systems demonstrates the competitiveness of S3R-based VC and also sheds light on the possible improving directions.
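
The post-discretization step mentioned above can be pictured as replacing each continuous feature frame with the index of its nearest k-means centroid. The sketch below is a generic illustration under that assumption, not the S3PRL-VC code; the `frames` data, the choice of `k`, and the deterministic initialisation from the first `k` frames are all hypothetical.

```python
import math


def nearest(frame, centroids):
    """Index of the centroid closest to a frame."""
    return min(range(len(centroids)), key=lambda i: math.dist(frame, centroids[i]))


def k_means(frames, k, n_iter=20):
    """Plain Lloyd iterations with a deterministic start (first k frames, an assumption)."""
    centroids = [tuple(f) for f in frames[:k]]
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for f in frames:
            groups[nearest(f, centroids)].append(f)
        # Keep the old centroid if a group ends up empty.
        centroids = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids


def discretize(frames, centroids):
    """Map each continuous frame to a discrete token (its nearest centroid index)."""
    return [nearest(f, centroids) for f in frames]


# Hypothetical two-dimensional "speech representation" frames.
frames = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2)]
centroids = k_means(frames, k=2)
tokens = discretize(frames, centroids)
```

The resulting `tokens` sequence is a discrete stand-in for the continuous representation, which a downstream synthesis model could consume in place of the raw frames.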

machine learning, natural language, proc, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/JSTSP.2022.3193761

2207.04356

Country:

North America > United States (0.28)
Asia > Japan (0.04)
Asia > Taiwan (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
(2 more...)

Few-Example Clustering via Contrastive Learning

Jang, Minguk, Chung, Sae-Young

arXiv.org Artificial Intelligence, Jul-8-2022

We propose Few-Example Clustering (FEC), a novel algorithm that performs contrastive learning to cluster few examples. FEC is based on the hypothesis that the contrastive learner with the ground-truth cluster assignment is trained faster than the others. This hypothesis is built on the phenomenon that deep neural networks initially learn patterns from the training examples. Our method is composed of the following three steps: (1) generation of candidate cluster assignments, (2) contrastive learning for each cluster assignment, and (3) selection of the best candidate.

artificial intelligence, cluster assignment, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2207.0405

Country:

North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Asia > South Korea > Daejeon > Daejeon (0.04)

Genre: Research Report (0.83)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)
