AITopics | Clustering

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

Trajectory Clustering Performance Evaluation: If we know the answer, it's not clustering

Rezaie, Mohsen, Saunier, Nicolas

arXiv.org Artificial Intelligence, Dec-2-2021

Advancements in Intelligent Traffic Systems (ITS) have made huge amounts of traffic data available through automatic data collection. A big part of this data is stored as trajectories of moving vehicles and road users. Automatic analysis of this data with minimal human supervision would both lower the costs and eliminate subjectivity of the analysis. Trajectory clustering is an unsupervised task. In this paper, we perform a comprehensive comparison of similarity measures, clustering algorithms and evaluation measures using trajectory data from seven intersections. We also propose a method to automatically generate trajectory reference clusters based on their origin and destination points to be used for label-based evaluation measures. Therefore, the entire procedure remains unsupervised both in clustering and evaluation levels. Finally, we use a combination of evaluation measures to find the top performing similarity measures and clustering algorithms for each intersection. The results show that there is no single combination of distance and clustering algorithm that is always among the top ten clustering setups.
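
The idea of building reference clusters from origin and destination points, then scoring a clustering against them with a label-based measure, can be illustrated with a small sketch. This is not the authors' procedure: the fixed zone centres, the nearest-zone rule, the toy trajectories, and the choice of the Rand index as the label-based measure are all assumptions made for illustration.

```python
from itertools import combinations
from math import dist


def nearest_zone(point, zones):
    """Name of the (hypothetical) zone whose centre is closest to the point."""
    return min(zones, key=lambda name: dist(point, zones[name]))


def reference_labels(trajectories, zones):
    """Label each trajectory by its (origin zone, destination zone) pair."""
    return [(nearest_zone(t[0], zones), nearest_zone(t[-1], zones))
            for t in trajectories]


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two labelings agree (same vs. different group)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two items")
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
                for i, j in pairs)
    return agree / len(pairs)


# Made-up zone centres and toy trajectories (lists of (x, y) points).
zones = {"north": (0, 10), "south": (0, -10), "east": (10, 0)}
trajectories = [
    [(0, 9), (0, 0), (0, -9)],      # north to south
    [(1, 10), (1, 0), (1, -10)],    # north to south
    [(0, -9), (5, -2), (9, 0)],     # south to east
    [(1, -10), (6, -1), (10, 1)],   # south to east
]

reference = reference_labels(trajectories, zones)
score_match = rand_index(reference, [0, 0, 1, 1])  # clustering that follows the movements
score_mixed = rand_index(reference, [0, 1, 0, 1])  # clustering that mixes the movements
```

Because the reference labels come from the trajectories' own endpoints, no manual annotation is needed, which is the property the abstract emphasizes.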

algorithm, performance measure, trajectory, (13 more...)

2112.0157

Country:

North America > Canada > Quebec > Montreal (0.14)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
Asia (0.04)
(2 more...)

Genre: Research Report > New Finding (0.88)

Industry:

Transportation > Ground > Road (0.93)
Government (0.68)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Computing Class Hierarchies from Classifiers

Kang, Kai, Lin, Fangzhen

arXiv.org Artificial Intelligence, Dec-2-2021

A class or taxonomic hierarchy is often manually constructed, and part of our knowledge about the world. In this paper, we propose a novel algorithm for automatically acquiring a class hierarchy from a classifier which is often a large neural network these days. The information that we need from a classifier is its confusion matrix which contains, for each pair of base classes, the number of errors the classifier makes by mistaking one for another. Our algorithm produces surprisingly good hierarchies for some well-known deep neural network models trained on the CIFAR-10 dataset, a neural network model for predicting the native language of a non-native English speaker, a neural network model for detecting the language of a written text, and a classifier for identifying music genre. In the literature, such class hierarchies have been used to provide interpretability to the neural networks. We also discuss some other potential uses of the acquired hierarchies.
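
To make the confusion-matrix idea concrete, here is a minimal sketch that treats frequently confused classes as similar and merges them bottom-up. It is not the algorithm proposed in the paper: the symmetrized counts, the average-linkage merging rule, the function name `hierarchy_from_confusion`, and the toy matrix are all assumptions chosen for illustration.

```python
def hierarchy_from_confusion(confusion, names):
    """Build a nested-tuple hierarchy by repeatedly merging the most-confused groups.

    confusion[i][j] is the number of times class i was mistaken for class j.
    """
    def similarity(i, j):
        # Symmetrize: errors in either direction count as evidence of closeness.
        return confusion[i][j] + confusion[j][i]

    # Each cluster is (member indices, subtree built so far).
    clusters = [((i,), names[i]) for i in range(len(names))]
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                mx, my = clusters[x][0], clusters[y][0]
                # Average linkage: mean pairwise similarity between the two groups.
                score = sum(similarity(i, j) for i in mx for j in my) / (len(mx) * len(my))
                if best is None or score > best[0]:
                    best = (score, x, y)
        _, x, y = best
        merged = (clusters[x][0] + clusters[y][0], (clusters[x][1], clusters[y][1]))
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return clusters[0][1]


# Made-up error counts for four classes (rows: true class, columns: predicted class).
names = ["cat", "dog", "car", "truck"]
confusion = [
    [0, 30, 1, 0],
    [25, 0, 0, 2],
    [1, 0, 0, 40],
    [0, 1, 35, 0],
]
tree = hierarchy_from_confusion(confusion, names)
```

On this toy matrix the vehicles are grouped together first and the animals second, so `tree` nests the two pairs under a common root.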

algorithm, class hierarchy, hierarchy, (16 more...)

2112.01187

Country:

Asia > China > Hong Kong (0.04)
Asia > India (0.04)

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment (0.93)
Media > Music (0.68)
Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Easy Semantification of Bioassays

Anteghini, Marco, D'Souza, Jennifer, Santos, Vitor A. P. Martins dos, Auer, Sören

arXiv.org Artificial Intelligence, Dec-2-2021

Biological data and knowledge bases increasingly rely on Semantic Web technologies and the use of knowledge graphs for data integration, retrieval and federated queries. We propose a solution for automatically semantifying biological assays. Our solution contrasts the problem of automated semantification as labeling versus clustering where the two methods are on opposite ends of the method complexity spectrum. Characteristically modeling our problem, we find the clustering solution significantly outperforms a deep neural network state-of-the-art labeling approach. This novel contribution is based on two factors: 1) a learning objective closely modeled after the data outperforms an alternative approach with sophisticated semantic modeling; 2) automatically semantifying biological assays achieves a high performance F1 of nearly 83%, which to our knowledge is the first reported standardized evaluation of the task offering a strong benchmark model.

bioassay, logical statement, semantification, (14 more...)

2111.15182

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
(9 more...)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Communications > Web > Semantic Web (0.86)
(2 more...)

Incomplete Multi-view Clustering via Cross-view Relation Transfer

Wang, Yiming, Chang, Dongxia, Fu, Zhiqiang, Zhao, Yao

arXiv.org Artificial Intelligence, Dec-1-2021

In this paper, we consider the problem of multi-view clustering on incomplete views. Compared with complete multi-view clustering, the view-missing problem increases the difficulty of learning common representations from different views. To address the challenge, we propose a novel incomplete multi-view clustering framework, which incorporates cross-view relation transfer and multi-view fusion learning. Specifically, based on the consistency existing in multi-view data, we devise a cross-view relation transfer-based completion module, which transfers known similar inter-instance relationships to the missing view and recovers the missing data via graph networks based on the transferred relationship graph. Then the view-specific encoders are designed to extract the recovered multi-view data, and an attention-based fusion layer is introduced to obtain the common representation. Moreover, to reduce the impact of the error caused by the inconsistency between views and obtain a better clustering structure, a joint clustering layer is introduced to optimize recovery and clustering simultaneously. Extensive experiments conducted on several real datasets demonstrate the effectiveness of the proposed method.

artificial intelligence, clustering, machine learning, (15 more...)

doi: 10.1109/TCSVT.2022.3201822

2112.00739

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)

Outlier Detection using AI: A Survey

Sikder, Md Nazmul Kabir, Batarseh, Feras A.

arXiv.org Artificial Intelligence, Dec-1-2021

An outlier is an event or observation that is defined as an unusual activity, intrusion, or a suspicious data point that lies at an irregular distance from a population. The definition of an outlier event, however, is subjective and depends on the application and the domain (Energy, Health, Wireless Network, etc.). It is important to detect outlier events as carefully as possible to avoid infrastructure failures because anomalous events can cause minor to severe damage to infrastructure. For instance, an attack on a cyber-physical system such as a microgrid may initiate voltage or frequency instability, thereby damaging a smart inverter, which involves very expensive repairs. Unusual activities in microgrids can be mechanical faults, behavior changes in the system, human or instrument errors, or a malicious attack. Accordingly, and due to its variability, Outlier Detection (OD) is an ever-growing research field. In this chapter, we discuss the progress of OD methods using AI techniques. For that, the fundamental concepts of each OD model are introduced via multiple categories. A broad range of OD methods is categorized into six major categories: Statistical-based, Distance-based, Density-based, Clustering-based, Learning-based, and Ensemble methods. For every category, we discuss recent state-of-the-art approaches, their application areas, and performances. After that, a brief discussion regarding the advantages, disadvantages, and challenges of each technique is provided with recommendations on future research directions. This survey aims to guide the reader to better understand recent progress of OD methods for the assurance of AI.

algorithm, detection, outlier, (13 more...)

2112.00588

Country:

North America > United States > Virginia (0.04)
North America > United States > Oregon (0.04)
North America > United States > New York > New York County > New York City (0.04)
(3 more...)

Genre:

Research Report > New Finding (0.67)
Research Report > Promising Solution (0.65)
Overview > Innovation (0.65)
Research Report > Experimental Study (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Energy > Power Industry (1.00)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
(2 more...)

Fast Topological Clustering with Wasserstein Distance

Songdechakraiwut, Tananun, Krause, Bryan M., Banks, Matthew I., Nourski, Kirill V., Van Veen, Barry D.

arXiv.org Machine Learning, Nov-30-2021

The topological patterns exhibited by many real-world networks motivate the development of topology-based methods for assessing the similarity of networks. However, extracting topological structure is difficult, especially for large and dense networks whose node degrees range over multiple orders of magnitude. In this paper, we propose a novel and computationally practical topological clustering method that clusters complex networks with intricate topology using principled theory from persistent homology and optimal transport. Such networks are aggregated into clusters through a centroid-based clustering strategy based on both their topological and geometric structure, preserving correspondence between nodes in different networks. The notions of topological proximity and centroid are characterized using a novel and efficient approach to computation of the Wasserstein distance and barycenter for persistence barcodes associated with connected components and cycles. The proposed method is demonstrated to be effective using both simulated networks and measured functional brain networks.
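
As background for why a Wasserstein distance can be computed cheaply in some settings, the sketch below uses a standard fact about optimal transport on the real line: for two equal-size sets of real values, the optimal matching pairs them in sorted order. Whether and how the paper's method reduces barcodes to this case is not stated in the abstract, so treat the connection as an assumption; the function name `wasserstein_1d` and the sample values are hypothetical.

```python
def wasserstein_1d(xs, ys, p=2):
    """p-Wasserstein distance between two equal-size sets of real numbers.

    Uses the unnormalized convention (sum over matched pairs, then the p-th root).
    In one dimension the optimal matching pairs the i-th smallest value of xs
    with the i-th smallest value of ys, so sorting replaces a full assignment search.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch only handles sets of equal size")
    total = sum(abs(a - b) ** p for a, b in zip(sorted(xs), sorted(ys)))
    return total ** (1 / p)


# Made-up values standing in for one coordinate of two barcodes.
first = [0.1, 0.5, 0.9]
second = [0.9, 0.2, 0.4]
distance_p1 = wasserstein_1d(first, second, p=1)
```

Sorting costs O(n log n), compared with the cubic cost of solving a general assignment problem, which is one reason one-dimensional formulations are attractive for large networks.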

edge weight, topology, wasserstein distance, (14 more...)

2112.00101

Country:

North America > United States > Iowa (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > California > Alameda County > Oakland (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)

A Comprehensive Survey on the Convergence of Vehicular Social Networks and Fog Computing

Miri, Farimasadat, Pazzi, Richard

arXiv.org Artificial Intelligence, Nov-30-2021

In recent years, the number of IoT devices has been growing fast which leads to a challenging task for managing, storing, analyzing, and making decisions about raw data from different IoT devices, especially for delay-sensitive applications. In a vehicular network (VANET) environment, the dynamic nature of vehicles makes the current open research issues even more challenging due to the frequent topology changes that can lead to disconnections between vehicles. To this end, a number of research works have been proposed in the context of cloud and fog computing over the 5G infrastructure. On the other hand, there are a variety of research proposals that aim to extend the connection time between vehicles. Vehicular Social Networks (VSNs) have been defined to decrease the burden of connection time between the vehicles. This survey paper first provides the necessary background information and definitions about fog, cloud and related paradigms such as 5G and SDN. Then, it introduces the reader to Vehicular Social Networks, the different metrics and the main differences between VSNs and Online Social Networks. Finally, this survey investigates the related works in the context of VANETs that have demonstrated different architectures to address the different issues in fog computing. Moreover, it provides a categorization of the different approaches and discusses the required metrics in the context of fog and cloud and compares them to Vehicular social networks. A comparison of the relevant related works is discussed along with new research challenges and trends in the domain of VSNs and fog computing.

architecture, computing, node, (16 more...)

2112.00143

Country:

North America > Canada > Ontario (0.04)
Asia > China > Shanghai > Shanghai (0.04)
South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
(6 more...)

Genre: Overview (1.00)

Industry:

Transportation > Ground > Road (1.00)
Information Technology > Services (1.00)

Technology:

Information Technology > Internet of Things (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Communications > Networks > Sensor Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.45)

Understanding your performance metrics for clustering

#artificialintelligence, Nov-29-2021, 07:30:27 GMT

Clustering falls under unsupervised learning, a less commonly studied part of machine learning. Supervised tasks such as classification, which dominate most machine learning study, learn from labeled data and make class predictions; clustering has no labels to learn from. This does not make clustering less desirable, as clustering algorithms are essential for discovering unexplored insights. It is therefore important to measure the performance of a clustering and to decide whether the clusters formed can be trusted. Silhouette analysis is the most common method, as it is more straightforward than the alternatives.
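
For readers unfamiliar with silhouette analysis, here is a minimal sketch of the standard silhouette coefficient, s = (b - a) / max(a, b), where a is a point's mean distance to the rest of its own cluster and b is its mean distance to the nearest other cluster. The Euclidean distance, the zero score for singleton clusters, the function names, and the toy points are assumptions of this sketch and do not come from the linked article.

```python
from math import dist


def silhouette_scores(points, labels):
    """Silhouette coefficient of every point, each in the range [-1, 1]."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette analysis needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # common convention: a singleton cluster scores 0
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return scores


def mean_silhouette(points, labels):
    """Average silhouette over all points; higher means better-separated clusters."""
    scores = silhouette_scores(points, labels)
    return sum(scores) / len(scores)


# Two well-separated pairs of made-up points.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = mean_silhouette(points, [0, 0, 1, 1])  # groups nearby points together
bad = mean_silhouette(points, [0, 1, 0, 1])   # groups distant points together
```

A mean score near 1 indicates compact, well-separated clusters, while a negative score suggests that points sit closer to another cluster than to their own.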

performance metric, sample point, silhouette score, (10 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.57)

Transformed K-means Clustering

Goel, Anurag, Majumdar, Angshul

arXiv.org Machine Learning, Nov-27-2021

In this work we propose a clustering framework based on the paradigm of transform learning. In simple terms the representation from transform learning is used for K-means clustering; however, the problem is not solved in such a naïve piecemeal fashion. The K-means clustering loss is embedded into the transform learning framework and the joint problem is solved using the alternating direction method of multipliers. Results on document clustering show that our proposed approach improves over the state-of-the-art.
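
For reference, the K-means loss mentioned in the abstract is the sum of squared distances from each point to its assigned centroid. The sketch below shows that loss together with plain Lloyd-style K-means applied directly to fixed input points. It does not implement the paper's joint transform-learning formulation or its ADMM solver; the function names, the supplied initial centroids, and the toy data are assumptions for illustration.

```python
from math import dist


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points]


def update(points, labels, centroids):
    """Move each centroid to the mean of its members (empty clusters stay put)."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            new_centroids.append(old)
            continue
        new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
    return new_centroids


def kmeans_loss(points, labels, centroids):
    """Sum of squared Euclidean distances from each point to its centroid."""
    return sum(dist(p, centroids[label]) ** 2 for p, label in zip(points, labels))


def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    for _ in range(max_iter):
        new_centroids = update(points, assign(points, centroids), centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return assign(points, centroids), centroids


# Made-up data with hand-picked initial centroids, so the run is deterministic.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels, centroids = kmeans(points, [(0.0, 0.0), (10.0, 0.0)])
loss = kmeans_loss(points, labels, centroids)
```

In the piecemeal approach the abstract argues against, a routine like this would be run on a representation learned beforehand; the paper instead optimizes the representation and the clustering loss jointly.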

formulation, k-means, learning, (14 more...)

2111.13921

Country:

Asia > India > NCT > New Delhi (0.05)
Asia > India > NCT > Delhi (0.05)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Natural Language Processing in-and-for Design Research

Siddharth, L, Blessing, Lucienne T. M., Luo, Jianxi

arXiv.org Artificial Intelligence, Nov-27-2021

We review the scholarly contributions that utilise Natural Language Processing (NLP) methods to support the design process. Using a heuristic approach, we collected 223 articles published in 32 journals and within the period 1991-present. We present state-of-the-art NLP in-and-for design research by reviewing these articles according to the type of natural language text sources: internal reports, design concepts, discourse transcripts, technical publications, consumer opinions, and others. Upon summarizing and identifying the gaps in these contributions, we utilise an existing design innovation framework to identify the applications that are currently being supported by NLP. We then propose a few methodological and theoretical directions for future NLP in-and-for design research.

application, design process, ontology, (14 more...)

2111.13827

Country:

North America > United States > New York > New York County > New York City (0.05)
Europe > Netherlands > North Holland > Amsterdam (0.05)
Europe > Germany > Baden-Württemberg > Karlsruhe Region > Heidelberg (0.04)
(16 more...)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.45)

Industry:

Transportation > Passenger (1.00)
Transportation > Ground > Road (1.00)
Law (1.00)
(9 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
(11 more...)
