Clustering
Trajectory Clustering Performance Evaluation: If we know the answer, it's not clustering
Rezaie, Mohsen, Saunier, Nicolas
Advancements in Intelligent Traffic Systems (ITS) have made huge amounts of traffic data available through automatic data collection. A big part of this data is stored as trajectories of moving vehicles and road users. Automatic analysis of this data with minimal human supervision would both lower the costs and eliminate subjectivity of the analysis. Trajectory clustering is an unsupervised task. In this paper, we perform a comprehensive comparison of similarity measures, clustering algorithms and evaluation measures using trajectory data from seven intersections. We also propose a method to automatically generate trajectory reference clusters based on their origin and destination points to be used for label-based evaluation measures. Therefore, the entire procedure remains unsupervised both in clustering and evaluation levels. Finally, we use a combination of evaluation measures to find the top performing similarity measures and clustering algorithms for each intersection. The results show that there is no single combination of distance and clustering algorithm that is always among the top ten clustering setups.
Computing Class Hierarchies from Classifiers
A class or taxonomic hierarchy is often manually constructed and forms part of our knowledge about the world. In this paper, we propose a novel algorithm for automatically acquiring a class hierarchy from a classifier, which these days is often a large neural network. The information that we need from a classifier is its confusion matrix, which contains, for each pair of base classes, the number of errors the classifier makes by mistaking one for the other. Our algorithm produces surprisingly good hierarchies for some well-known deep neural network models trained on the CIFAR-10 dataset, a neural network model for predicting the native language of a non-native English speaker, a neural network model for detecting the language of a written text, and a classifier for identifying music genre. In the literature, such class hierarchies have been used to provide interpretability to neural networks. We also discuss some other potential uses of the acquired hierarchies.
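To make the idea of reading a hierarchy off a confusion matrix concrete, here is a minimal sketch. It is not the algorithm proposed in the paper, whose details the abstract does not give. It substitutes plain average-linkage agglomerative clustering, treating the number of mutual confusions between two classes as their similarity. The function name `hierarchy_from_confusion` and the toy confusion counts are assumptions made purely for illustration.

```python
from itertools import combinations


def hierarchy_from_confusion(confusion, names):
    """Build a nested-tuple hierarchy by repeatedly merging the two groups
    of classes that are confused with each other most often on average.

    Illustrative stand-in (average-linkage agglomeration), not the paper's method.
    """
    def pair_sim(i, j):
        # Symmetrised confusion: errors i->j plus errors j->i.
        return confusion[i][j] + confusion[j][i]

    # Each cluster is (tree, member class indices); start with one per class.
    clusters = [(name, [i]) for i, name in enumerate(names)]

    def linkage(c1, c2):
        # Mean pairwise similarity between the members of two clusters.
        total = sum(pair_sim(i, j) for i in c1[1] for j in c2[1])
        return total / (len(c1[1]) * len(c2[1]))

    while len(clusters) > 1:
        # Pick the pair of clusters with the highest average confusion.
        x, y = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        merged = (
            (clusters[x][0], clusters[y][0]),
            clusters[x][1] + clusters[y][1],
        )
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return clusters[0][0]


# Hypothetical confusion counts (rows: true class, columns: predicted class).
names = ["cat", "dog", "car", "truck"]
confusion = [
    [90, 8, 1, 1],
    [9, 89, 1, 1],
    [1, 0, 92, 7],
    [0, 1, 6, 93],
]
tree = hierarchy_from_confusion(confusion, names)
print(tree)
```

On this toy matrix the animals are grouped together first, then the vehicles, and the two groups are joined at the root. Other linkage rules or a normalised confusion matrix would be equally reasonable choices.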
Easy Semantification of Bioassays
Anteghini, Marco, D'Souza, Jennifer, Santos, Vitor A. P. Martins dos, Auer, Sören
Biological data and knowledge bases increasingly rely on Semantic Web technologies and the use of knowledge graphs for data integration, retrieval and federated queries. We propose a solution for automatically semantifying biological assays. Our solution contrasts the problem of automated semantification as labeling versus clustering where the two methods are on opposite ends of the method complexity spectrum. Characteristically modeling our problem, we find the clustering solution significantly outperforms a deep neural network state-of-the-art labeling approach. This novel contribution is based on two factors: 1) a learning objective closely modeled after the data outperforms an alternative approach with sophisticated semantic modeling; 2) automatically semantifying biological assays achieves a high performance F1 of nearly 83%, which to our knowledge is the first reported standardized evaluation of the task offering a strong benchmark model.
Incomplete Multi-view Clustering via Cross-view Relation Transfer
Wang, Yiming, Chang, Dongxia, Fu, Zhiqiang, Zhao, Yao
In this paper, we consider the problem of multi-view clustering on incomplete views. Compared with complete multi-view clustering, the view-missing problem increases the difficulty of learning common representations from different views. To address the challenge, we propose a novel incomplete multi-view clustering framework, which incorporates cross-view relation transfer and multi-view fusion learning. Specifically, based on the consistency existing in multi-view data, we devise a cross-view relation transfer-based completion module, which transfers known similar inter-instance relationships to the missing view and recovers the missing data via graph networks based on the transferred relationship graph. Then the view-specific encoders are designed to extract the recovered multi-view data, and an attention-based fusion layer is introduced to obtain the common representation. Moreover, to reduce the impact of the error caused by the inconsistency between views and obtain a better clustering structure, a joint clustering layer is introduced to optimize recovery and clustering simultaneously. Extensive experiments conducted on several real datasets demonstrate the effectiveness of the proposed method.
Outlier Detection using AI: A Survey
Sikder, Md Nazmul Kabir, Batarseh, Feras A.
An outlier is an event or observation that is defined as an unusual activity, intrusion, or a suspicious data point that lies at an irregular distance from a population. The definition of an outlier event, however, is subjective and depends on the application and the domain (Energy, Health, Wireless Network, etc.). It is important to detect outlier events as carefully as possible to avoid infrastructure failures because anomalous events can cause minor to severe damage to infrastructure. For instance, an attack on a cyber-physical system such as a microgrid may initiate voltage or frequency instability, thereby damaging a smart inverter, which is very expensive to repair. Unusual activities in microgrids can be mechanical faults, behavior changes in the system, human or instrument errors, or a malicious attack. Accordingly, and due to its variability, Outlier Detection (OD) is an ever-growing research field. In this chapter, we discuss the progress of OD methods using AI techniques. For that, the fundamental concepts of each OD model are introduced via multiple categories. A broad range of OD methods is categorized into six major categories: Statistical-based, Distance-based, Density-based, Clustering-based, Learning-based, and Ensemble methods. For every category, we discuss recent state-of-the-art approaches, their application areas, and performances. After that, a brief discussion regarding the advantages, disadvantages, and challenges of each technique is provided with recommendations on future research directions. This survey aims to guide the reader to better understand recent progress of OD methods for the assurance of AI.
Fast Topological Clustering with Wasserstein Distance
Songdechakraiwut, Tananun, Krause, Bryan M., Banks, Matthew I., Nourski, Kirill V., Van Veen, Barry D.
The topological patterns exhibited by many real-world networks motivate the development of topology-based methods for assessing the similarity of networks. However, extracting topological structure is difficult, especially for large and dense networks whose node degrees range over multiple orders of magnitude. In this paper, we propose a novel and computationally practical topological clustering method that clusters complex networks with intricate topology using principled theory from persistent homology and optimal transport. Such networks are aggregated into clusters through a centroid-based clustering strategy based on both their topological and geometric structure, preserving correspondence between nodes in different networks. The notions of topological proximity and centroid are characterized using a novel and efficient approach to computation of the Wasserstein distance and barycenter for persistence barcodes associated with connected components and cycles. The proposed method is demonstrated to be effective using both simulated networks and measured functional brain networks.
A Comprehensive Survey on the Convergence of Vehicular Social Networks and Fog Computing
Miri, Farimasadat, Pazzi, Richard
In recent years, the number of IoT devices has been growing fast, which makes managing, storing, analyzing, and making decisions about raw data from different IoT devices a challenging task, especially for delay-sensitive applications. In a vehicular network (VANET) environment, the dynamic nature of vehicles makes the current open research issues even more challenging due to the frequent topology changes that can lead to disconnections between vehicles. To this end, a number of research works have been proposed in the context of cloud and fog computing over the 5G infrastructure. On the other hand, there are a variety of research proposals that aim to extend the connection time between vehicles. Vehicular Social Networks (VSNs) have been defined to decrease the burden of connection time between the vehicles. This survey paper first provides the necessary background information and definitions about fog, cloud and related paradigms such as 5G and SDN. Then, it introduces the reader to Vehicular Social Networks, the different metrics and the main differences between VSNs and Online Social Networks. Finally, this survey investigates the related works in the context of VANETs that have demonstrated different architectures to address the different issues in fog computing. Moreover, it provides a categorization of the different approaches, discusses the required metrics in the context of fog and cloud, and compares them to Vehicular Social Networks. A comparison of the relevant related works is discussed along with new research challenges and trends in the domain of VSNs and fog computing.
Understanding your performance metrics for clustering
Clustering falls under unsupervised learning, a smaller niche of machine learning. Supervised learning, which is more common in machine learning study, covers tasks such as classification that learn from labeled data and make class predictions; clustering has no such labels to learn from. This does not make clustering less desirable, as clustering algorithms are essential for discovering unexplored insights. It is therefore important to evaluate the performance of a clustering task and to decide whether the clusters formed are trustworthy. Silhouette Analysis is the most common method, as it is more straightforward than the others.
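As a rough illustration of Silhouette Analysis, the sketch below computes the standard silhouette coefficient s(i) = (b - a) / max(a, b) for each point, where a is the mean distance to the other points in its own cluster and b is the lowest mean distance to the points of any other cluster. It uses Euclidean distance and plain Python. The function name `silhouette_scores` and the toy points are assumptions for illustration, not taken from the article.

```python
from math import dist


def silhouette_scores(points, labels):
    """Return the silhouette coefficient of every point (Euclidean distance)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


# Toy data: two well-separated pairs of points.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_scores(points, [0, 0, 1, 1])  # labels match the geometry
bad = silhouette_scores(points, [0, 1, 0, 1])   # labels cut across it
print(sum(good) / len(good), sum(bad) / len(bad))
```

Scores near 1 indicate points that sit well inside their cluster, scores near 0 indicate points on a boundary, and negative scores suggest a point may belong to a neighbouring cluster. Averaging the scores gives a single number for comparing clusterings.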
Transformed K-means Clustering
Goel, Anurag, Majumdar, Angshul
In this work we propose a clustering framework based on the paradigm of transform learning. In simple terms the representation from transform learning is used for K-means clustering; however, the problem is not solved in such a naïve piecemeal fashion. The K-means clustering loss is embedded into the transform learning framework and the joint problem is solved using the alternating direction method of multipliers. Results on document clustering show that our proposed approach improves over the state-of-the-art.
Natural Language Processing in-and-for Design Research
Siddharth, L, Blessing, Lucienne T. M., Luo, Jianxi
We review the scholarly contributions that utilise Natural Language Processing (NLP) methods to support the design process. Using a heuristic approach, we collected 223 articles published in 32 journals and within the period 1991-present. We present state-of-the-art NLP in-and-for design research by reviewing these articles according to the type of natural language text sources: internal reports, design concepts, discourse transcripts, technical publications, consumer opinions, and others. Upon summarizing and identifying the gaps in these contributions, we utilise an existing design innovation framework to identify the applications that are currently being supported by NLP. We then propose a few methodological and theoretical directions for future NLP in-and-for design research.