Goto

Collaborating Authors

 Clustering


Fast inference of latent space dynamics in huge relational event networks

arXiv.org Artificial Intelligence

Relational events are a type of social interactions, that sometimes are referred to as dynamic networks. Its dynamics typically depends on emerging patterns, so-called endogenous variables, or external forces, referred to as exogenous variables. Comprehensive information on the actors in the network, especially for huge networks, is rare, however. A latent space approach in network analysis has been a popular way to account for unmeasured covariates that are driving network configurations. Bayesian and EM-type algorithms have been proposed for inferring the latent space, but both the sheer size many social network applications as well as the dynamic nature of the process, and therefore the latent space, make computations prohibitively expensive. In this work we propose a likelihood-based algorithm that can deal with huge relational event networks. We propose a hierarchical strategy for inferring network community dynamics embedded into an interpretable latent space. Node dynamics are described by smooth spline processes. To make the framework feasible for large networks we borrow from machine learning optimization methodology. Model-based clustering is carried out via a convex clustering penalization, encouraging shared trajectories for ease of interpretation. We propose a model-based approach for separating macro-microstructures and perform a hierarchical analysis within successive hierarchies. The method can fit millions of nodes on a public Colab GPU in a few minutes. The code and a tutorial are available in a Github repository.


GBMST: An Efficient Minimum Spanning Tree Clustering Based on Granular-Ball Computing

arXiv.org Artificial Intelligence

Most of the existing clustering methods are based on a single granularity of information, such as the distance and density of each data. This most fine-grained based approach is usually inefficient and susceptible to noise. Therefore, we propose a clustering algorithm that combines multi-granularity Granular-Ball and minimum spanning tree (MST). We construct coarsegrained granular-balls, and then use granular-balls and MST to implement the clustering method based on "large-scale priority", which can greatly avoid the influence of outliers and accelerate the construction process of MST. Experimental results on several data sets demonstrate the power of the algorithm. All codes have been released at https://github.com/xjnine/GBMST.


Multi-View Clustering via Semi-non-negative Tensor Factorization

arXiv.org Artificial Intelligence

Multi-view clustering (MVC) based on non-negative matrix factorization (NMF) and its variants have received a huge amount of attention in recent years due to their advantages in clustering interpretability. However, existing NMF-based multi-view clustering methods perform NMF on each view data respectively and ignore the impact of between-view. Thus, they can't well exploit the within-view spatial structure and between-view complementary information. To resolve this issue, we present semi-non-negative tensor factorization (Semi-NTF) and develop a novel multi-view clustering based on Semi-NTF with one-side orthogonal constraint. Our model directly performs Semi-NTF on the 3rd-order tensor which is composed of anchor graphs of views. Thus, our model directly considers the between-view relationship. Moreover, we use the tensor Schatten p-norm regularization as a rank approximation of the 3rd-order tensor which characterizes the cluster structure of multi-view data and exploits the between-view complementary information. In addition, we provide an optimization algorithm for the proposed method and prove mathematically that the algorithm always converges to the stationary KKT point. Extensive experiments on various benchmark datasets indicate that our proposed method is able to achieve satisfactory clustering performance.


Clustered Graph Matching for Label Recovery and Graph Classification

arXiv.org Artificial Intelligence

Given a collection of vertex-aligned networks and an additional label-shuffled network, we propose procedures for leveraging the signal in the vertex-aligned collection to recover the labels of the shuffled network. We consider matching the shuffled network to averages of the networks in the vertex-aligned collection at different levels of granularity. We demonstrate both in theory and practice that if the graphs come from different network classes, then clustering the networks into classes followed by matching the new graph to cluster-averages can yield higher fidelity matching performance than matching to the global average graph. Moreover, by minimizing the graph matching objective function with respect to each cluster average, this approach simultaneously classifies and recovers the vertex labels for the shuffled graph. These theoretical developments are further reinforced via an illuminating real data experiment matching human connectomes.


Large-scale Pre-trained Models are Surprisingly Strong in Incremental Novel Class Discovery

arXiv.org Artificial Intelligence

Discovering novel concepts from unlabelled data and in In this work we study the problem of Novel Class Discovery a continuous manner is an important desideratum of lifelong (NCD) [19] where the goal is to train neural networks learners. In the literature such problems have been to discover (or group) novel visual concepts present partially addressed under very restricted settings, where in an unlabelled dataset into semantically meaningful clusters, either access to labelled data is provided for discovering while leveraging prior knowledge learned from supervised novel concepts (e.g., NCD) or learning occurs for a limited pre-training on a labelled dataset containing disjoint number of incremental steps (e.g., class-iNCD). In this work classes (see Fig 1b). Note that NCD is different from fully we challenge the status quo and propose a more challenging unsupervised clustering as there can be several criteria to and practical learning paradigm called MSc-iNCD, where cluster a dataset unsupervisedly (see Figure 1a). Ever since learning occurs continuously and unsupervisedly, while exploiting the pioneering work by Han et al., [19] the field of NCD has the rich priors from large-scale pre-trained models.


Smart Home Environment Modelled with a Multi-Agent System

arXiv.org Artificial Intelligence

A smart home can be considered a place of residence that enables the management of appliances and systems to help with day-to-day life by automated technology. In the current paper is described a prototype that simulates a contextaware environment, developed in a designed smart home. The smart home environment has been simulated using three agents and five locations in a house. The context-aware agents behave based on predefined rules designed for daily activities. Our proposal aims to reduce operational cost of running devices. In the future, monitors of health aspects belonging to home residents will sustain their healthy life daily. Keywords: smart home, multi-agent system, K-Nearest Neighbor algorithm, K-Means Clustering algorithm 1. Introduction Smart home, also known as an intelligent house, incorporates special devices that manage house features.


Improving Dual-Encoder Training through Dynamic Indexes for Negative Mining

arXiv.org Artificial Intelligence

Dual encoder models are ubiquitous in modern classification and retrieval. Crucial for training such dual encoders is an accurate estimation of gradients from the partition function of the softmax over the large output space; this requires finding negative targets that contribute most significantly ("hard negatives"). Since dual encoder model parameters change during training, the use of traditional static nearest neighbor indexes can be sub-optimal. These static indexes (1) periodically require expensive re-building of the index, which in turn requires (2) expensive re-encoding of all targets using updated model parameters. This paper addresses both of these challenges. First, we introduce an algorithm that uses a tree structure to approximate the softmax with provable bounds and that dynamically maintains the tree. Second, we approximate the effect of a gradient update on target encodings with an efficient Nystrom low-rank approximation. In our empirical study on datasets with over twenty million targets, our approach cuts error by half in relation to oracle brute-force negative mining. Furthermore, our method surpasses prior state-of-the-art while using 150x less accelerator memory.


CGC: Contrastive Graph Clustering for Community Detection and Tracking

arXiv.org Artificial Intelligence

Given entities and their interactions in the web data, which may have occurred at different time, how can we find communities of entities and track their evolution? In this paper, we approach this important task from graph clustering perspective. Recently, state-of-the-art clustering performance in various domains has been achieved by deep clustering methods. Especially, deep graph clustering (DGC) methods have successfully extended deep clustering to graph-structured data by learning node representations and cluster assignments in a joint optimization framework. Despite some differences in modeling choices (e.g., encoder architectures), existing DGC methods are mainly based on autoencoders and use the same clustering objective with relatively minor adaptations. Also, while many real-world graphs are dynamic, previous DGC methods considered only static graphs. In this work, we develop CGC, a novel end-to-end framework for graph clustering, which fundamentally differs from existing methods. CGC learns node embeddings and cluster assignments in a contrastive graph learning framework, where positive and negative samples are carefully selected in a multi-level scheme such that they reflect hierarchical community structures and network homophily. Also, we extend CGC for time-evolving data, where temporal graph clustering is performed in an incremental learning fashion, with the ability to detect change points. Extensive evaluation on real-world graphs demonstrates that the proposed CGC consistently outperforms existing methods.


Temporal Egonet Subgraph Transitions

arXiv.org Artificial Intelligence

How do we summarize dynamic behavioral interactions? TEST is flexible enough to lend itself to a variety of standard We introduce a possible node-embedding-based solution to tasks and application domains because it does not depend this question: temporal egonet subgraph transitions.


Hybrid Fuzzy-Crisp Clustering Algorithm: Theory and Experiments

arXiv.org Artificial Intelligence

With the membership function being strictly positive, the conventional fuzzy c-means clustering method sometimes causes imbalanced influence when clusters of vastly different sizes exist. That is, an outstandingly large cluster drags to its center all the other clusters, however far they are separated. To solve this problem, we propose a hybrid fuzzy-crisp clustering algorithm based on a target function combining linear and quadratic terms of the membership function. In this algorithm, the membership of a data point to a cluster is automatically set to exactly zero if the data point is ``sufficiently'' far from the cluster center. In this paper, we present a new algorithm for hybrid fuzzy-crisp clustering along with its geometric interpretation. The algorithm is tested on twenty simulated data generated and five real-world datasets from the UCI repository and compared with conventional fuzzy and crisp clustering methods. The proposed algorithm is demonstrated to outperform the conventional methods on imbalanced datasets and can be competitive on more balanced datasets.