AITopics

doi: 10.1145/3626641.3627239

2401.03198

Country:

North America > United States > Oregon (0.06)
North America > United States > New York (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

arXiv.org Artificial Intelligence Jan-6-2024

Learning Persistent Community Structures in Dynamic Networks via Topological Data Analysis

Kong, Dexu, Zhang, Anping, Li, Yang

Dynamic community detection methods often lack effective mechanisms to ensure temporal consistency, hindering the analysis of network evolution. In this paper, we propose a novel deep graph clustering framework with temporal consistency regularization on inter-community structures, inspired by the concept of minimal network topological changes within short intervals. Specifically, to address the representation collapse problem, we first introduce MFC, a matrix factorization-based deep graph clustering algorithm that preserves node embeddings. Based on static clustering results, we construct probabilistic community networks and compute their persistent homology, a robust topological measure, to assess structural similarity between them. Moreover, a novel neural network regularization, TopoReg, is introduced to ensure the preservation of topological similarity between inter-community structures over time intervals. Our approach enhances temporal consistency and clustering accuracy on real-world datasets with both fixed and varying numbers of communities. It is also a pioneering application of TDA in temporally persistent community detection, offering an insightful contribution to the field of network analysis. Code and data are available at the public git repository: https://github.com/kundtx/MFC_TopoReg
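As a rough illustration of the persistent homology step mentioned in the abstract, the sketch below computes only the 0-dimensional persistence (connected components) of a small weighted graph with a union-find structure. It is not the authors' MFC or TopoReg code. The function name `h0_persistence`, the toy edge lists, and the choice to treat edge weights as distances (so that low-weight edges enter the filtration first) are all assumptions made for illustration.

```python
def h0_persistence(num_nodes, weighted_edges):
    """Death times of connected components in an edge-weight filtration.

    Assumption: every node is born at filtration value 0 and an edge
    (u, v, w) enters the filtration at value w. Each edge that merges two
    components kills one of them, so its weight is recorded as a death time.
    """
    parent = list(range(num_nodes))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    deaths = []
    for u, v, w in sorted(weighted_edges, key=lambda e: e[2]):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            deaths.append(w)
    return deaths


# Two hypothetical community-network snapshots on the same four communities.
snapshot_t0 = [(0, 1, 0.2), (1, 2, 0.5), (2, 3, 0.3), (0, 3, 0.9)]
snapshot_t1 = [(0, 1, 0.25), (1, 2, 0.5), (2, 3, 0.3), (0, 3, 0.9)]

deaths_t0 = h0_persistence(4, snapshot_t0)
deaths_t1 = h0_persistence(4, snapshot_t1)

# Crude stand-in for a diagram distance: largest gap between sorted deaths.
gap = max(abs(a - b) for a, b in zip(deaths_t0, deaths_t1))
```

A small `gap` indicates that the two snapshots merge their components at similar scales, which is the kind of topological similarity a temporal regularizer could reward. A full treatment would use higher-dimensional homology and a proper diagram distance.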

dataset, graph, snapshot, (16 more...)

2401.03194

Country:

Asia > China > Guangdong Province > Shenzhen (0.05)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(6 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Russo, Luigi, Mauro, Francesco, Memar, Babak, Sebastianelli, Alessandro, Gamba, Paolo, Ullo, Silvia Liberata

Using Multi-Temporal Sentinel-1 and Sentinel-2 data for water bodies mapping

Climate change is intensifying extreme weather events, causing both water scarcity and severe rainfall unpredictability, and posing threats to sustainable development, biodiversity, and access to water and sanitation. This paper aims to provide valuable insights for comprehensive water resource monitoring under diverse meteorological conditions. An extension of the SEN2DWATER dataset is proposed to enhance its capabilities for water basin segmentation. Through the integration of temporally and spatially aligned radar information from Sentinel-1 data with the existing multispectral Sentinel-2 data, a novel multisource and multitemporal dataset is generated. Benchmarking the enhanced dataset involves the application of indices such as the Soil Water Index (SWI) and Normalized Difference Water Index (NDWI), along with an unsupervised Machine Learning (ML) classifier (k-means clustering). Promising results are obtained and potential future developments and applications arising from this research are also explored.
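The index-plus-clustering benchmark described above can be sketched in a few lines. This is a minimal illustration, not the paper's pipeline: it uses the common green/near-infrared form of NDWI, assumes reflectance values are already available per pixel, and runs a tiny one-dimensional k-means written from scratch. The function names and the toy pixel values are hypothetical.

```python
def ndwi(green, nir):
    """NDWI in its green/NIR form: (green - nir) / (green + nir)."""
    total = green + nir
    return (green - nir) / total if total != 0 else 0.0


def kmeans_1d(values, k=2, iters=50):
    """Lloyd's algorithm on scalars; requires k >= 2. Returns (labels, centroids)."""
    lo, hi = min(values), max(values)
    # Spread the initial centroids evenly between the extremes.
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


def water_mask(pixels):
    """Mark as water the cluster whose centroid has the higher NDWI."""
    index = [ndwi(green, nir) for green, nir in pixels]
    labels, centroids = kmeans_1d(index, k=2)
    water = centroids.index(max(centroids))
    return [lab == water for lab in labels]


# Hypothetical (green, nir) reflectance pairs: three water-like, three land-like.
pixels = [(0.10, 0.02), (0.12, 0.03), (0.09, 0.02),
          (0.08, 0.30), (0.10, 0.35), (0.07, 0.25)]
mask = water_mask(pixels)
```

Water absorbs strongly in the near infrared, so water pixels tend toward positive NDWI and vegetation or soil toward negative values, which is why the higher-centroid cluster is taken as water here.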

dataset, sentinel-2, water body mapping, (11 more...)

2402.00023

Country:

Europe > Italy (0.05)
North America (0.04)
Asia > China (0.04)

Genre: Research Report (0.83)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.52)

Wehrli, Silvan, Arnrich, Bert, Irrgang, Christopher

German Text Embedding Clustering Benchmark

This work introduces a benchmark assessing the performance of clustering German text embeddings in different domains. This benchmark is driven by the increasing use of clustering neural text embeddings in tasks that require the grouping of texts (such as topic modeling) and the need for German resources in existing benchmarks. We provide an initial analysis for a range of pre-trained mono- and multilingual models evaluated on the outcome of different clustering algorithms. Results include strong-performing mono- and multilingual models. Reducing the dimensions of embeddings can further improve clustering. Additionally, we conduct experiments with continued pre-training for German BERT models to estimate the benefits of this additional training. Our experiments suggest that significant performance improvements are possible for short text. All code and datasets are publicly available.

computational linguistic, dataset, proceedings, (16 more...)

2401.02709

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
(14 more...)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Homophily-Related: Adaptive Hybrid Graph Filter for Multi-View Graph Clustering

Wen, Zichen, Ling, Yawen, Ren, Yazhou, Wu, Tianyi, Chen, Jianpeng, Pu, Xiaorong, Hao, Zhifeng, He, Lifang

Recently, there has been a growing focus on graph data, and multi-view graph clustering has become a popular area of research interest. Most of the existing methods are only applicable to homophilous graphs, yet much real-world graph data can hardly fulfill the homophily assumption, under which connected nodes tend to belong to the same class. Several studies have pointed out that the poor performance on heterophilous graphs is actually due to the fact that conventional graph neural networks (GNNs), which are essentially low-pass filters, discard information other than the low-frequency information on the graph. Nevertheless, on certain graphs, particularly heterophilous ones, neglecting high-frequency information and focusing solely on low-frequency information impedes the learning of node representations. To break this limitation, our motivation is to perform graph filtering that is closely related to the homophily degree of the given graph, with the aim of fully leveraging both low-frequency and high-frequency signals to learn distinguishable node embeddings. In this work, we propose Adaptive Hybrid Graph Filter for Multi-View Graph Clustering (AHGFC). Specifically, a graph joint process and graph joint aggregation matrix are first designed by using the intrinsic node features and adjacency relationship, which makes the low- and high-frequency signals on the graph more distinguishable. Then we design an adaptive hybrid graph filter that is related to the homophily degree, which learns the node embedding based on the graph joint aggregation matrix. After that, the node embedding of each view is weighted and fused into a consensus embedding for the downstream task. Experimental results show that our proposed model performs well on six datasets containing homophilous and heterophilous graphs.
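The low-pass versus high-pass distinction in the abstract can be made concrete with a small sketch. The code below is not the AHGFC filter. It uses two textbook first-order filters built from the symmetrically normalized adjacency matrix and mixes them with a weight `h` that stands in for a homophily degree. The mixing rule, the function names, and the toy 4-cycle graph are assumptions for illustration only.

```python
import math


def normalized_adjacency(adj):
    """Return D^{-1/2} A D^{-1/2} for a dense adjacency matrix (list of lists)."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) if adj[i][j] else 0.0
             for j in range(n)] for i in range(n)]


def matvec(matrix, x):
    return [sum(a * b for a, b in zip(row, x)) for row in matrix]


def low_pass(a_hat, x):
    """(I + A_hat) / 2 applied to x: averages each node with its neighbours."""
    return [0.5 * (xi + yi) for xi, yi in zip(x, matvec(a_hat, x))]


def high_pass(a_hat, x):
    """(I - A_hat) / 2 applied to x: keeps each node's difference from its neighbours."""
    return [0.5 * (xi - yi) for xi, yi in zip(x, matvec(a_hat, x))]


def edge_homophily(adj, labels):
    """Fraction of edges whose endpoints share a label."""
    same = total = 0
    for i in range(len(adj)):
        for j in range(i + 1, len(adj)):
            if adj[i][j]:
                total += 1
                same += labels[i] == labels[j]
    return same / total if total else 0.0


def hybrid_filter(a_hat, x, h):
    """Hypothetical mix: weight h on the low-pass part, 1 - h on the high-pass part."""
    low, high = low_pass(a_hat, x), high_pass(a_hat, x)
    return [h * lo + (1.0 - h) * hi for lo, hi in zip(low, high)]


# Toy graph: a 4-cycle whose labels alternate, so every edge is heterophilous.
cycle = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
a_hat = normalized_adjacency(cycle)
alternating = [1.0, -1.0, 1.0, -1.0]
h = edge_homophily(cycle, [0, 1, 0, 1])
filtered = hybrid_filter(a_hat, alternating, h)
```

On this graph the low-pass filter wipes out the alternating signal while the high-pass filter keeps it intact, so a filter weighted by the (zero) homophily ratio preserves exactly the information that separates the two classes. In an unsupervised setting the labels are unknown, so the homophily degree would have to be estimated rather than computed as done here.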

graph, graph filter, information, (17 more...)

2401.02682

Country:

North America > United States > Texas (0.05)
Asia > China > Guangdong Province > Shenzhen (0.04)
Asia > China > Guangdong Province > Shantou (0.04)
(3 more...)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.92)

Scalable manifold learning by uniform landmark sampling and constrained locally linear embedding

Peng, Dehua, Gui, Zhipeng, Wei, Wenzhang, Wu, Huayi

As a pivotal approach in machine learning and data science, manifold learning aims to uncover the intrinsic low-dimensional structure within complex nonlinear manifolds in high-dimensional space. By exploiting the manifold hypothesis, various techniques for nonlinear dimension reduction have been developed to facilitate visualization, classification, clustering, and gaining key insights. Although existing manifold learning methods have achieved remarkable successes, they still suffer from extensive distortions incurred in the global structure, which hinders the understanding of underlying patterns. Scalability issues also limit their applicability for handling large-scale data. Here, we propose a scalable manifold learning (scML) method that can manipulate large-scale and high-dimensional data in an efficient manner. It starts by seeking a set of landmarks to construct the low-dimensional skeleton of the entire data, and then incorporates the nonlandmarks into the learned space based on the constrained locally linear embedding (CLLE). We empirically validated the effectiveness of scML on synthetic datasets and real-world benchmarks of different types, and applied it to analyze the single-cell transcriptomics and detect anomalies in electrocardiogram (ECG) signals. The experiments demonstrate notable robustness in embedding quality as the sample rate decreases.

Dimension reduction plays an indispensable role in both preprocessing for machine learning tasks and visualization for high-dimensional data [1, 2]. It is often applied to address the curse of dimensionality in data science, which refers to the phenomenon where the amount of data required to achieve a certain level of accuracy increases exponentially as the number of dimensions increases [3]. This makes it difficult for models to represent the features comprehensively and may lead to an overfitting problem [4].

dataset, landmark, scml, (15 more...)

2401.011

Country:

Asia > China > Hubei Province > Wuhan (0.05)
North America > Canada > Ontario > Toronto (0.04)
North America > United States > California > Orange County > Irvine (0.04)
Europe > France (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (0.75)

Aizpurua, Borja, Orus, Roman

Tensor Networks for Explainable Machine Learning in Cybersecurity

In this paper we show how tensor networks help in developing explainability of machine learning algorithms. Specifically, we develop an unsupervised clustering algorithm based on Matrix Product States (MPS) and apply it in the context of a real use-case of adversary-generated threat intelligence. Our investigation proves that MPS rival traditional deep learning models such as autoencoders and GANs in terms of performance, while providing much richer model interpretability. Our approach naturally facilitates the extraction of feature-wise probabilities, von Neumann entropy, and mutual information, offering a compelling narrative for classification of anomalies and fostering an unprecedented level of transparency and interpretability, which is fundamental to understanding the rationale behind artificial intelligence decisions.

anomaly, information, probability, (16 more...)

2401.00867

Country: Europe > Spain > Basque Country > Biscay Province > Bilbao (0.04)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.54)

Barcina-Blanco, Marcos, Lobo, Jesus L., Garcia-Bringas, Pablo, Del Ser, Javier

Managing the unknown: a survey on Open Set Recognition and tangential areas

Although this method has demonstrated its efficacy in numerous scenarios and remains relevant, there is an undeniable shift towards emphasizing autonomy and broader applicability in open scenarios. Consequently, there is a fervent quest for the emergence of a new era of Machine Learning (ML) models characterized by enhanced autonomy and generalization to perform a wide variety of tasks. But most formulations of such tasks still assume a so-called closed set scenario: all samples (or instances) at inference time belong to at least one of the classes existing in the training data from which the ML model was learned. Unfortunately, in many real-world circumstances, this closed set assumption may not necessarily hold, giving rise to open set environments where Unknown Classes (UC) can emerge at testing time. When this occurs, the model must detect the emergence of UC; otherwise, ML models designed under the closed set assumption will incorrectly classify instances belonging to UC as any of the known classes (KC), often with a high confidence in their predictions. In this context, the Open Set Recognition (OSR) field has emerged [1] to address this problem by endowing ML models with the capacity to detect the appearance of new classes and adapt their knowledge to them.
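The failure mode described above, and the simplest way to guard against it, can be sketched as follows. This is a generic baseline (rejecting a prediction when the maximum softmax probability is low), not a method taken from the survey. The threshold value, the `UNKNOWN` sentinel, and the function names are assumptions chosen for illustration.

```python
import math

UNKNOWN = -1  # Hypothetical sentinel for "belongs to no known class".


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_closed_set(logits):
    """Closed set behaviour: always answer with one of the known classes."""
    return max(range(len(logits)), key=logits.__getitem__)


def predict_open_set(logits, threshold=0.5):
    """Reject the sample as UNKNOWN when the top probability is below threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else UNKNOWN


confident = [5.0, 0.0, 0.0]    # Clearly favours known class 0.
ambiguous = [0.1, 0.0, 0.05]   # No known class stands out.
```

With these inputs the closed set predictor assigns `ambiguous` to class 0 anyway, while the thresholded version flags it as unknown. The baseline has a known weakness that the passage also points to: a model can be highly confident on an unknown class, in which case no threshold on the softmax output will catch it.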

classification, international conference, recognition, (13 more...)

2312.08785

Country:

Europe > Spain > Basque Country (0.04)
Europe > Italy > Campania > Naples (0.04)

Genre:

Overview (0.46)
Research Report (0.40)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Pang, Qiyuan, Yang, Haizhao

A Distributed Block Chebyshev-Davidson Algorithm for Parallel Spectral Clustering

We develop a distributed Block Chebyshev-Davidson algorithm to solve large-scale leading eigenvalue problems for spectral analysis in spectral clustering. First, the efficiency of the Chebyshev-Davidson algorithm relies on the prior knowledge of the eigenvalue spectrum, which could be expensive to estimate. This issue can be lessened by the analytic spectrum estimation of the Laplacian or normalized Laplacian matrices in spectral clustering, making the proposed algorithm very efficient for spectral clustering. Second, to make the proposed algorithm capable of analyzing big data, a distributed and parallel version has been developed with attractive scalability. The speedup by parallel computing is approximately equivalent to $\sqrt{p}$, where $p$ denotes the number of processes. Numerical results will be provided to demonstrate its efficiency in spectral clustering and scalability advantage over existing eigensolvers used for spectral clustering in parallel computing environments.
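Why prior knowledge of the spectrum matters can be seen from the Chebyshev filter at the core of Chebyshev-Davidson methods. The sketch below is a scalar illustration only, not the distributed block algorithm from the paper: it evaluates the degree-m Chebyshev polynomial after mapping an unwanted interval [a, b] onto [-1, 1], showing how strongly an eigenvalue outside that interval is amplified. The function name and the cutoff a = 0.5 are hypothetical. The upper bound b = 2 relies on the standard fact that normalized Laplacian eigenvalues lie in [0, 2].

```python
def chebyshev_gain(lam, a, b, degree):
    """Value of T_degree at lam after mapping the interval [a, b] onto [-1, 1].

    Uses the three-term recurrence T_{k+1}(t) = 2 t T_k(t) - T_{k-1}(t).
    Eigenvalues inside [a, b] get |gain| <= 1; those outside grow rapidly.
    """
    t = (2.0 * lam - (a + b)) / (b - a)
    if degree == 0:
        return 1.0
    prev, curr = 1.0, t
    for _ in range(degree - 1):
        prev, curr = curr, 2.0 * t * curr - prev
    return curr


# Damp the interval [0.5, 2] of a normalized Laplacian; keep what lies below 0.5.
a, b, degree = 0.5, 2.0, 10
wanted = chebyshev_gain(0.0, a, b, degree)       # Eigenvalue 0 is outside [a, b].
unwanted = [chebyshev_gain(lam, a, b, degree) for lam in (0.5, 1.0, 1.25, 2.0)]
```

Here the eigenvalue 0 is amplified by a factor of roughly 3 * 10^4 while everything in [0.5, 2] stays bounded by 1 in magnitude, so repeated filtering steers a subspace toward the small eigenvalues that spectral clustering needs. If the bound b were underestimated, part of the unwanted spectrum would be amplified as well, which is why a cheap analytic bound is valuable.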

algorithm, block chebyshev-davidson method, matrix, (14 more...)

2212.04443

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(6 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Architecture > Distributed Systems (0.93)
Information Technology > Software (0.93)
Information Technology > Data Science > Data Mining (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Bermejo, Pablo, Orus, Roman

Variational Quantum and Quantum-Inspired Clustering

Here we present a quantum algorithm for clustering data based on a variational quantum circuit. The algorithm allows data to be classified into many clusters, and can easily be implemented on few-qubit Noisy Intermediate-Scale Quantum (NISQ) devices. The idea of the algorithm relies on reducing the clustering problem to an optimization, and then solving it via a Variational Quantum Eigensolver (VQE) combined with non-orthogonal qubit states. In practice, the method uses maximally-orthogonal states of the target Hilbert space instead of the usual computational basis, allowing for a large number of clusters to be considered even with few qubits. We benchmark the algorithm with numerical simulations using real datasets, showing excellent performance even with one single qubit. Moreover, a tensor network simulation of the algorithm implements, by construction, a quantum-inspired clustering algorithm that can run on current classical hardware.
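The idea of using non-orthogonal reference states to fit more clusters than basis states into one qubit can be illustrated classically. The sketch below is not the paper's variational algorithm. It assumes four clusters and picks the vertices of a regular tetrahedron on the Bloch sphere as one example of a maximally spread set, then assigns a state to the reference it overlaps most. The names `TETRA`, `fidelity`, and `assign_cluster` are hypothetical.

```python
import math

S = 1.0 / math.sqrt(3.0)

# Bloch vectors at the vertices of a regular tetrahedron (assumed choice
# for four clusters; each pair of vectors has inner product -1/3).
TETRA = [(S, S, S), (S, -S, -S), (-S, S, -S), (-S, -S, S)]


def fidelity(n1, n2):
    """Overlap |<psi|phi>|^2 of two pure qubit states given as unit Bloch vectors."""
    return 0.5 * (1.0 + sum(a * b for a, b in zip(n1, n2)))


def assign_cluster(bloch_vector, references=TETRA):
    """Index of the reference state with the largest overlap."""
    return max(range(len(references)),
               key=lambda k: fidelity(bloch_vector, references[k]))


# Pairwise overlaps between distinct reference states.
overlaps = [fidelity(TETRA[i], TETRA[j])
            for i in range(4) for j in range(i + 1, 4)]
```

A single qubit has only two orthogonal states, so four references cannot all be orthogonal. The tetrahedral arrangement keeps every pairwise overlap at 1/3, the smallest value that four single-qubit states can share equally, which is one reading of "maximally-orthogonal" under the assumptions above.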

algorithm, quantum computer, qubit, (13 more...)

2206.09893

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Spain > Basque Country > Biscay Province > Bilbao (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.92)