AITopics

2401.16348

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Middle East > Jordan (0.05)
North America > United States > New York > New York County > New York City (0.04)
(9 more...)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.67)

Industry:

Law > Statutes (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Education (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Pfeifer, Bastian, Sirocchi, Christel, Bloice, Marcus D., Kreuzthaler, Markus, Urschler, Martin

Federated unsupervised random forest for privacy-preserving patient stratification

arXiv.org Artificial IntelligenceJan-29-2024

In the realm of precision medicine, effective patient stratification and disease subtyping demand innovative methodologies tailored for multi-omics data. Clustering techniques applied to multi-omics data have become instrumental in identifying distinct subgroups of patients, enabling a finer-grained understanding of disease variability. This work establishes a powerful framework for advancing precision medicine through unsupervised random-forest-based clustering and federated computing. We introduce a novel multi-omics clustering approach utilizing unsupervised random-forests. The unsupervised nature of the random forest enables the determination of cluster-specific feature importance, unraveling key molecular contributors to distinct patient groups. Moreover, our methodology is designed for federated execution, a crucial aspect in the medical domain where privacy concerns are paramount. We have validated our approach on machine learning benchmark data sets as well as on cancer data from The Cancer Genome Atlas (TCGA). Our method is competitive with the state-of-the-art in terms of disease subtyping, but at the same time substantially improves the cluster interpretability. Experiments indicate that local clustering performance can be improved through federated computing.

affinity matrix, experiment, unsupervised random forest, (14 more...)

2401.16094

Country:

Europe > Italy (0.04)
Europe > Austria > Styria > Graz (0.04)

Genre: Research Report (0.82)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

arXiv.org Artificial IntelligenceJan-29-2024

Deep Embedding Clustering Driven by Sample Stability

Cheng, Zhanwen, Li, Feijiang, Wang, Jieting, Qian, Yuhua

Deep clustering methods improve the performance of clustering tasks by jointly optimizing deep representation learning and clustering. While numerous deep clustering algorithms have been proposed, most of them rely on artificially constructed pseudo targets for performing clustering. This construction process requires some prior knowledge, and it is challenging to determine a suitable pseudo target for clustering. To address this issue, we propose a deep embedding clustering algorithm driven by sample stability (DECS), which eliminates the requirement of pseudo targets. Specifically, we start by constructing the initial feature space with an autoencoder and then learn the cluster-oriented embedding feature constrained by sample stability. The sample stability aims to explore the deterministic relationship between samples and all cluster centroids, pulling samples to their respective clusters and keeping them away from other clusters with high determinacy. We analyzed the convergence of the loss using Lipschitz continuity in theory, which verifies the validity of the model. The experimental results on five datasets illustrate that the proposed method achieves superior performance compared to state-of-the-art clustering approaches.

algorithm, sample stability, stability, (13 more...)

2401.15989

Country:

North America > United States > California > Alameda County > Oakland (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Artificial IntelligenceJan-28-2024

Rethinking Personalized Federated Learning with Clustering-based Dynamic Graph Propagation

Wang, Jiaqi, Chen, Yuzhong, Wu, Yuhang, Das, Mahashweta, Yang, Hao, Ma, Fenglong

Most existing personalized federated learning approaches are based on intricate designs, which often require complex implementation and tuning. In order to address this limitation, we propose a simple yet effective personalized federated learning framework. Specifically, during each communication round, we group clients into multiple clusters based on their model training status and data distribution on the server side. We then consider each cluster center as a node equipped with model parameters and construct a graph that connects these nodes using weighted edges. Additionally, we update the model parameters at each node by propagating information across the entire graph. Subsequently, we design a precise personalized model distribution strategy to allow clients to obtain the most suitable model from the server side. We conduct experiments on three image benchmark datasets and create synthetic structured datasets with three types of typologies. Experimental results demonstrate the effectiveness of the proposed work.

communication round, federated learning, node, (12 more...)

2401.15874

Country: North America > United States > Pennsylvania (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

arXiv.org Artificial IntelligenceJan-28-2024

One for all: A novel Dual-space Co-training baseline for Large-scale Multi-View Clustering

Kong, Zisen, Fu, Zhiqiang, Chang, Dongxia, Wang, Yiming, Zhao, Yao

In this paper, we propose a novel multi-view clustering model, named Dual-space Co-training Large-scale Multi-view Clustering (DSCMC). The main objective of our approach is to enhance the clustering performance by leveraging co-training in two distinct spaces. In the original space, we learn a projection matrix to obtain latent consistent anchor graphs from different views. This process involves capturing the inherent relationships and structures between data points within each view. Concurrently, we employ a feature transformation matrix to map samples from various views to a shared latent space. This transformation facilitates the alignment of information from multiple views, enabling a comprehensive understanding of the underlying data distribution. We jointly optimize the construction of the latent consistent anchor graph and the feature transformation to generate a discriminative anchor graph. This anchor graph effectively captures the essential characteristics of the multi-view data and serves as a reliable basis for subsequent clustering analysis. Moreover, the element-wise method is proposed to avoid the impact of diverse information between different views. Our algorithm has an approximate linear computational complexity, which guarantees its successful application on large-scale datasets. Through experimental validation, we demonstrate that our method significantly reduces computational complexity while yielding superior clustering performance compared to existing approaches.

algorithm, dataset, graph, (17 more...)

2401.15691

Country:

Asia > China > Beijing > Beijing (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)

arXiv.org Artificial IntelligenceJan-28-2024

Deep Learning for Gamma-Ray Bursts: A data driven event framework for X/Gamma-Ray analysis in space telescopes

Crupi, Riccardo

The HERMES (High Energy Rapid Modular Ensemble of Satellites) Pathfinder mission serves as an in-orbit demonstration of a constellation of nanosatellites whose primary scientific purpose is to discover intense high-energy transients, such as gamma-ray bursts, across a broad energy range (few keV to few MeV) with unparalleled temporal precision and exact localisation. By 2024, the first constellation of six nanosatellites is expected to be launched. To fully exploit satellite data and allow faint astronomical events to emerge, a precise estimation of satellite background count rates is required to determine whether the event is statistically valid or not. The dynamics of the background are related to the satellite's orbital information, which varies in the order of minutes, potentially hiding long transient events. This work introduces two main contributions I have brought ahead; first a novel background estimator is presented that could potentially be fitted to any type of X/Gamma-ray satellite space telescope, capable of capturing long-term dynamics and accurate enough to detect faint transients. This estimator is built using a Neural Network and tested on data from the Fermi Gamma-ray Space Telescope's Gamma Burst Monitor (GBM). As a second objective, it is employed a trigger algorithm, called FOCuS (Functional Online CUSUM), to extract events from the background using the background estimator. The resulting framework, DeepGRB, can identify astronomical events that are both present and absent from the Fermi-GBM catalog. The analysis of the discovered events reveals the strengths and weaknesses of the framework.

neural network architecture, nuclear instrument and method, right ascension and declination, (17 more...)

2401.15632

Country:

Oceania > Australia (0.04)
North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
Europe > Italy > Friuli Venezia Giulia > Trieste Province > Trieste (0.04)
(15 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Research Report > Experimental Study (0.92)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Government > Military (1.00)
Energy (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
(2 more...)

López-Oriona, Ángel, Sun, Ying, Crujeiras, Rosa M.

Fuzzy clustering of circular time series based on a new dependence measure with applications to wind data

Time series clustering is an essential machine learning task with applications in many disciplines. While the majority of the methods focus on time series taking values on the real line, very few works consider time series defined on the unit circle, although the latter objects frequently arise in many applications. In this paper, the problem of clustering circular time series is addressed. To this aim, a distance between circular series is introduced and used to construct a clustering procedure. The metric relies on a new measure of serial dependence considering circular arcs, thus taking advantage of the directional character inherent to the series range. Since the dynamics of the series may vary over the time, we adopt a fuzzy approach, which enables the procedure to locate each series into several clusters with different membership degrees. The resulting clustering algorithm is able to group series generated from similar stochastic processes, reaching accurate results with series coming from a broad variety of models. An extensive simulation study shows that the proposed method outperforms several alternative techniques, besides being computationally efficient. Two interesting applications involving time series of wind direction in Saudi Arabia highlight the potential of the proposed approach.

algorithm, membership degree, time sery, (14 more...)

2402.08687

Country:

Asia > Middle East > Saudi Arabia > Asir Province > Abha (0.05)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > Spain > Galicia > A Coruña Province > Santiago de Compostela (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Taha, Kamal, Shoufan, Abdulhadi

Techniques to Detect Crime Leaders within a Criminal Network: A Survey, Experimental, and Comparative Evaluations

This survey paper offers a thorough analysis of techniques and algorithms used in the identification of crime leaders within criminal networks. For each technique, the paper examines its effectiveness, limitations, potential for improvement, and future prospects. The main challenge faced by existing survey papers focusing on algorithms for identifying crime leaders and predicting crimes is effectively categorizing these algorithms. To address this limitation, this paper proposes a new methodological taxonomy that hierarchically classifies algorithms into more detailed categories and specific techniques. The paper includes empirical and experimental evaluations to rank the different techniques. The combination of the methodological taxonomy, empirical evaluations, and experimental comparisons allows for a nuanced and comprehensive understanding of the techniques and algorithms for identifying crime leaders, assisting researchers in making informed decisions. Moreover, the paper offers valuable insights into the future prospects of techniques for identifying crime leaders, emphasizing potential advancements and opportunities for further research. Here's an overview of our empirical analysis findings and experimental insights, along with the solution we've devised: (1) PageRank and Eigenvector centrality are reliable for mapping network connections, (2) Katz Centrality can effectively identify influential criminals through indirect links, stressing their significance in criminal networks, (3) current models fail to account for the specific impacts of criminal influence levels, the importance of socio-economic context, and the dynamic nature of criminal networks and hierarchies, and (4) we propose enhancements, such as incorporating temporal dynamics and sentiment analysis to reflect the fluidity of criminal activities and relationships, which could improve the detection of key criminals .

algorithm, criminal network, node, (17 more...)

2402.03355

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > New York (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
(19 more...)

Genre: Overview (0.87)

Industry: Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Social Media (1.00)
(4 more...)

Expert with Clustering: Hierarchical Online Preference Learning Framework

Zhou, Tianyue, Cho, Jung-Hoon, Ardabili, Babak Rahimi, Tabkhi, Hamed, Wu, Cathy

Emerging mobility systems are increasingly capable of recommending options to mobility users, to guide them towards personalized yet sustainable system outcomes. Even more so than the typical recommendation system, it is crucial to minimize regret, because 1) the mobility options directly affect the lives of the users, and 2) the system sustainability relies on sufficient user participation. In this study, we consider accelerating user preference learning by exploiting a low-dimensional latent space that captures the mobility preferences of users. We introduce a hierarchical contextual bandit framework named Expert with Clustering (EWC), which integrates clustering techniques and prediction with expert advice. EWC efficiently utilizes hierarchical user information and incorporates a novel Loss-guided Distance metric. This metric is instrumental in generating more representative cluster centroids. In a recommendation scenario with $N$ users, $T$ rounds per user, and $K$ options, our algorithm achieves a regret bound of $O(N\sqrt{T\log K} + NT)$. This bound consists of two parts: the first term is the regret from the Hedge algorithm, and the second term depends on the average loss from clustering. The algorithm performs with low regret, especially when a latent hierarchical structure exists among users. This regret bound underscores the theoretical and experimental efficacy of EWC, particularly in scenarios that demand rapid learning and adaptation. Experimental results highlight that EWC can substantially reduce regret by 27.57% compared to the LinUCB baseline. Our work offers a data-efficient approach to capturing both individual and collective behaviors, making it highly applicable to contexts with hierarchical structures. We expect the algorithm to be applicable to other settings with layered nuances of user preferences and information.

algorithm, centroid, clustering, (16 more...)

2401.15062

Country:

North America > United States > North Carolina (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China (0.04)

Genre:

Research Report (1.00)
Instructional Material > Online (0.51)

Industry: Education > Educational Setting (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.88)

Graph-based Active Learning for Entity Cluster Repair

Christen, Victor, Obraczka, Daniel, Hofer, Marvin, Franke, Martin, Rahm, Erhard

Cluster repair methods aim to determine errors in clusters and modify them so that each cluster consists of records representing the same entity. Current cluster repair methodologies primarily assume duplicate-free data sources, where each record from one source corresponds to a unique record from another. However, real-world data often deviates from this assumption due to quality issues. Recent approaches apply clustering methods in combination with link categorization methods so they can be applied to data sources with duplicates. Nevertheless, the results do not show a clear picture since the quality highly varies depending on the configuration and dataset. In this study, we introduce a novel approach for cluster repair that utilizes graph metrics derived from the underlying similarity graphs. These metrics are pivotal in constructing a classification model to distinguish between correct and incorrect edges. To address the challenge of limited training data, we integrate an active learning mechanism tailored to cluster-specific attributes. The evaluation shows that the method outperforms existing cluster repair methods without distinguishing between duplicate-free or dirty data sources. Notably, our modified active learning strategy exhibits enhanced performance when dealing with datasets containing duplicates, showcasing its effectiveness in such scenarios.

data source, dataset, similarity graph, (9 more...)

2401.14992

Country:

Europe > Germany > Saxony > Leipzig (0.05)
North America > United States > Indiana > Marion County > Indianapolis (0.04)
Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
(3 more...)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.88)