Clustering
Simultaneous estimation of connectivity and dimensionality in samples of networks
Jiang, Wenlong, McKennan, Chris, Arroyo, Jesรบs, Cape, Joshua
An overarching objective in contemporary statistical network analysis is extracting salient information from datasets consisting of multiple networks. To date, considerable attention has been devoted to node and network clustering, while comparatively less attention has been devoted to downstream connectivity estimation and parsimonious embedding dimension selection. Given a sample of potentially heterogeneous networks, this paper proposes a method to simultaneously estimate a latent matrix of connectivity probabilities and its embedding dimensionality or rank after first pre-estimating the number of communities and the node community memberships. The method is formulated as a convex optimization problem and solved using an alternating direction method of multipliers algorithm. We establish estimation error bounds under the Frobenius norm and nuclear norm for settings in which observable networks have blockmodel structure, even when node memberships are imperfectly recovered. When perfect membership recovery is possible and dimensionality is much smaller than the number of communities, the proposed method outperforms conventional averaging-based methods for estimating connectivity and dimensionality. Numerical studies empirically demonstrate the accuracy of our method across various scenarios. Additionally, analysis of a primate brain dataset demonstrates that posited connectivity is not necessarily full rank in practice, illustrating the need for flexible methodology.
Constrained Centroid Clustering: A Novel Approach for Compact and Structured Partitioning
Veeramachaneni, Sowmini Devi, Garimella, Ramamurthy
This paper presents Constrained Centroid Clustering (CCC), a method that extends classical centroid-based clustering by enforcing a constraint on the maximum distance between the cluster center and the farthest point in the cluster. Using a Lagrangian formulation, we derive a closed-form solution that maintains interpretability while controlling cluster spread. To evaluate CCC, we conduct experiments on synthetic circular data with radial symmetry and uniform angular distribution. Using ring-wise, sector-wise, and joint entropy as evaluation metrics, we show that CCC achieves more compact clusters by reducing radial spread while preserving angular structure, outperforming standard methods such as K-means and GMM. The proposed approach is suitable for applications requiring structured clustering with spread control, including sensor networks, collaborative robotics, and interpretable pattern analysis.
FedUHD: Unsupervised Federated Learning using Hyperdimensional Computing
Lee, You Hak, Yu, Xiaofan, Zhao, Quanling, Ponzina, Flavio, Rosing, Tajana
Unsupervised federated learning (UFL) has gained attention as a privacy-preserving, decentralized machine learning approach that eliminates the need for labor-intensive data labeling. However, UFL faces several challenges in practical applications: (1) non-independent and identically distributed (non-iid) data distribution across devices, (2) expensive computational and communication costs at the edge, and (3) vulnerability to communication noise. Previous UFL approaches have relied on deep neural networks (NN), which introduce substantial overhead in both computation and communication. In this paper, we propose FedUHD, the first UFL framework based on Hyperdimensional Computing (HDC). HDC is a brain-inspired computing scheme with lightweight training and inference operations, much smaller model size, and robustness to communication noise. FedUHD introduces two novel HDC-based designs to improve UFL performance. On the client side, a kNN-based cluster hypervector removal method addresses non-iid data samples by eliminating detrimental outliers. On the server side, a weighted HDC aggregation technique balances the non-iid data distribution across clients. Our experiments demonstrate that FedUHD achieves up to 173.6x and 612.7x better speedup and energy efficiency, respectively, in training, up to 271x lower communication cost, and 15.50% higher accuracy on average across diverse settings, along with superior robustness to various types of noise compared to state-of-the-art NN-based UFL approaches.
Deep Language Geometry: Constructing a Metric Space from LLM Weights
Shamrai, Maksym, Hamolia, Vladyslav
We introduce a novel framework that utilizes the internal weight activations of modern Large Language Models (LLMs) to construct a metric space of languages. Unlike traditional approaches based on hand-crafted linguistic features, our method automatically derives high-dimensional vector representations by computing weight importance scores via an adapted pruning algorithm. Our approach captures intrinsic language characteristics that reflect linguistic phenomena. We validate our approach across diverse datasets and multilingual LLMs, covering 106 languages. The results align well with established linguistic families while also revealing unexpected inter-language connections that may indicate historical contact or language evolution. The source code, computed language latent vectors, and visualization tool are made publicly available at https://github.com/mshamrai/deep-language-geometry.