Clustering
Bayesian Nonparametric Dynamical Clustering of Time Series
Pérez-Herrero, Adrián, Félix, Paulo, Presedo, Jesús, Ek, Carl Henrik
Abstract--We present a method that models the evolution of an unbounded number of time series clusters by switching among an unknown number of regimes with linear dynamics. We develop a Bayesian non-parametric approach using a hierarchical Dirichlet process as a prior on the parameters of a Switching Linear Dynamical System and a Gaussian process prior to model the statistical variations in amplitude and temporal alignment within each cluster . By modeling the evolution of time series patterns, the method avoids unnecessary proliferation of clusters in a principled manner . We perform inference by formulating a variational lower bound for off-line and on-line scenarios, enabling efficient learning through optimization. We illustrate the versatility and effectiveness of the approach through several case studies of electrocardiogram analysis using publicly available databases. Index T erms--Time series analysis, Bayesian methods, Gaussian processes, linear dynamical systems, Dirichlet processes, unsupervised learning, electrocardiogram, arrhythmia detection. IME series data analysis has come to pervade all scientific and technological domains, driven by the need to understand change over time. With the growing availability of such data, machine learning has assumed an increasingly central role in a wide variety of tasks which fall under the category of pattern recognition. Particularly, there is growing interest in identifying similar behaviors in time series data as a preliminary step towards generating insights into the dynamics of the underlying processes. Some recent methodologies can be found for characterizing sea wave conditions [1], transcriptome-wide gene expression profiling [2], selecting stocks with different share price performance [3], and discovering human motion primitives [4].
Angular Constraint Embedding via SpherePair Loss for Constrained Clustering
Constrained clustering integrates domain knowledge through pairwise constraints. However, existing deep constrained clustering (DCC) methods are either limited by anchors inherent in end-to-end modeling or struggle with learning discriminative Euclidean embedding, restricting their scalability and real-world applicability. To avoid their respective pitfalls, we propose a novel angular constraint embedding approach for DCC, termed SpherePair. Using the SpherePair loss with a geometric formulation, our method faithfully encodes pairwise constraints and leads to embeddings that are clustering-friendly in angular space, effectively separating representation learning from clustering. SpherePair preserves pairwise relations without conflict, removes the need to specify the exact number of clusters, generalizes to unseen data, enables rapid inference of the number of clusters, and is supported by rigorous theoretical guarantees. Comparative evaluations with state-of-the-art DCC methods on diverse benchmarks, along with empirical validation of theoretical insights, confirm its superior performance, scalability, and overall real-world effectiveness. Code is available at \href{https://github.com/spherepaircc/SpherePairCC/tree/main}{our repository}.
Chem-NMF: Multi-layer $α$-divergence Non-Negative Matrix Factorization for Cardiorespiratory Disease Clustering, with Improved Convergence Inspired by Chemical Catalysts and Rigorous Asymptotic Analysis
Torabi, Yasaman, Shirani, Shahram, Reilly, James P.
Non-Negative Matrix Factorization (NMF) is an unsupervised learning method offering low-rank representations across various domains such as audio processing, biomedical signal analysis, and image recognition. The incorporation of $α$-divergence in NMF formulations enhances flexibility in optimization, yet extending these methods to multi-layer architectures presents challenges in ensuring convergence. To address this, we introduce a novel approach inspired by the Boltzmann probability of the energy barriers in chemical reactions to theoretically perform convergence analysis. We introduce a novel method, called Chem-NMF, with a bounding factor which stabilizes convergence. To our knowledge, this is the first study to apply a physical chemistry perspective to rigorously analyze the convergence behaviour of the NMF algorithm. We start from mathematically proven asymptotic convergence results and then show how they apply to real data. Experimental results demonstrate that the proposed algorithm improves clustering accuracy by 5.6% $\pm$ 2.7% on biomedical signals and 11.1% $\pm$ 7.2% on face images (mean $\pm$ std).
Autonomy-Aware Clustering: When Local Decisions Supersede Global Prescriptions
Srivastava, Amber, Basiri, Salar, Salapaka, Srinivasa
Clustering arises in a wide range of problem formulations, yet most existing approaches assume that the entities under clustering are passive and strictly conform to their assigned groups. In reality, entities often exhibit local autonomy, overriding prescribed associations in ways not fully captured by feature representations. Such autonomy can substantially reshape clustering outcomes -- altering cluster compositions, geometry, and cardinality -- with significant downstream effects on inference and decision-making. We introduce autonomy-aware clustering, a reinforcement learning (RL) framework that learns and accounts for the influence of local autonomy without requiring prior knowledge of its form. Our approach integrates RL with a Deterministic Annealing (DA) procedure, where, to determine underlying clusters, DA naturally promotes exploration in early stages of annealing and transitions to exploitation later. We also show that the annealing procedure exhibits phase transitions that enable design of efficient annealing schedules. To further enhance adaptability, we propose the Adaptive Distance Estimation Network (ADEN), a transformer-based attention model that learns dependencies between entities and cluster representatives within the RL loop, accommodates variable-sized inputs and outputs, and enables knowledge transfer across diverse problem instances. Empirical results show that our framework closely aligns with underlying data dynamics: even without explicit autonomy models, it achieves solutions close to the ground truth (gap ~3-4%), whereas ignoring autonomy leads to substantially larger gaps (~35-40%). The code and data are publicly available at https://github.com/salar96/AutonomyAwareClustering.
Supplementary Material for Semantic Image Synthesis with Unconditional Generator
This process enables the value (feature maps) to be rearranged (through a weighted sum) to align with the form of the query, thereby reflecting their strong correspondence. The input noise is removed because its stochasticity slows down the training. Given the need for balancing between high correspondence and image quality, we empirically set the weights of our loss terms. To demonstrate the influence of the additional losses introduced in our method, we provide both quantitative and qualitative ablations in Figure S2 and S3, respectively. Nonetheless, caution is warranted when overly increasing the number of clusters.