 Clustering



Reply to Reviewer #1

Neural Information Processing Systems

Q1: What other ways to generate fake sequences may be suitable for this problem?
A1: That is a good question. GAN to generate some more difficult fake sequences to further improve the ability of the encoder.

Q1: Comparison with other state-of-the-art deep clustering methods which are not designed for time-series.
A1: Following your suggestion, we compare our method with two state-of-the-art deep clustering methods (i.e., DEC (Xie et al.,

Table 1: Comparisons on 36 time series datasets (The No. of datasets is consistent with the one in Table 2 in main text)

Dataset   DEC(RI)   IDEC(RI)   DTCR(RI)         DTCR(NMI)   DTCR(ACC)
1         0.5817    0.6210     0.6868(0.0026)
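The RI columns in the table above report the Rand index. As a minimal sketch of how that metric is usually defined (the fraction of point pairs on which two labelings agree), and not the authors' evaluation code, the hypothetical helper `rand_index` below can be read alongside the numbers:

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Fraction of point pairs on which the two labelings agree.

    A pair "agrees" when both labelings put the two points in the same
    cluster, or both put them in different clusters.
    """
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Toy labels (made up for illustration, not taken from the paper).
same_partition = rand_index([0, 0, 1, 1], [1, 1, 0, 0])  # cluster names swapped
crossed_partition = rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

Because the index only looks at pairs, it is unaffected by how the cluster labels are named, which is why it suits unsupervised comparisons.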


Planar Ultrametrics for Image Segmentation

Neural Information Processing Systems

We study the problem of hierarchical clustering on planar graphs. We formulate this in terms of finding the closest ultrametric to a specified set of distances and solve it using an LP relaxation that leverages minimum cost perfect matching as a subroutine to efficiently explore the space of planar partitions. We apply our algorithm to the problem of hierarchical image segmentation.
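For readers unfamiliar with the term, an ultrametric is a distance that satisfies the strong triangle inequality d(x, z) <= max(d(x, y), d(y, z)), which is what makes it equivalent to a hierarchy of nested clusters. The sketch below only checks that property on a small distance matrix; it is not the LP relaxation described in the abstract, and the function name and example matrices are illustrative assumptions.

```python
from itertools import permutations


def is_ultrametric(d, tol=1e-12):
    """Check d[i][k] <= max(d[i][j], d[j][k]) for every triple of distinct points.

    `d` is assumed to be a symmetric distance matrix given as a list of lists.
    """
    n = len(d)
    for i, j, k in permutations(range(n), 3):
        if d[i][k] > max(d[i][j], d[j][k]) + tol:
            return False
    return True


# Corresponds to a hierarchy: points 0 and 1 merge at height 1, then join 2 at height 3.
tree_like = [[0, 1, 3],
             [1, 0, 3],
             [3, 3, 0]]

# An ordinary metric that is not an ultrametric: d[0][2] = 3 > max(1, 2).
not_tree_like = [[0, 1, 3],
                 [1, 0, 2],
                 [3, 2, 0]]
```

Thresholding an ultrametric at any height yields a partition, and raising the threshold only merges clusters, which is the link to hierarchical segmentation.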


Deep Learning-Based Approach for Improving Relational Aggregated Search

arXiv.org Artificial Intelligence

Due to an information explosion on the internet, there is a need for aggregated search systems that can improve the retrieval and management of content in various formats. To further improve the clustering of Arabic text data in aggregated search environments, this research investigates the application of advanced natural language processing techniques, namely stacked autoencoders and AraBERT embeddings. Traditional search engines are imprecise, not contextually relevant, and not personalized; to move beyond these limitations, we offer richer, context-aware characterizations of search results. We use a K-means clustering algorithm to discover distinctive features and relationships in these results, and then apply our approach to different Arabic queries to evaluate its effectiveness. Our model illustrates that using stacked autoencoders in representation learning suits clustering tasks and can significantly improve the clustering of search results. It also demonstrates improved accuracy and relevance of search results.
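The clustering step can be illustrated with a small K-means (Lloyd's algorithm) sketch. It assumes the learned representations are already available as plain numeric vectors; the toy 2-D points and starting centroids below are made up for illustration and stand in for the autoencoder or AraBERT embeddings, which are not reproduced here.

```python
def kmeans(points, centroids, n_iter=10):
    """Plain Lloyd's algorithm on lists of floats; returns (labels, centroids).

    Assumes n_iter >= 1 and that `centroids` holds the initial guesses.
    """
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid (squared Euclidean distance).
        labels = [
            min(
                range(len(centroids)),
                key=lambda k: sum((p - c) ** 2 for p, c in zip(pt, centroids[k])),
            )
            for pt in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        updated = []
        for k in range(len(centroids)):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centroids[k])  # keep an empty cluster where it was
        centroids = updated
    return labels, centroids


# Hypothetical 2-D "embeddings" forming two well-separated groups.
embeddings = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
labels, centers = kmeans(embeddings, centroids=[[0.0, 0.0], [1.0, 1.0]])
```

In practice the quality of the clusters depends mostly on the representation fed to K-means, which is the motivation the abstract gives for learning it with stacked autoencoders.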


Modeling Market States with Clustering and State Machines

arXiv.org Artificial Intelligence

This work introduces a new framework for modeling financial markets through an interpretable probabilistic state machine. By clustering historical returns based on momentum and risk features across multiple time horizons, we identify distinct market states that capture underlying regimes, such as expansion phase, contraction, crisis, or recovery. From a transition matrix representing the dynamics between these states, we construct a probabilistic state machine that models the temporal evolution of the market. This state machine enables the generation of a custom distribution of returns based on a mixture of Gaussian components weighted by state frequencies. We show that the proposed benchmark significantly outperforms the traditional approach in capturing key statistical properties of asset returns, including skewness and kurtosis, and our experiments across random assets and time periods confirm its robustness.
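A rough sketch of the two ingredients named above, assuming the clustering step has already assigned a state label to each period: a row-normalized transition matrix estimated from consecutive labels, and a Gaussian mixture with one component per state weighted by state frequency. The function names, state labels, and return values are hypothetical and are not taken from the paper.

```python
from statistics import fmean, pvariance


def transition_matrix(states, n_states):
    """Estimate P[a][b] = Pr(next state is b | current state is a) from a label sequence."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def state_mixture(states, returns, n_states):
    """One (weight, mean, variance) Gaussian component per state.

    The weight is the state's empirical frequency; every state is assumed
    to appear at least once in `states`.
    """
    components = []
    for k in range(n_states):
        obs = [r for s, r in zip(states, returns) if s == k]
        components.append((len(obs) / len(states), fmean(obs), pvariance(obs)))
    return components


def mixture_mean(components):
    """Mean of the mixture: the frequency-weighted average of component means."""
    return sum(w * mu for w, mu, _ in components)


# Hypothetical labels (0, 1, 2 standing for three market states) and period returns.
states = [0, 0, 1, 1, 0, 2, 2, 0]
returns = [0.01, 0.02, -0.01, -0.02, 0.015, -0.05, -0.04, 0.01]

P = transition_matrix(states, 3)
mix = state_mixture(states, returns, 3)
```

A mixture of this kind can reproduce skewness and excess kurtosis that a single Gaussian cannot, because the components differ in both mean and variance.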



Matrix Completion with Noisy Side Information

Neural Information Processing Systems

We study the matrix completion problem with side information. Side information has been considered in several matrix completion applications, and has been empirically shown to be useful in many cases. Recently, researchers studied the effect of side information for matrix completion from a theoretical viewpoint, showing that sample complexity can be significantly reduced given completely clean features. However, since in reality most given features are noisy or only weakly informative, the development of a model to handle a general feature set, and investigation of how much noisy features can help matrix recovery, remains an important issue. In this paper, we propose a novel model that balances between features and observations simultaneously in order to leverage feature information yet be robust to feature noise. Moreover, we study the effect of general features in theory and show that by using our model, the sample complexity can be lower than matrix completion as long as features are sufficiently informative. This result provides a theoretical insight into the usefulness of general side information. Finally, we consider synthetic data and two applications -- relationship prediction and semi-supervised clustering -- and show that our model outperforms other methods for matrix completion that use features both in theory and practice.


Differentially private subspace clustering

Neural Information Processing Systems

Subspace clustering is an unsupervised learning problem that aims at grouping data points into multiple "clusters" so that data points in a single cluster lie approximately on a low-dimensional linear subspace. It was originally motivated by 3D motion segmentation in computer vision, but has recently been generically applied to a wide range of statistical machine learning problems, which often involve sensitive datasets about human subjects. This raises a dire concern for data privacy. In this work, we build on the framework of differential privacy and present two provably private subspace clustering algorithms. We demonstrate via both theory and experiments that one of the presented methods enjoys formal privacy and utility guarantees; the other one asymptotically preserves differential privacy while having good performance in practice. In the course of the proof, we also obtain two new provable guarantees for the agnostic subspace clustering and the graph connectivity problem which might be of independent interest.



Crowdsourcing Without People: Modelling Clustering Algorithms as Experts

arXiv.org Artificial Intelligence

This paper introduces mixsemble, an ensemble method that adapts the Dawid-Skene model to aggregate predictions from multiple model-based clustering algorithms. Unlike traditional crowdsourcing, which relies on human labels, the framework models the outputs of clustering algorithms as noisy annotations. Experiments on both simulated and real-world datasets show that, although the mixsemble is not always the single top performer, it consistently approaches the best result and avoids poor outcomes. This robustness makes it a practical alternative when the true data structure is unknown, especially for non-expert users.