Wu, Suya
Robust Score-Based Quickest Change Detection
Moushegian, Sean, Wu, Suya, Diao, Enmao, Ding, Jie, Banerjee, Taposh, Tarokh, Vahid
Methods in the field of quickest change detection rapidly detect, in real time, a change in the data-generating distribution of an online data stream. Existing methods can detect this change point when the densities of the pre- and post-change distributions are known. Recent work has extended these results to the case where the pre- and post-change distributions are known only through their score functions. This work considers the case where the pre- and post-change score functions are known only to correspond to distributions in two disjoint sets. This work employs a pair of "least-favorable" distributions to robustify the existing score-based quickest change detection algorithm, and the properties of the resulting algorithm are studied. This paper calculates the least-favorable distributions for specific model classes and provides methods for estimating the least-favorable distributions for common constructions. Simulation results are provided to demonstrate the performance of our robust change detection algorithm. In the fields of sensor networks, cyber-physical systems, biology, and neuroscience, the statistical properties of online data streams can suddenly change in response to some application-specific event ([1]-[4]).
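The robustification idea can be illustrated in a toy Gaussian setting. This is a minimal sketch, not the paper's construction: the disjoint mean sets, the specific least-favorable pair, and the threshold below are all illustrative assumptions. When the uncertainty sets are Gaussian mean shifts, the closest pair of means is a natural candidate for the least-favorable pair, and the detector simply runs a score-based CUSUM calibrated to that pair.

```python
import numpy as np

# Illustrative assumption (not from the paper): pre-change means lie in
# (-inf, 0], post-change means in [0.5, inf). For a Gaussian mean shift,
# the closest pair (0.0, 0.5) plays the role of the least-favorable pair.
MU_LFD_PRE, MU_LFD_POST, SIGMA2 = 0.0, 0.5, 1.0

def hyv_score(x, mu):
    # Hyvarinen score of N(mu, SIGMA2) at x; computable without the
    # normalization constant of the density.
    return 0.5 * ((x - mu) / SIGMA2) ** 2 - 1.0 / SIGMA2

def robust_scusum(stream, threshold=20.0):
    """Score-based CUSUM run with the least-favorable pair, so the
    detector is calibrated for the hardest distributions in the sets."""
    w = 0.0
    for n, x in enumerate(stream):
        w = max(w + hyv_score(x, MU_LFD_PRE) - hyv_score(x, MU_LFD_POST), 0.0)
        if w >= threshold:
            return n  # alarm time
    return None

# Simulated stream: actual means (-0.2 and 1.0) lie inside the two sets
# but differ from the least-favorable pair; the change occurs at t = 300.
rng = np.random.default_rng(1)
stream = np.concatenate([rng.normal(-0.2, 1.0, 300),
                         rng.normal(1.0, 1.0, 300)])
alarm = robust_scusum(stream)
```

Because the detector is tuned to the hardest (closest) pair, it still accumulates positive drift for any true post-change mean in the set, at the cost of a longer worst-case delay.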
ColA: Collaborative Adaptation with Gradient Learning
Diao, Enmao, Le, Qi, Wu, Suya, Wang, Xinran, Anwar, Ali, Ding, Jie, Tarokh, Vahid
A primary function of back-propagation is to compute the gradients of both hidden representations and parameters for optimization with gradient descent. Training large models incurs high computational costs due to their vast parameter sizes. While Parameter-Efficient Fine-Tuning (PEFT) methods aim to train smaller auxiliary models to reduce computational cost, they still present computational overheads, especially in Fine-Tuning as a Service (FTaaS) for numerous users. We introduce Collaborative Adaptation (ColA) with Gradient Learning (GL), a parameter-free, model-agnostic fine-tuning approach that decouples the computation of the gradients of hidden representations and parameters. In comparison to PEFT methods, ColA facilitates more cost-effective FTaaS by offloading the computation of the gradients to low-cost devices. We also provide a theoretical analysis of ColA and experimentally demonstrate that ColA performs on par with or better than existing PEFT methods on various benchmarks.
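The decoupling can be illustrated with a hypothetical linear adapter: the server back-propagates only as far as the gradient of the hidden representation, and a low-cost device recovers the parameter gradient from the adapter input and that gradient via the chain rule. A minimal numpy sketch under these assumptions (not the ColA implementation; the adapter, loss, and learning rate are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))       # batch of inputs to the adapter
A = np.zeros((4, 3))              # hypothetical linear adapter weights
target = rng.normal(size=(8, 3))

def forward(x, A):
    return x @ A                  # h: hidden representation from the adapter

def loss_and_grad_h(h, target):
    # Squared-error loss; the server only needs dL/dh, never dL/dA.
    diff = h - target
    return 0.5 * np.sum(diff ** 2), diff

# --- server side: forward pass plus the gradient of the hidden output ---
h = forward(x, A)
loss, grad_h = loss_and_grad_h(h, target)

# --- low-cost device: parameter gradient from (x, grad_h) alone ---
grad_A = x.T @ grad_h             # chain rule: dL/dA = x^T (dL/dh)
A_updated = A - 0.05 * grad_A     # plain gradient-descent step
```

The device never needs the loss or the rest of the model, only the pair (input, hidden-gradient), which is what makes offloading the parameter update cheap.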
Quickest Change Detection for Unnormalized Statistical Models
Wu, Suya, Diao, Enmao, Banerjee, Taposh, Ding, Jie, Tarokh, Vahid
Classical quickest change detection algorithms require modeling the pre-change and post-change distributions. Such an approach may not be feasible for various machine learning models because of the complexity of computing the explicit distributions. Additionally, these methods may suffer from a lack of robustness to model mismatch and noise. This paper develops a new variant of the classical Cumulative Sum (CUSUM) algorithm for quickest change detection. This variant is based on Fisher divergence and the Hyv\"arinen score and is called the Score-based CUSUM (SCUSUM) algorithm. The SCUSUM algorithm enables change detection for unnormalized statistical models, i.e., models whose probability density function contains an unknown normalization constant. The asymptotic optimality of the proposed algorithm is investigated by deriving expressions for the average detection delay and the mean time to a false alarm. Numerical results are provided to demonstrate the performance of the proposed algorithm.
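The SCUSUM recursion can be sketched for 1-D Gaussians, where the Hyvärinen score has a closed form. This is a minimal illustration under assumed models: the Gaussian pre-/post-change distributions, the multiplier, and the threshold are not from the paper. The key point is that the score S_H(x, p) = ½(∂ₓ log p)² + ∂ₓ² log p never touches the normalization constant of p.

```python
import numpy as np

def hyvarinen_score_gauss(x, mu, sigma2):
    """Hyvarinen score of N(mu, sigma2) at x:
    0.5 * (d/dx log p)^2 + d^2/dx^2 log p.
    The normalization constant of p cancels out entirely."""
    return 0.5 * ((x - mu) / sigma2) ** 2 - 1.0 / sigma2

def scusum(stream, mu0, mu1, sigma2=1.0, lam=1.0, threshold=20.0):
    """Score-based CUSUM: accumulate the (scaled) Hyvarinen-score
    difference between pre- and post-change models; alarm at threshold."""
    w = 0.0
    for n, x in enumerate(stream):
        z = lam * (hyvarinen_score_gauss(x, mu0, sigma2)
                   - hyvarinen_score_gauss(x, mu1, sigma2))
        w = max(w + z, 0.0)   # CUSUM recursion, reflected at zero
        if w >= threshold:
            return n          # alarm time
    return None

# Simulated stream with a mean shift from 0 to 1 at t = 300.
rng = np.random.default_rng(0)
change = 300
stream = np.concatenate([rng.normal(0.0, 1.0, change),
                         rng.normal(1.0, 1.0, 300)])
alarm = scusum(stream, mu0=0.0, mu1=1.0)
```

The increment has negative mean before the change and positive mean after it, so the statistic hovers near zero pre-change and drifts up to the threshold post-change, exactly as in the classical likelihood-ratio CUSUM.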
Deep Clustering of Compressed Variational Embeddings
Wu, Suya, Diao, Enmao, Ding, Jie, Tarokh, Vahid
Motivated by the ever-increasing demands for limited communication bandwidth and low-power consumption, we propose a new methodology, named joint Variational Autoencoders with Bernoulli mixture models (VAB), for performing clustering in the compressed data domain. The idea is to reduce the data dimension with Variational Autoencoders (VAEs) and to group data representations with Bernoulli mixture models (BMMs). Once jointly trained for compression and clustering, the model can be decomposed into two parts: a data vendor that encodes the raw data into compressed data, and a data consumer that classifies the received (compressed) data. To enable training with the gradient descent algorithm, we propose to use the Gumbel-Softmax distribution to resolve the infeasibility of back-propagation when sampling categorical variables. Clustering is a fundamental task with applications in medical imaging, social network analysis, bioinformatics, computer graphics, etc. Applying classical clustering methods directly to high-dimensional data may be computationally inefficient and suffer from instability.
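The Gumbel-Softmax trick mentioned above can be sketched in a few lines. This is a generic illustration of the relaxation, not the VAB training code; the logits and temperature below are invented for the example. Adding Gumbel noise to the logits and applying a temperature-scaled softmax yields an approximately one-hot sample that remains differentiable with respect to the logits.

```python
import numpy as np

def gumbel_softmax(logits, tau=0.5, rng=None):
    """Draw a differentiable, approximately one-hot sample from a
    categorical distribution via the Gumbel-Softmax relaxation."""
    rng = rng or np.random.default_rng()
    u = rng.uniform(1e-10, 1.0, size=logits.shape)
    g = -np.log(-np.log(u))          # standard Gumbel noise
    y = (logits + g) / tau           # low tau -> closer to one-hot
    y = np.exp(y - y.max())          # numerically stable softmax
    return y / y.sum()

# Hypothetical cluster-assignment probabilities for three clusters.
rng = np.random.default_rng(0)
logits = np.log(np.array([0.7, 0.2, 0.1]))
sample = gumbel_softmax(logits, tau=0.1, rng=rng)
```

As the temperature tau decreases, the argmax of the relaxed sample follows the underlying categorical distribution (the Gumbel-max trick), while gradients can still flow through the softmax during back-propagation.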