Neural Network-Based Change Point Detection for Large-Scale Time-Evolving Data
Geng, Jialiang, Michailidis, George
The paper studies the problem of detecting and locating change points in multivariate time-evolving data. The problem has a long history in statistics and signal processing, and various algorithms have been developed, primarily for simple parametric models. In this work, we model the data through feed-forward neural networks and develop a detection strategy based on the following two-step procedure. In the first step, the neural network is trained over a prespecified window of the data, and its test error function is calibrated over another prespecified window. In the second step, the test error function is used over a moving window to identify the change point. Once a change point is detected, the two-step procedure is repeated until all change points are identified. The proposed strategy yields consistent estimates for both the number and the locations of the change points under temporal dependence of the data-generating process. The effectiveness of the proposed strategy is illustrated on synthetic data sets, which provide insights on how to select the algorithm's tuning parameters in practice, and on real data sets. Finally, we note that although the detection strategy is general and can work with different neural network architectures, the theoretical guarantees provided are specific to feed-forward architectures.
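For concreteness, the following is a minimal Python sketch of the two-step procedure described in the abstract, not the authors' implementation: the window sizes, the lag-based regression setup, the MLP architecture, and the mean-plus-three-standard-deviations threshold rule are all illustrative assumptions.

```python
# A minimal sketch of the two-step procedure, not the authors' implementation.
# Window sizes, lag structure, architecture, and threshold rule are illustrative.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Synthetic univariate series with a single mean shift at t = 600.
n = 1000
y = 0.5 * np.sin(np.arange(n) / 10.0) + 0.1 * rng.standard_normal(n)
y[600:] += 1.0

p, w_train, w_calib, w_test = 5, 200, 100, 20  # lags and window sizes

def lagged(series, p):
    """One-step-ahead regression pairs built from p lagged values."""
    X = np.column_stack([series[i:len(series) - p + i] for i in range(p)])
    return X, series[p:]

def detect(series, start):
    """Step 1: train and calibrate on two prespecified windows. Step 2: scan a
    moving window of test errors for the first exceedance of the threshold."""
    seg = series[start:]
    if len(seg) < w_train + w_calib + w_test + p + 1:
        return None
    X, t = lagged(seg, p)
    net = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
    net.fit(X[:w_train], t[:w_train])
    errs = np.abs(net.predict(X) - t)
    calib = errs[w_train:w_train + w_calib]
    thresh = calib.mean() + 3.0 * calib.std()
    for i in range(w_train + w_calib, len(t) - w_test):
        if errs[i:i + w_test].mean() > thresh:
            return start + p + i  # estimated change point location
    return None

# Repeat the two-step procedure after each detection until none remain.
change_points, start = [], 0
while (cp := detect(y, start)) is not None:
    change_points.append(cp)
    start = cp
print("estimated change points:", change_points)
```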
Deep Learning-based Approaches for State Space Models: A Selective Review
Lin, Jiahe, Michailidis, George
State-space models (SSMs) offer a powerful framework for dynamical system analysis, wherein the temporal dynamics of the system are assumed to be captured through the evolution of the latent states, which govern the values of the observations. This paper provides a selective review of recent advancements in deep neural network-based approaches for SSMs, and presents a unified perspective on discrete-time deep state space models and continuous-time ones such as latent neural Ordinary Differential and Stochastic Differential Equations. It starts with an overview of the classical maximum likelihood-based approach for learning SSMs, reviews the variational autoencoder as a general learning pipeline for neural network-based approaches in the presence of latent variables, and discusses in detail representative deep learning models that fall under the SSM framework. Very recent developments, where SSMs are used as standalone architectural modules for improving efficiency in sequence modeling, are also examined. Finally, examples involving mixed-frequency and irregularly-spaced time series data are presented to demonstrate the advantages of SSMs in these settings.
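For reference, a generic discrete-time SSM of the kind surveyed can be written as follows; the notation here is ours, not the paper's.

```latex
% Generic discrete-time state-space model: latent states x_t evolve via a
% transition density, and observations y_t are emitted given the state.
\begin{align*}
  x_t &\sim p_\theta(x_t \mid x_{t-1}) && \text{(state transition)}\\
  y_t &\sim p_\theta(y_t \mid x_t)     && \text{(emission)}
\end{align*}
% Continuous-time analogue (latent neural SDE), with observations at
% possibly irregular times t_1 < t_2 < \cdots:
\begin{equation*}
  \mathrm{d}x(t) = f_\theta\bigl(x(t), t\bigr)\,\mathrm{d}t
                 + g_\theta\bigl(x(t), t\bigr)\,\mathrm{d}W(t),
  \qquad y_{t_i} \sim p_\theta\bigl(y_{t_i} \mid x(t_i)\bigr).
\end{equation*}
```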
A VAE-based Framework for Learning Multi-Level Neural Granger-Causal Connectivity
Lin, Jiahe, Lei, Huitian, Michailidis, George
Granger causality has been widely used in various application domains to capture lead-lag relationships amongst the components of complex dynamical systems, and the focus in the extant literature has been on a single dynamical system. In certain applications in macroeconomics and neuroscience, one has access to data from a collection of such related systems, wherein the modeling task of interest is to extract the shared common structure embedded across them, as well as to identify the idiosyncrasies within individual ones. This paper introduces a Variational Autoencoder (VAE) based framework that jointly learns Granger-causal relationships amongst components in a collection of related-yet-heterogeneous dynamical systems, and handles the aforementioned task in a principled way. The performance of the proposed framework is evaluated on several synthetic data settings and benchmarked against existing approaches designed for individual-system learning. The method is further illustrated on a real dataset involving time series from a neurophysiological experiment and produces interpretable results.
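Schematically, and in our own notation rather than the paper's, the multi-level structure being targeted can be summarized by splitting each system's Granger-causal connectivity into a shared component and a system-specific one; the paper's exact parameterization may differ.

```latex
% For M related systems, the Granger-causal connectivity matrix of system m
% is decomposed into a common component shared across the collection and an
% idiosyncratic, system-specific component:
\begin{equation*}
  A^{(m)} \;=\; A^{\mathrm{com}} + B^{(m)}, \qquad m = 1, \dots, M,
\end{equation*}
% where a nonzero entry (i, j) of A^{(m)} encodes that component j
% Granger-causes component i within system m.
```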
A Penalty-Based Method for Communication-Efficient Decentralized Bilevel Programming
Nazari, Parvin, Mousavi, Ahmad, Tarzanagh, Davoud Ataee, Michailidis, George
Bilevel programming has recently received attention in the literature due to its wide range of applications, including reinforcement learning and hyper-parameter optimization. However, it is widely assumed that the underlying bilevel optimization problem is solved either by a single machine or by multiple machines connected in a star-shaped network, i.e., the federated learning setting. The latter approach suffers from a high communication cost at the central node (e.g., the parameter server) and exhibits privacy vulnerabilities. Hence, it is of interest to develop methods that solve bilevel optimization problems in a communication-efficient, decentralized manner. To that end, this paper introduces a penalty function-based decentralized algorithm with theoretical guarantees for this class of optimization problems. Specifically, a distributed alternating gradient-type algorithm for solving consensus bilevel programming over a decentralized network is developed. A key feature of the proposed algorithm is to estimate the hyper-gradient of the penalty function via decentralized computation of matrix-vector products and a few vector communications; this estimate is then integrated within an alternating algorithm, for which finite-time convergence guarantees are obtained under different convexity assumptions. Our theoretical results highlight improvements in the iteration complexity of decentralized bilevel optimization, all while making efficient use of vector communication. Empirical results on both synthetic and real datasets demonstrate that the proposed method performs well in real-world settings.
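In our notation, the problem class and the penalty reformulation at the heart of the approach can be sketched as follows; the exact penalty used in the paper may differ in its details.

```latex
% Bilevel program: an upper-level objective f evaluated at the solution
% of a lower-level problem g.
\begin{equation*}
  \min_{x} \; f\bigl(x, y^\ast(x)\bigr)
  \quad \text{s.t.} \quad
  y^\ast(x) \in \arg\min_{y} \; g(x, y).
\end{equation*}
% Penalty-based single-level surrogate with penalty parameter \gamma > 0:
\begin{equation*}
  \min_{x,\, y} \; f(x, y)
  + \gamma \Bigl( g(x, y) - \min_{z} g(x, z) \Bigr).
\end{equation*}
```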
A Coupled CP Decomposition for Principal Components Analysis of Symmetric Networks
Weylandt, Michael, Michailidis, George
In a number of application domains, one observes a sequence of network data: for example, repeated measurements of interactions between users of social media platforms, financial correlation networks over time, or brain connectivity networks across subjects, as in multi-subject studies. One way to analyze such data is by stacking the networks into a third-order array or tensor. We propose a principal components analysis (PCA) framework for such network sequence data, based on a novel decomposition for semi-symmetric tensors. We derive efficient algorithms for computing our proposed "Coupled CP" decomposition and establish estimation consistency of our approach under an analogue of the spiked covariance model, with rates matching the matrix case up to a logarithmic term. Our framework inherits many of the strengths of classical PCA and is suitable for a wide range of unsupervised learning tasks, including identifying principal networks, isolating meaningful changepoints or outliers across observations, and characterizing the "variability network" of the most varying edges. Finally, we demonstrate the effectiveness of our proposal on simulated data and on examples from political science and financial economics. The proof techniques used to establish our main consistency results are surprisingly straightforward and may find use in a variety of other matrix and tensor decomposition problems.
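In our notation, for a semi-symmetric tensor T of size p x p x N whose frontal slices are the N observed symmetric adjacency (or correlation) matrices, a decomposition of this coupled CP type takes the following schematic form; identifiability constraints that the paper may impose are omitted here.

```latex
% Each symmetric slice is expressed in terms of a common set of principal
% network factors v_k, with slice-specific loadings u_{nk}:
\begin{equation*}
  \mathcal{T}_{\cdot \cdot n} \;\approx\; \sum_{k=1}^{K} u_{nk}\, v_k v_k^\top,
  \qquad n = 1, \dots, N,
\end{equation*}
% equivalently, \mathcal{T} \approx \sum_{k=1}^{K} v_k \circ v_k \circ u_k,
% where \circ denotes the outer product and u_k = (u_{1k}, \dots, u_{Nk})^\top.
```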
Joint Learning of Linear Time-Invariant Dynamical Systems
Modi, Aditya, Faradonbeh, Mohamad Kazem Shirani, Tewari, Ambuj, Michailidis, George
The problem of identifying the transition matrices in linear time-invariant dynamical systems (LTIDS) has been studied extensively in the literature (Lai and Wei, 1983; Kailath et al., 2000; Buchmann and Chan, 2007). Recent works establish finite-time rates for accurately learning the dynamics in different online and offline settings (Faradonbeh et al., 2018; Simchowitz et al., 2018; Sarkar and Rakhlin, 2019). The existing results are established assuming that the goal is to identify the transition matrix of a single dynamical system. However, in many areas where LTIDS models (as in (1) below) are used, such as macroeconomics (Stock and Watson, 2016), functional genomics (Fujita et al., 2007), and neuroimaging (Seth et al., 2015), one observes multiple dynamical systems and needs to estimate the transition matrices for all of them jointly. Further, the underlying dynamical systems share commonalities, but also exhibit heterogeneity. For example, Skripnikov and Michailidis (2019a) analyze economic indicators of US states whose local economies share a strong manufacturing base. Moreover, in time-course genetics experiments, one is interested in understanding the dynamics and drivers of gene expression across related animal or cell line populations (Basu et al., 2015), while in neuroimaging, one has access to data from multiple subjects that suffer from the same disease (Skripnikov and Michailidis, 2019b). In all these settings, there are remarkable similarities in the dynamics of the systems, but some degree of heterogeneity is also present. Hence, it becomes natural to pursue a joint learning strategy for the systems' transition matrices.
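The excerpt refers to an equation (1) that is not reproduced above; under the usual conventions for LTIDS, the referenced model presumably takes the following form (our reconstruction, not quoted from the paper).

```latex
% State x_t of system m evolves linearly via an unknown transition matrix
% A^{(m)}, driven by a mean-zero noise sequence \eta_t^{(m)}:
\begin{equation}
  x_{t+1}^{(m)} \;=\; A^{(m)} x_t^{(m)} + \eta_{t+1}^{(m)},
  \qquad t = 0, 1, 2, \dots, \quad m = 1, \dots, M. \tag{1}
\end{equation}
```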
Inference for Change Points in High Dimensional Mean Shift Models
Kaul, Abhishek, Michailidis, George
Detection of change points constitutes a canonical statistical problem, due to numerous applications in diverse areas, including economics and finance (Basseville et al. [1993], Frisén [2008]), quality process control (Qiu [2013]), and functional genomics and neuroscience (Koepcke et al. [2016]). The offline version of the problem, wherein one examines the data retrospectively and aims to detect the presence and/or location of change points, has been studied extensively for a variety of statistical models, including signal-plus-noise, regression, graphical, random graph, factor, and time series models, and various algorithms have been developed to accomplish this task: dynamic programming, regularized cost functions, binary segmentation, multiscale methods, etc.; see, e.g., the review article Niu et al. [2016]. In the presence of multiple change points, consistency of the estimated locations of the change points has been established under certain regularity assumptions on the temporal spacing between change points and on the magnitude of the changes in the underlying model parameters; see, e.g., Fryzlewicz [2014], Frick et al. [2014], and Wang and Samworth [2018], amongst several others, where the former two operate under a fixed-p framework and the latter under a high-dimensional framework. Further, when a single change point is assumed, the asymptotic distribution of the change point estimator has been established for various statistical models; see, e.g., Bai [1994, 1997] and Csörgő and Horváth [1997] under a fixed-p setting, and Bhattacharjee et al. [2017, 2019] and Kaul et al. [2020, 2021] under diverging dimensionality, where the last two articles allow potential high dimensionality.
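For concreteness, the high-dimensional mean shift model referenced in the title can be written, in our notation, as follows.

```latex
% Observations x_t in R^p have a piecewise-constant mean vector that changes
% at N unknown change points 0 = \tau_0 < \tau_1 < \dots < \tau_N < \tau_{N+1} = T:
\begin{equation*}
  x_t = \theta_t + \varepsilon_t, \qquad
  \theta_t = \mu_j \;\; \text{for } \tau_{j-1} < t \le \tau_j,
  \quad j = 1, \dots, N+1,
\end{equation*}
% where the dimension p may grow with (or exceed) the sampling period T,
% and \varepsilon_t is a mean-zero noise sequence.
```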
A Decentralized Adaptive Momentum Method for Solving a Class of Min-Max Optimization Problems
Barazandeh, Babak, Huang, Tianjian, Michailidis, George
Min-max saddle point games have recently been intensely studied, due to their wide range of applications, including training Generative Adversarial Networks (GANs). However, most of the recent efforts for solving them are limited to special regimes, such as convex-concave games. Further, it is customarily assumed that the underlying optimization problem is solved either by a single machine or by multiple machines connected in a centralized fashion, wherein each one communicates with a central node. The latter approach becomes challenging when the underlying communications network has low bandwidth. In addition, privacy considerations may dictate that certain nodes can communicate only with a subset of other nodes. Hence, it is of interest to develop methods that solve min-max games in a decentralized manner. To that end, we develop a decentralized adaptive momentum (ADAM)-type algorithm for solving min-max optimization problems under the condition that the objective function satisfies a Minty Variational Inequality condition, which generalizes the convex-concave case. The proposed method overcomes shortcomings of recent non-adaptive gradient-based decentralized algorithms for min-max optimization problems, which do not perform well in practice and require careful tuning. In this paper, we obtain non-asymptotic rates of convergence of the proposed algorithm (coined DADAM$^3$) for finding a (stochastic) first-order Nash equilibrium point and subsequently evaluate its performance on training GANs. The extensive empirical evaluation shows that DADAM$^3$ outperforms recently developed methods, including decentralized optimistic stochastic gradient, for solving such min-max problems.
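In our notation, the problem class and the Minty Variational Inequality (MVI) condition mentioned above can be stated as follows.

```latex
% Min-max problem over decision variables x (minimizing player) and
% y (maximizing player); in the decentralized setting, f is an average
% of local objectives held by the n nodes:
\begin{equation*}
  \min_{x} \max_{y} \; f(x, y), \qquad
  f(x, y) = \frac{1}{n} \sum_{i=1}^{n} f_i(x, y).
\end{equation*}
% With z = (x, y) and the gradient operator
% F(z) = (\nabla_x f(x, y),\, -\nabla_y f(x, y)),
% the MVI condition requires the existence of z^* such that
\begin{equation*}
  \langle F(z),\, z - z^\ast \rangle \;\ge\; 0 \qquad \text{for all } z,
\end{equation*}
% which holds in particular for convex-concave f, so the condition
% strictly generalizes that regime.
```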
Solving a class of non-convex min-max games using adaptive momentum methods
Barazandeh, Babak, Tarzanagh, Davoud Ataee, Michailidis, George
Adaptive momentum methods have recently attracted a lot of attention for training deep neural networks. They use an exponential moving average of past gradients of the objective function to update both search directions and learning rates. However, these methods are not suited for solving the min-max optimization problems that arise in training generative adversarial networks. In this paper, we propose an adaptive momentum min-max algorithm that generalizes adaptive momentum methods to the non-convex min-max regime. Further, we establish non-asymptotic rates of convergence for the proposed algorithm when it is used on a reasonably broad class of non-convex min-max optimization problems. Experimental results illustrate its superior performance vis-a-vis benchmark methods for solving such problems.
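The following is a minimal sketch of the underlying idea, simultaneous Adam-style updates performing descent on x and ascent on y, not the paper's algorithm; the toy objective and hyperparameters are illustrative.

```python
import numpy as np

# Toy objective f(x, y) = 0.2*x^2 + x*y - 0.2*y^2 with saddle point (0, 0);
# x is the minimizing player, y the maximizing player.
def grad_x(x, y): return 0.4 * x + y
def grad_y(x, y): return x - 0.4 * y

def adam_minmax(steps=5000, lr=1e-2, b1=0.9, b2=0.999, eps=1e-8):
    x = y = 1.0
    mx = vx = my = vy = 0.0
    for t in range(1, steps + 1):
        gx = grad_x(x, y)
        gy = -grad_y(x, y)  # negate so the y-update performs ascent
        # exponential moving averages of gradients and squared gradients
        mx = b1 * mx + (1 - b1) * gx;  vx = b2 * vx + (1 - b2) * gx**2
        my = b1 * my + (1 - b1) * gy;  vy = b2 * vy + (1 - b2) * gy**2
        # bias-corrected moment estimates, as in standard Adam
        mhx, vhx = mx / (1 - b1**t), vx / (1 - b2**t)
        mhy, vhy = my / (1 - b1**t), vy / (1 - b2**t)
        x -= lr * mhx / (np.sqrt(vhx) + eps)
        y -= lr * mhy / (np.sqrt(vhy) + eps)
    return x, y

# Both coordinates should approach a neighborhood of the saddle (0, 0).
print(adam_minmax())
```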
Sparse Partial Least Squares for Coarse Noisy Graph Alignment
Weylandt, Michael, Michailidis, George, Roddenberry, T. Mitchell
Graph signal processing (GSP) provides a powerful framework for analyzing signals arising in a variety of domains. In many applications of GSP, multiple network structures are available, each of which captures different aspects of the same underlying phenomenon. To integrate these different data sources, graph alignment techniques attempt to find the best correspondence between vertices of two graphs. We consider a generalization of this problem, where there is no natural one-to-one mapping between vertices, but where there is correspondence between the community structures of each graph. Because we seek to learn structure at this higher community level, we refer to this problem as "coarse" graph alignment. To this end, we propose a novel regularized partial least squares method which both incorporates the observed graph structures and imposes sparsity in order to reflect the underlying block community structure. We provide efficient algorithms for our method and demonstrate its effectiveness in simulations.
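To give a flavor of the sparsity mechanism behind a regularized PLS method, the following Python sketch computes rank-1 sparse factors of the cross-covariance via alternating soft-thresholded power iterations. It deliberately omits the graph-structure penalties of the proposed method; the function names, penalty levels, and iteration budget are illustrative.

```python
import numpy as np

def soft_threshold(v, lam):
    """Elementwise soft-thresholding, the proximal map of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def sparse_pls_rank1(X, Y, lam_u=0.2, lam_v=0.2, iters=200, seed=0):
    """Rank-1 sparse PLS: sparse left/right factors of the (scaled)
    cross-covariance C = X^T Y / n via alternating power iterations."""
    rng = np.random.default_rng(seed)
    C = (X.T @ Y) / X.shape[0]
    v = rng.standard_normal(C.shape[1])
    v /= np.linalg.norm(v)
    u = np.zeros(C.shape[0])
    for _ in range(iters):
        u = soft_threshold(C @ v, lam_u)
        if np.linalg.norm(u) > 0:
            u /= np.linalg.norm(u)
        v = soft_threshold(C.T @ u, lam_v)
        if np.linalg.norm(v) > 0:
            v /= np.linalg.norm(v)
    return u, v

# Toy usage: two views sharing a common low-dimensional signal z, loaded on
# the first 3 coordinates of X and the last 4 coordinates of Y.
rng = np.random.default_rng(1)
z = rng.standard_normal(500)
X = np.outer(z, np.r_[np.ones(3), np.zeros(7)]) + 0.5 * rng.standard_normal((500, 10))
Y = np.outer(z, np.r_[np.zeros(6), np.ones(4)]) + 0.5 * rng.standard_normal((500, 10))
u, v = sparse_pls_rank1(X, Y)
print("support of u:", np.nonzero(u)[0])  # with high probability: {0, 1, 2}
print("support of v:", np.nonzero(v)[0])  # with high probability: {6, 7, 8, 9}
```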