conjugacy
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Therapeutic Area (0.46)
- Education (0.46)
Identifying Equivalent Training Dynamics
Study of the nonlinear evolution that deep neural network (DNN) parameters undergo during training has uncovered regimes of distinct dynamical behavior. While a detailed understanding of these phenomena has the potential to advance improvements in training efficiency and robustness, the lack of methods for identifying when DNN models have equivalent dynamics limits the insight that can be gained from prior work. Topological conjugacy, a notion from dynamical systems theory, provides a precise definition of dynamical equivalence, offering a possible route to address this need. However, topological conjugacies have historically been challenging to compute. By leveraging advances in Koopman operator theory, we develop a framework for identifying conjugate and non-conjugate training dynamics. To validate our approach, we demonstrate that comparing Koopman eigenvalues can correctly identify a known equivalence between online mirror descent and online gradient descent. We then utilize our approach to: (a) identify non-conjugate training dynamics between shallow and wide fully connected neural networks; (b) characterize the early phase of training dynamics in convolutional neural networks; (c) uncover non-conjugate training dynamics in Transformers that do and do not undergo grokking. Our results, across a range of DNN architectures, illustrate the flexibility of our framework and highlight its potential for shedding new light on training dynamics.
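As a brief worked illustration of why comparing Koopman eigenvalues can detect conjugacy, the standard argument can be written out as follows. The symbols f, g, h, \varphi, and \lambda are notation assumed here for illustration, not taken from the abstract, and the final step holds only when the chosen function space is preserved under composition with h.

```latex
% Assumed notation: f : X \to X and g : Y \to Y are discrete-time maps,
% and h : X \to Y is a homeomorphism realizing the conjugacy.
\begin{align}
  h \circ f &= g \circ h
    && \text{(topological conjugacy)} \\
  U_g \varphi &= \varphi \circ g = \lambda \varphi
    && \text{(Koopman eigenfunction of } g\text{)} \\
  U_f (\varphi \circ h) &= \varphi \circ h \circ f
    = \varphi \circ g \circ h
    = \lambda \, (\varphi \circ h)
    && \text{(same eigenvalue for } f\text{)}
\end{align}
```

Under these assumptions, every Koopman eigenvalue of g is also a Koopman eigenvalue of f, and applying the same argument to the inverse of h gives the converse.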
Hierarchical-Graph-Structured Edge Partition Models for Learning Evolving Community Structure
We propose a novel dynamic network model to capture evolving latent communities within temporal networks. To achieve this, we decompose each observed dynamic edge between vertices using a Poisson-gamma edge partition model, assigning each vertex to one or more latent communities through nonnegative vertex-community memberships. Specifically, hierarchical transition kernels are employed to model the interactions between these latent communities in the observed temporal network. A hierarchical graph prior is placed on the transition structure of the latent communities, allowing us to model how they evolve and interact over time. Consequently, our dynamic network enables the inferred community structure to merge, split, and interact with one another, providing a comprehensive understanding of complex network dynamics. Experiments on various real-world network datasets demonstrate that the proposed model not only effectively uncovers interpretable latent structures but also surpasses other state-of-the-art dynamic network models in the tasks of link prediction and community detection.
- Asia > China > Guangdong Province (0.04)
- North America > United States > New York > New York County > New York City (0.04)
eddb904a6db773755d2857aacadb1cb0-Reviews.html
Reviewer's response to the rebuttal "line 308: [10, 400] range for alpha We chose 400 because higher values of alpha caused computational difficulties in the Gibbs sampling. This upper bound is a bit arbitrary; however, we found the exact upper limit above about 100 had little effect on the estimates of transition probabilities. Similarly, adjusting the lower bound below 60 or so had little effect. While we do hope for a more well justified hyperprior for alpha in future work, we believe our choice did not overly influence the results." Please add this information to the paper.
Beyond Geometry: Comparing the Temporal Structure of Computation in Neural Circuits with Dynamical Similarity Analysis
Ostrow, Mitchell, Eisen, Adam, Kozachkov, Leo, Fiete, Ila
How can we tell whether two neural networks utilize the same internal processes for a particular computation? This question is pertinent for multiple subfields of neuroscience and machine learning, including neuroAI, mechanistic interpretability, and brain-machine interfaces. Standard approaches for comparing neural networks focus on the spatial geometry of latent states. Yet in recurrent networks, computations are implemented at the level of dynamics, and two networks performing the same computation with equivalent dynamics need not exhibit the same geometry. To bridge this gap, we introduce a novel similarity metric that compares two systems at the level of their dynamics, called Dynamical Similarity Analysis (DSA). Our method incorporates two components: Using recent advances in data-driven dynamical systems theory, we learn a high-dimensional linear system that accurately captures core features of the original nonlinear dynamics. Next, we compare different systems passed through this embedding using a novel extension of Procrustes Analysis that accounts for how vector fields change under orthogonal transformation. In four case studies, we demonstrate that our method disentangles conjugate and non-conjugate recurrent neural networks (RNNs), while geometric methods fall short. We additionally show that our method can distinguish learning rules in an unsupervised manner. Our method opens the door to comparative analyses of the essential temporal structure of computation in neural circuits.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > United States > New York (0.04)
- Europe > Switzerland > Zürich > Zürich (0.04)
Automatically Marginalized MCMC in Probabilistic Programming
Lai, Jinlin, Burroni, Javier, Guan, Hui, Sheldon, Daniel
Hamiltonian Monte Carlo (HMC) is a powerful algorithm to sample latent variables from Bayesian models. The advent of probabilistic programming languages (PPLs) frees users from writing inference algorithms and lets users focus on modeling. However, many models are difficult for HMC to solve directly, and often require tricks like model reparameterization. We are motivated by the fact that many of those models could be simplified by marginalization. We propose to use automatic marginalization as part of the sampling process using HMC in a graphical model extracted from a PPL, which substantially improves sampling from real-world hierarchical models.
- South America > Paraguay > Asunción > Asunción (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Universal set of Observables for Forecasting Physical Systems through Causal Embedding
Manjunath, G, de Clercq, A, Steynberg, MJ
We demonstrate when and how an entire left-infinite orbit of an underlying dynamical system or observations from such left-infinite orbits can be uniquely represented by a pair of elements in a different space, a phenomenon which we call causal embedding. The collection of such pairs is derived from a driven dynamical system and is used to learn a function which together with the driven system would: (i) determine a system that is topologically conjugate to the underlying system; (ii) enable forecasting the underlying system's dynamics since the conjugacy is computable and universal, i.e., it does not depend on the underlying system; (iii) guarantee an attractor containing the image of the causally embedded object even if there is an error made in learning the function. By accomplishing these we herald a new forecasting scheme that beats the existing reservoir computing schemes that often lead to poor long-term consistency as there is no guarantee of the existence of a learnable function, and overcomes the challenges of stability in Takens delay embedding. We illustrate accurate modeling of underlying systems where previously known techniques have failed.
- Africa > South Africa > Gauteng > Pretoria (0.04)
- Europe > Germany > North Rhine-Westphalia > Cologne Region > Bonn (0.04)
On Equivalent Optimization of Machine Learning Methods
Redman, William T., Bello-Rivas, Juan M., Fonoberova, Maria, Mohr, Ryan, Kevrekidis, Ioannis G., Mezić, Igor
At the core of many machine learning methods resides an iterative optimization algorithm for their training. Such optimization algorithms often come with a plethora of choices regarding their implementation. In the case of deep neural networks, choices of optimizer, learning rate, batch size, etc. must be made. Despite the fundamental way in which these choices impact the training of deep neural networks, there exists no general method for identifying when they lead to equivalent, or non-equivalent, optimization trajectories. By viewing iterative optimization as a discrete-time dynamical system, we are able to leverage Koopman operator theory, where it is known that conjugate dynamics can have identical spectral objects. We find highly overlapping Koopman spectra associated with the application of online mirror and gradient descent to specific problems, illustrating that such a data-driven approach can corroborate the recently discovered analytical equivalence between the two optimizers. We extend our analysis to feedforward, fully connected neural networks, providing the first general characterization of when choices of learning rate, batch size, layer width, data set, and activation function lead to equivalent, and non-equivalent, evolution of network parameters during training. Among our main results, we find that learning rate to batch size ratio, layer width, nature of data set (handwritten vs. synthetic), and activation function affect the nature of conjugacy. Our data-driven approach is general and can be utilized broadly to compare the optimization of machine learning methods.
- North America > United States (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
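The idea of comparing Koopman spectra from trajectory data, described in the abstract above, can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the authors' pipeline: it uses exact dynamic mode decomposition (DMD) as the Koopman approximation, toy linear maps in place of optimizer trajectories, and a simple sorted-difference gap in place of whatever spectral distance the paper uses. The names `simulate`, `dmd_eigenvalues`, and `spectral_gap` are hypothetical helpers introduced here.

```python
import numpy as np

def simulate(A, x0, steps):
    """Iterate the linear map x -> A x and return a (steps + 1, d) trajectory."""
    traj = [np.asarray(x0, dtype=float)]
    for _ in range(steps):
        traj.append(A @ traj[-1])
    return np.array(traj)

def dmd_eigenvalues(trajectory, rank):
    """Estimate Koopman eigenvalues from a (T, d) trajectory via exact DMD."""
    X, Y = trajectory[:-1].T, trajectory[1:].T        # snapshot pairs, shape (d, T - 1)
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]       # truncate to the chosen rank
    A_tilde = U.conj().T @ Y @ Vh.conj().T / s        # reduced operator U* Y V S^{-1}
    return np.linalg.eigvals(A_tilde)

def spectral_gap(eigs_a, eigs_b):
    """Largest pairwise difference after sorting (a simple stand-in distance)."""
    return float(np.max(np.abs(np.sort_complex(eigs_a) - np.sort_complex(eigs_b))))

# Toy systems (assumptions for illustration only).
A = np.diag([0.9, 0.5])
P = np.array([[2.0, 1.0], [1.0, 1.0]])                # invertible change of coordinates
B = P @ A @ np.linalg.inv(P)                          # same dynamics as A in new coordinates
C = np.diag([0.9, 0.2])                               # a map with a different spectrum

x0 = np.array([1.0, 1.0])
eigs_A = dmd_eigenvalues(simulate(A, x0, 20), rank=2)
eigs_B = dmd_eigenvalues(simulate(B, P @ x0, 20), rank=2)
eigs_C = dmd_eigenvalues(simulate(C, x0, 20), rank=2)

gap_same_dynamics = spectral_gap(eigs_A, eigs_B)
gap_different_spectrum = spectral_gap(eigs_A, eigs_C)
print(gap_same_dynamics, gap_different_spectrum)
```

In this sketch the gap between A and B is numerically negligible, because B is A expressed in different coordinates, while the gap between A and C is about 0.3 (the difference between 0.5 and 0.2). Real training trajectories are nonlinear and high-dimensional, so the choice of observables, DMD rank, and spectral distance would all need more care than this toy example suggests.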