Goto

Collaborating Authors

 zt 1


End-to-End Identifiable and Consistent Recurrent Switching Dynamical Systems

arXiv.org Machine Learning

Learning identifiable representations in deep generative models remains a fundamental challenge, particularly for sequential data with regime-switching dynamics. Existing approaches establish identifiability under restrictive assumptions, such as stationarity or limited emission models, and typically rely on variational autoencoder (VAE) estimators, which introduce approximation gaps that limit the recovery of the latent structure. In this work, we address both the theoretical and practical limitations of this setting. First, we establish identifiability of a broad class of recurrent nonlinear switching dynamical systems under flexible assumptions, significantly extending prior results. Second, we introduce $Ω$SDS, a flow-based estimator that enables exact likelihood optimization using expectation-maximisation. Through empirical validation on both synthetic and real-world data, our results demonstrate that $Ω$SDS achieves improved disentanglement compared to VAE-based estimators and more accurate forecasting of underlying dynamics.


Information Theory and Statistical Learning

arXiv.org Machine Learning

This manuscript contains preprint of a chapter under consideration for inclusion in the forthcoming third edition of {\em Cover and Thomas's Elements of Information Theory}, posted with permission from Wiley. The table of contents EIT-3 ToC of the new edition can be found at: https://docs.google.com/document/d/1L-m4oQEJw1PJhoxBeMwrrBD8S_HmvzMEkPbYvS24980/edit?usp=sharing . For feedback, please contact abbas@ee.stanford.edu Learning and information theory intersect in both model training and the characterization of fundamental performance limits. This manuscript provides a concise and accessible treatment of the first intersection, requiring only basic background in information theory and statistics at the senior undergraduate or first-year graduate level. End-of-chapter exercises make the material well suited for classroom use as well as self-study. The chapter focuses on the role of divergence measures in model training, with examples ranging from linear and logistic regression to autoregressive models, variational autoencoders, diffusion models, generative adversarial networks, and score-based models. It introduces the evidence lower bound (ELBO), $f$\!-divergences, and the Fisher divergence. In particular, the treatment of the generative diffusion model provides a more systematic and explicit derivation than is typical in the literature.


Sequential Neural Models with Stochastic Layers

Neural Information Processing Systems

This paper introduces stochastic recurrent neural networks which glue a deterministic recurrent neural network and a state space model together to form a stochastic and sequential neural generative model. The clear separation of deterministic and stochastic layers allows a structured variational inference network to track the factorization of the model's posterior distribution. By retaining both the nonlinear recursive structure of a recurrent neural network and averaging over the uncertainty in a latent path, like a state space model, we improve the state of the art results on the Blizzard and TIMIT speech modeling data sets by a large margin, while achieving comparable performances to competing methods on polyphonic music modeling.


World ModelHumanObjectInteractionVideosReal-worldDrivingVideosHumanMotionVideosIn-the-wildVideoDataPre-trainingVisualControlTasks Fine-tuningRobotic ManipulationRobotic LocomotionAutonomousDriving

Neural Information Processing Systems

Unsupervised pre-training methods utilizing large and diverse datasets have achieved tremendous success across a range of domains. Recent work has investigated such unsupervised pre-training methods for model-based reinforcement learning (MBRL) but is limited to domain-specific or simulated data. In this paper, we study the problem of pre-training world models with abundant in-the-wild videos for efficient learning of downstream visual control tasks. However, inthe-wild videos are complicated with various contextual factors, such as intricate backgrounds and textured appearance, which precludes a world model from extracting shared world knowledge to generalize better. To tackle this issue, we introduce Contextualized World Models (ContextWM) that explicitly separate context and dynamics modeling to overcome the complexity and diversity of in-the-wild videos and facilitate knowledge transfer between distinct scenes. Specifically, a contextualized extension of the latent dynamics model is elaborately realized by incorporating a context encoder to retain contextual information and empower the image decoder, which encourages the latent dynamics model to concentrate on essential temporal variations. Our experiments show that in-the-wild video pre-training equipped with ContextWM can significantly improve the sample efficiency of MBRL in various domains, including robotic manipulation, locomotion, and autonomous driving.


54b2b21af94108d83c2a909d5b0a6a50-Supplemental.pdf

Neural Information Processing Systems

Predicted sequences of a moving pendulum conditioned on z1 q(z1|x1:5,u1:4) or, in case of the EKVAE, on z1 p(z1|a1:5,u1:4), where the auxiliary variables are obtained through a1:5 q(a1:5|x1:5). The average prediction accuracy, measured by the MSE, can be found in Tab.


Temporally Disentangled Representation Learning under Unknown Nonstationarity

Neural Information Processing Systems

In unsupervised causal representation learning for sequential data with time-delayed latent causal influences, strong identifiability results for the disentanglement of causally-related latent variables have been established in stationary settings by leveraging temporal structure. However, in nonstationary setting, existing work only partially addressed the problem by either utilizing observed auxiliary variables (e.g., class labels and/or domain indexes) as side-information or assuming simplified latent causal dynamics. Both constrain the method to a limited range of scenarios. In this study, we further explored the Markov Assumption under time-delayed causally related process in nonstationary setting and showed that under mild conditions, the independent latent components can be recovered from their nonlinear mixture up to a permutation and a component-wise transformation, without the observation of auxiliary variables. We then introduce NCTRL, a principled estimation framework, to reconstruct time-delayed latent causal variables and identify their relations from measured sequential data only. Empirical evaluations demonstrated the reliable identification of time-delayed latent causal influences, with our methodology substantially outperforming existing baselines that fail to exploit the nonstationarity adequately and then, consequently, cannot distinguish distribution shifts.


Practical Efficient Global Optimization is No-regret

arXiv.org Machine Learning

Efficient global optimization (EGO) is one of the most widely used noise-free Bayesian optimization algorithms.It comprises the Gaussian process (GP) surrogate model and expected improvement (EI) acquisition function. In practice, when EGO is applied, a scalar matrix of a small positive value (also called a nugget or jitter) is usually added to the covariance matrix of the deterministic GP to improve numerical stability. We refer to this EGO with a positive nugget as the practical EGO. Despite its wide adoption and empirical success, to date, cumulative regret bounds for practical EGO have yet to be established. In this paper, we present for the first time the cumulative regret upper bound of practical EGO. In particular, we show that practical EGO has sublinear cumulative regret bounds and thus is a no-regret algorithm for commonly used kernels including the squared exponential (SE) and Matérn kernels ($ν>\frac{1}{2}$). Moreover, we analyze the effect of the nugget on the regret bound and discuss the theoretical implication on its choice. Numerical experiments are conducted to support and validate our findings.



DeepExplicitDurationSwitchingModels forTimeSeries

Neural Information Processing Systems

Time series forecasting plays akeyrole in informing industrial and business decisions [17,24,8], while segmentation isuseful forunderstanding biological andphysicalsystems [40,45,34].