 unmeasured variable


Parallel BiLSTM-Transformer networks for forecasting chaotic dynamics

arXiv.org Artificial Intelligence

The nonlinear nature of chaotic systems results in extreme sensitivity to initial conditions and highly intricate dynamical behaviors, posing fundamental challenges for accurately predicting their evolution. Conventional approaches fail to capture local features and global dependencies in chaotic time series simultaneously. To overcome this limitation, this study proposes a parallel predictive framework integrating Transformer and Bidirectional Long Short-Term Memory (BiLSTM) networks. The hybrid model employs a dual-branch architecture, where the Transformer branch mainly captures long-range dependencies while the BiLSTM branch focuses on extracting local temporal features. The complementary representations from the two branches are fused in a dedicated feature-fusion layer to enhance predictive accuracy. As illustrative examples, the model's performance is systematically evaluated on two representative tasks on the Lorenz system. The first is autonomous evolution prediction, in which the model recursively extrapolates system trajectories from the time-delay embeddings of the state vector to evaluate long-term tracking accuracy and stability. The second is inference of unmeasured variables, where the model reconstructs the unobserved states from the time-delay embeddings of partial observations to assess its state-completion capability. The results consistently indicate that the proposed hybrid framework outperforms both single-branch architectures across tasks, demonstrating its robustness and effectiveness in chaotic system prediction.
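To make the "time-delay embeddings of partial observations" concrete, the following is a minimal sketch of how such inputs could be built for the Lorenz system. It is not the paper's pipeline: the standard parameter values (sigma = 10, rho = 28, beta = 8/3), the RK4 integrator, the step size, and the embedding dimension and lag are all assumptions chosen for illustration, and the helper names `lorenz_rhs`, `integrate_rk4`, and `delay_embed` are hypothetical.

```python
import numpy as np

def lorenz_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz equations (standard parameters assumed)."""
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def integrate_rk4(rhs, state0, dt, n_steps):
    """Fixed-step fourth-order Runge-Kutta; returns an (n_steps + 1, d) array."""
    traj = np.empty((n_steps + 1, len(state0)))
    traj[0] = state0
    for k in range(n_steps):
        s = traj[k]
        k1 = rhs(s)
        k2 = rhs(s + 0.5 * dt * k1)
        k3 = rhs(s + 0.5 * dt * k2)
        k4 = rhs(s + dt * k3)
        traj[k + 1] = s + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return traj

def delay_embed(series, dim, lag):
    """Stack delayed copies: row t is [s(t), s(t - lag), ..., s(t - (dim - 1) * lag)]."""
    series = np.asarray(series, dtype=float)
    if series.ndim == 1:
        series = series[:, None]
    start = (dim - 1) * lag          # first index with a full history
    n_rows = series.shape[0] - start
    if n_rows <= 0:
        raise ValueError("series too short for this dim and lag")
    cols = [series[start - j * lag : start - j * lag + n_rows] for j in range(dim)]
    return np.concatenate(cols, axis=1)

# Hypothetical setup: observe only x, reconstruct the unmeasured y and z.
dim, lag = 5, 10                      # illustrative values, not from the paper
traj = integrate_rk4(lorenz_rhs, np.array([1.0, 1.0, 1.0]), dt=0.01, n_steps=2000)
inputs = delay_embed(traj[:, 0], dim=dim, lag=lag)   # features built from x only
targets = traj[(dim - 1) * lag :, 1:]                # aligned y and z values
```

Each row of `inputs` would be fed to a model whose output is compared with the matching row of `targets`. For the autonomous-prediction task, the same `delay_embed` could be applied to the full state instead of `traj[:, 0]`.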


Physics-Informed Long Short-Term Memory for Forecasting and Reconstruction of Chaos

arXiv.org Artificial Intelligence

We present the Physics-Informed Long Short-Term Memory (PI-LSTM) network to reconstruct and predict the evolution of unmeasured variables in a chaotic system. The training is constrained by a regularization term, which penalizes solutions that violate the system's governing equations. The network is showcased on the Lorenz-96 model, a prototypical chaotic dynamical system, for a varying number of variables to reconstruct. First, we show the PI-LSTM architecture and explain how to enforce the differential equations as a constraint, which is a non-trivial task in LSTMs. Second, the PI-LSTM is evaluated numerically in long-term autonomous evolution to study its ergodic properties. We show that it correctly predicts the statistics of the unmeasured variables, which cannot be achieved without the physical constraint. Third, we compute the Lyapunov exponents of the network to infer the key stability properties of the chaotic system. For reconstruction purposes, adding the physics-informed loss qualitatively enhances the dynamical behaviour of the network compared to purely data-driven training. This is quantified by the agreement of the Lyapunov exponents. This work opens up new opportunities for state reconstruction and learning of the dynamics of nonlinear systems.
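The regularization term described above can be illustrated with a small sketch for the Lorenz-96 model, dx_i/dt = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F with periodic indices. This is only one plausible form, not the paper's implementation: the forward-difference time derivative, the forcing F = 8, the `weight` parameter, and the function names are assumptions made for illustration, and NumPy arrays stand in for network outputs.

```python
import numpy as np

def lorenz96_rhs(x, forcing=8.0):
    """Lorenz-96 right-hand side with periodic indices along the last axis."""
    x_next = np.roll(x, -1, axis=-1)    # x_{i+1}
    x_prev = np.roll(x, 1, axis=-1)     # x_{i-1}
    x_prev2 = np.roll(x, 2, axis=-1)    # x_{i-2}
    return (x_next - x_prev2) * x_prev - x + forcing

def physics_residual_loss(pred, dt, forcing=8.0):
    """Mean squared residual of the governing equations on a predicted sequence.

    pred has shape (T, N): T time steps of the full N-variable state.
    The time derivative is approximated by a forward difference (an assumption).
    """
    dxdt = (pred[1:] - pred[:-1]) / dt
    residual = dxdt - lorenz96_rhs(pred[:-1], forcing)
    return np.mean(residual ** 2)

def total_loss(pred, observed, observed_idx, dt, weight=1.0, forcing=8.0):
    """Data misfit on the measured variables plus the weighted physics penalty."""
    data_loss = np.mean((pred[:, observed_idx] - observed) ** 2)
    return data_loss + weight * physics_residual_loss(pred, dt, forcing)
```

The data term only sees the measured components (`observed_idx`), so the unmeasured ones are informed solely through the physics penalty, which is the mechanism the abstract credits for recovering their statistics.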


Robustness to Spurious Correlations via Human Annotations

arXiv.org Machine Learning

The reliability of machine learning systems rests critically on the assumption that the associations between features and labels remain similar between training and test distributions. However, unmeasured variables, such as confounders, break this assumption: useful correlations between features and labels at training time can become useless or even harmful at test time. For example, obesity is generally predictive of heart disease, but this relation may not hold for smokers, who generally have lower rates of obesity and higher rates of heart disease. We present a framework for making models robust to spurious correlations by leveraging humans' common-sense knowledge of causality. Specifically, we use human annotation to augment each training example with a potential unmeasured variable (e.g., an underweight patient with heart disease may be a smoker), reducing the problem to a covariate shift problem. We then introduce a new distributionally robust optimization objective over unmeasured variables (UV-DRO) to control the worst-case loss over possible test-time shifts. Empirically, we show improvements of 5-10% on a digit recognition task confounded by rotation, and 1.5-5% on the task of analyzing NYPD Police Stops confounded by location.
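The idea of controlling a worst-case loss over shifts in an annotated unmeasured variable can be illustrated with a much simpler stand-in. The sketch below is not the UV-DRO objective from the paper; it is a generic worst-group loss, under the assumption that each example carries a discrete annotation (for instance smoker or non-smoker) and that the test distribution may concentrate on any single annotated group. The function name `worst_group_loss` is hypothetical.

```python
import numpy as np

def worst_group_loss(losses, groups):
    """Return (largest per-group mean loss, the group that attains it).

    losses: per-example losses, shape (n,)
    groups: per-example annotation of the unmeasured variable, shape (n,)
    """
    losses = np.asarray(losses, dtype=float)
    groups = np.asarray(groups)
    group_means = {g: losses[groups == g].mean() for g in np.unique(groups)}
    worst = max(group_means, key=group_means.get)
    return group_means[worst], worst

# Illustrative numbers only: group 1 is rare-looking but poorly fit.
example_losses = [0.1, 0.2, 0.9, 0.7]
example_groups = [0, 0, 1, 1]
average_loss = float(np.mean(example_losses))
robust_loss, worst_group = worst_group_loss(example_losses, example_groups)
```

Minimizing `robust_loss` instead of `average_loss` pushes the model to do well on the hardest annotated group, so a correlation that only holds for the majority group stops being rewarded.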