sgem
Scalable physics-informed deep generative model for solving forward and inverse stochastic differential equations
Zhou, Shaoqian, You, Wen, Guo, Ling, Meng, Xuhui
Physics-informed deep learning approaches have been developed to solve forward and inverse stochastic differential equation (SDE) problems with high-dimensional stochastic space. However, the existing deep learning models have difficulties solving SDEs with high-dimensional spatial space. In the present study, we propose a scalable physics-informed deep generative model (sPI-GeM), which is capable of solving SDE problems with both high-dimensional stochastic and spatial space. The sPI-GeM consists of two deep learning models, i.e., (1) physics-informed basis networks (PI-BasisNet), which are used to learn the basis functions as well as the coefficients given data on a certain stochastic process or random field, and (2) physics-informed deep generative model (PI-GeM), which learns the distribution over the coefficients obtained from the PI-BasisNet. The new samples for the learned stochastic process can then be obtained using the inner product between the output of the generator and the basis functions from the trained PI-BasisNet. The sPI-GeM addresses the scalability in the spatial space in a similar way as in the widely used dimensionality reduction technique, i.e., principal component analysis (PCA). A series of numerical experiments, including approximation of Gaussian and non-Gaussian stochastic processes, forward and inverse SDE problems, are performed to demonstrate the accuracy of the proposed model. Furthermore, we also show the scalability of the sPI-GeM in both the stochastic and spatial space using an example of a forward SDE problem with 38- and 20-dimension stochastic and spatial space, respectively.
Personalized Speech Recognition for Children with Test-Time Adaptation
Shi, Zhonghao, Srivastava, Harshvardhan, Shi, Xuan, Narayanan, Shrikanth, Matariฤ, Maja J.
Accurate automatic speech recognition (ASR) for children is crucial for effective real-time child-AI interaction, especially in educational applications. However, off-the-shelf ASR models primarily pre-trained on adult data tend to generalize poorly to children's speech due to the data domain shift from adults to children. Recent studies have found that supervised fine-tuning on children's speech data can help bridge this domain shift, but human annotations may be impractical to obtain for real-world applications and adaptation at training time can overlook additional domain shifts occurring at test time. We devised a novel ASR pipeline to apply unsupervised test-time adaptation (TTA) methods for child speech recognition, so that ASR models pre-trained on adult speech can be continuously adapted to each child speaker at test time without further human annotations. Our results show that ASR models adapted with TTA methods significantly outperform the unadapted off-the-shelf ASR baselines both on average and statistically across individual child speakers. Our analysis also discovered significant data domain shifts both between child speakers and within each child speaker, which further motivates the need for test-time adaptation.
SGEM: Test-Time Adaptation for Automatic Speech Recognition via Sequential-Level Generalized Entropy Minimization
Kim, Changhun, Park, Joonhyung, Shim, Hajin, Yang, Eunho
Automatic speech recognition (ASR) models are frequently exposed to data distribution shifts in many real-world scenarios, leading to erroneous predictions. To tackle this issue, an existing test-time adaptation (TTA) method has recently been proposed to adapt the pre-trained ASR model on unlabeled test instances without source data. Despite decent performance gain, this work relies solely on naive greedy decoding and performs adaptation across timesteps at a frame level, which may not be optimal given the sequential nature of the model output. Motivated by this, we propose a novel TTA framework, dubbed SGEM, for general ASR models. To treat the sequential output, SGEM first exploits beam search to explore candidate output logits and selects the most plausible one. Then, it utilizes generalized entropy minimization and negative sampling as unsupervised objectives to adapt the model. SGEM achieves state-of-the-art performance for three mainstream ASR models under various domain shifts.
SGEM: stochastic gradient with energy and momentum
In this paper, we propose SGEM, Stochastic Gradient with Energy and Momentum, to solve a large class of general non-convex stochastic optimization problems, based on the AEGD method that originated in the work [AEGD: Adaptive Gradient Descent with Energy. arXiv: 2010.05109]. SGEM incorporates both energy and momentum at the same time so as to inherit their dual advantages. We show that SGEM features an unconditional energy stability property, and derive energy-dependent convergence rates in the general nonconvex stochastic setting, as well as a regret bound in the online convex setting. A lower threshold for the energy variable is also provided. Our experimental results show that SGEM converges faster than AEGD and generalizes better or at least as well as SGDM in training some deep neural networks.