Neumann series
Appendix of Joint Data-Task Generation for Auxiliary Learning
Chen, Hong
We provide the derivation of the upper implicit gradient. We summarize the whole DTG-AuxL algorithm in Algorithm 1, where the lower and upper optimization updates are conducted alternately. We use batch stochastic gradient optimization for both the lower and upper updates. STL: a natural baseline where we train only on the primary task. Equal: a multi-task learning method where we assign an equal weight of 1.0 to the loss of each task. MAXL: an auxiliary learning baseline that can only be applied to classification problems.
- Asia > China > Beijing > Beijing (0.04)
- North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
- North America > United States > Maryland > Baltimore (0.04)
- (4 more...)
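To make the alternating lower/upper update pattern from the appendix above concrete, here is a minimal PyTorch sketch of a generic bi-level loop. The module names, losses, and data are hypothetical placeholders, and the upper step uses a simple first-order proxy rather than the implicit gradient the paper derives.

```python
import torch

# Hypothetical sketch of alternating lower/upper (bi-level) updates.
# All names and losses are illustrative, not the authors' implementation.
model = torch.nn.Linear(10, 1)        # lower-level learner (primary task)
generator = torch.nn.Linear(10, 10)   # upper-level data/task generator

lower_opt = torch.optim.SGD(model.parameters(), lr=1e-2)
upper_opt = torch.optim.SGD(generator.parameters(), lr=1e-3)

for step in range(100):
    x, y = torch.randn(32, 10), torch.randn(32, 1)   # stand-in batch

    # Lower update: fit the learner on primary plus generated auxiliary data.
    aux_x = generator(x).detach()
    lower_loss = ((model(x) - y) ** 2).mean() + ((model(aux_x) - y) ** 2).mean()
    lower_opt.zero_grad()
    lower_loss.backward()
    lower_opt.step()

    # Upper update: adjust the generator against the primary-task loss.
    # (The true upper gradient is implicit; this is a first-order proxy.)
    upper_loss = ((model(generator(x)) - y) ** 2).mean()
    upper_opt.zero_grad()
    upper_loss.backward()
    upper_opt.step()
```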
Conditional Independence Estimates for the Generalized Nonparanormal
Shah, Ujas, Lladser, Manuel, Morrison, Rebecca
For general non-Gaussian distributions, the covariance and precision matrices do not encode the independence structure of the variables, as they do for the multivariate Gaussian. This paper builds on previous work to show that for a class of non-Gaussian distributions -- those derived from diagonal transformations of a Gaussian -- information about the conditional independence structure can still be inferred from the precision matrix, provided the data meet certain criteria, analogous to the Gaussian case. We call such transformations of the Gaussian the generalized nonparanormal. The functions that define these transformations are, in a broad sense, arbitrary. We also provide a simple and computationally efficient algorithm that leverages this theory to recover conditional independence structure from generalized nonparanormal data. The effectiveness of the proposed algorithm is demonstrated via synthetic experiments and applications to real-world data.
- North America > United States > Colorado > Boulder County > Boulder (0.14)
- Europe > United Kingdom > England > Greater Manchester > Rochdale (0.04)
- Europe > Ireland (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
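The following NumPy sketch makes the setting of the abstract above concrete: sample a Gaussian with a known sparse precision matrix, apply a diagonal (coordinate-wise) transformation, and threshold the empirical precision of the transformed data to read off a candidate conditional-independence graph. The transformation, sample size, and threshold are illustrative assumptions; the paper's actual algorithm and validity criteria are more refined.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth sparse precision of a 3-node chain: variables 0 and 2 are
# conditionally independent given variable 1 (the (0, 2) entry is zero).
K = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])
Z = rng.multivariate_normal(np.zeros(3), np.linalg.inv(K), size=50_000)

# A diagonal (coordinate-wise, monotone) transformation of the Gaussian.
X = np.sinh(Z)  # illustrative choice of transformation

# Empirical precision of the transformed data; the paper gives criteria under
# which its zero pattern still reflects the conditional-independence graph.
K_hat = np.linalg.inv(np.cov(X, rowvar=False))
print(np.round(K_hat, 2))
print(np.abs(K_hat) > 0.1)  # illustrative threshold for edge recovery
```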
Neumann Series-based Neural Operator for Solving Inverse Medium Problem
Liu, Ziyang, Chen, Fukai, Chen, Junqing, Qiu, Lingyun, Shi, Zuoqiang
The inverse medium problem, inherently ill-posed and nonlinear, presents significant computational challenges. This study introduces a novel approach by integrating a Neumann series structure within a neural network framework to effectively handle multiparameter inputs. Experiments demonstrate that our methodology not only accelerates computations but also significantly enhances generalization performance, even with varying scattering properties and noisy data. The robustness and adaptability of our framework provide crucial insights and methodologies, extending its applicability to a broad spectrum of scattering problems. These advancements mark a significant step forward in the field, offering a scalable solution to traditionally complex inverse problems.
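Since the architecture above is organized around the Neumann series, it may help to recall the underlying identity: when the spectral radius of $A$ is below one, $(I - A)^{-1} = \sum_{k \ge 0} A^k$, so an inverse can be built from repeated applications of $A$. A small self-contained NumPy check of the truncated series (matrix size and scaling are arbitrary choices, unrelated to the paper's operator):

```python
import numpy as np

rng = np.random.default_rng(0)

# Neumann series: (I - A)^{-1} = I + A + A^2 + ..., valid when the
# spectral radius of A is below one.
n = 5
A = 0.3 * rng.standard_normal((n, n)) / np.sqrt(n)  # small spectral radius

approx = np.eye(n)
term = np.eye(n)
for _ in range(50):
    term = term @ A      # next power A^k
    approx += term       # accumulate the partial sum

exact = np.linalg.inv(np.eye(n) - A)
print(np.max(np.abs(approx - exact)))  # truncation error is tiny
```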
On Training Implicit Meta-Learning With Applications to Inductive Weighing in Consistency Regularization
Meta-learning methods that use implicit gradients provide an exciting alternative to standard techniques that depend on the trajectory of the inner-loop training. Implicit meta-learning (IML), however, requires computing $2^{nd}$-order gradients, in particular the Hessian, which is impractical to compute for modern deep learning models. Various approximations of the Hessian have been proposed, but a systematic comparison of their compute cost, stability, generalization of the solutions found, and estimation accuracy has been largely overlooked. In this study, we first conduct a systematic comparative analysis of the various approximation methods and their effect when incorporated into IML training routines. We establish situations where catastrophic forgetting is exhibited in IML and explain its cause in terms of the inability of the approximations to estimate the curvature at convergence points. Sources of IML training instability are demonstrated and remedied. A detailed analysis of the efficiency of various inverse Hessian-vector product approximation methods is also provided. Subsequently, we use the insights gained to propose and evaluate a novel semi-supervised learning algorithm that learns to inductively weigh consistency regularization losses. We show how training a "Confidence Network" to extract domain-specific features can learn to up-weigh useful images and down-weigh out-of-distribution samples. Results outperform the baseline FixMatch performance.
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- (3 more...)
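One of the inverse Hessian-vector product approximations the abstract above alludes to is the truncated Neumann series, the topic of this digest. Below is a minimal PyTorch sketch of that approximation; the function name, step count, and scaling `alpha` are illustrative assumptions, and convergence requires `alpha` small enough that $\|I - \alpha H\| < 1$.

```python
import torch

def neumann_ihvp(loss, params, v, steps=10, alpha=0.01):
    """Approximate H^{-1} v via the truncated Neumann series
    H^{-1} = alpha * sum_k (I - alpha * H)^k, using only
    Hessian-vector products (never the full Hessian)."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    p = [alpha * vi for vi in v]        # current series term
    acc = [pi.clone() for pi in p]      # running partial sum
    for _ in range(steps):
        # Hessian-vector product H p via double backward.
        hvp = torch.autograd.grad(grads, params, grad_outputs=p,
                                  retain_graph=True)
        p = [pi - alpha * hi for pi, hi in zip(p, hvp)]
        acc = [ai + pi for ai, pi in zip(acc, p)]
    return acc  # list of tensors approximating H^{-1} v
```

In an IML loop, the returned vectors would be contracted with the validation gradient to form the hypergradient, which is where the approximation quality studied in the paper matters.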
Reviving and Improving Recurrent Back-Propagation
Liao, Renjie, Xiong, Yuwen, Fetaya, Ethan, Zhang, Lisa, Yoon, KiJung, Pitkow, Xaq, Urtasun, Raquel, Zemel, Richard
In this paper, we revisit the recurrent backpropagation (RBP) algorithm (Almeida, 1987; Pineda, 1987), discuss the conditions under which it applies as well as how to satisfy them in deep neural networks. We show that RBP can be unstable and propose two variants based on conjugate gradient on the normal equations (CG-RBP) and Neumann series (Neumann-RBP). We further investigate the relationship between Neumann-RBP and backpropagation through time (BPTT) and its truncated version (TBPTT). Our Neumann-RBP has the same time complexity as TBPTT but only requires constant memory, whereas TBPTT's memory cost scales linearly with the number of truncation steps. We examine all RBP variants along with BPTT and TBPTT in three different application domains: associative memory with continuous Hopfield networks, document classification in citation networks using graph neural networks, and hyperparameter optimization for fully connected networks. All experiments demonstrate that RBPs, especially the Neumann-RBP variant, are efficient and effective for optimizing convergent recurrent neural networks.
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > New York (0.04)
- North America > United States > District of Columbia > Washington (0.04)
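Since Neumann-RBP is the headline variant above, here is a condensed PyTorch sketch of the idea: backpropagate through a fixed point $h^* = f(h^*)$ by accumulating vector-Jacobian products, truncating the series $(I - J^\top)^{-1} g = \sum_k (J^\top)^k g$ where $J = \partial f / \partial h$ at $h^*$. The function name and step count are assumptions for illustration; this is not the authors' released code.

```python
import torch

def neumann_rbp(f, h_star, grad_h, steps=20):
    """Approximate (I - J^T)^{-1} grad_h at a fixed point h* = f(h*),
    where J = df/dh, by truncating sum_k (J^T)^k grad_h. Uses only
    vector-Jacobian products, so memory stays constant in `steps`."""
    h_star = h_star.detach().requires_grad_(True)
    out = f(h_star)
    v, acc = grad_h.clone(), grad_h.clone()
    for _ in range(steps):
        # v <- J^T v via one vector-Jacobian product.
        (v,) = torch.autograd.grad(out, h_star, grad_outputs=v,
                                   retain_graph=True)
        acc = acc + v
    return acc
```

The constant-memory property follows directly from the loop structure: only the current vector `v` and the running sum `acc` are stored, regardless of how many series terms are taken, which matches the abstract's comparison against TBPTT.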