general solution
- Europe > United Kingdom > England (0.28)
- Asia > China (0.14)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.87)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
ContinuAR: Continuous Autoregression For Infinite-Fidelity Fusion Wei W. Xing
Multi-fidelity fusion has become an important surrogate technique, which provides insights into expensive computer simulations and effectively improves decision-making, e.g., optimization, with less computational cost. Multi-fidelity fusion is much more computationally efficient compared to traditional single-fidelity surrogates. Despite the fast advancement of multi-fidelity fusion techniques, they lack a systematic framework to make use of the fidelity indicator, deal with high-dimensional and arbitrary data structure, and scale well to infinite-fidelity problems. In this work, we first generalize the popular autoregression (AR) to derive a novel linear fidelity differential equation (FiDE), paving the way to tractable infinite-fidelity fusion. We generalize FiDE to a high-dimensional system, which also provides a unifying framework to seemly bridge the gap between many multi-and single-fidelity GP-based models. We then propose ContinuAR, a rank-1 approximation solution to FiDEs, which is tractable to train, compatible with arbitrary multi-fidelity data structure, linearly scalable to the output dimension, and most importantly, delivers consistent SOT A performance with a significant margin over the baseline methods. Compared to the SOT A infinite-fidelity fusion, IFC, ContinuAR achieves up to 4x improvement in accuracy and 62,500x speedup in training time.
- Asia (0.46)
- Europe > United Kingdom > England (0.28)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.87)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
- North America > United States > Illinois (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > China > Guangdong Province > Guangzhou (0.04)
- Asia > China > Beijing > Beijing (0.04)
Discovering Interpretable Ordinary Differential Equations from Noisy Data
Golder, Rahul, Hasan, M. M. Faruque
The data-driven discovery of interpretable models approximating the underlying dynamics of a physical system has gained attraction in the past decade. Current approaches employ pre-specified functional forms or basis functions and often result in models that lack physical meaning and interpretability, let alone represent the true physics of the system. We propose an unsupervised parameter estimation methodology that first finds an approximate general solution, followed by a spline transformation to linearly estimate the coefficients of the governing ordinary differential equation (ODE). The approximate general solution is postulated using the same functional form as the analytical solution of a general homogeneous, linear, constant-coefficient ODE. An added advantage is its ability to produce a high-fidelity, smooth functional form even in the presence of noisy data. The spline approximation obtains gradient information from the functional form which are linearly independent and creates the basis of the gradient matrix. This gradient matrix is used in a linear system to find the coefficients of the ODEs. From the case studies, we observed that our modeling approach discovers ODEs with high accuracy and also promotes sparsity in the solution without using any regularization techniques. The methodology is also robust to noisy data and thus allows the integration of data-driven techniques into real experimental setting for data-driven learning of physical phenomena.
Unraveling the Black-box Magic: An Analysis of Neural Networks' Dynamic Local Extrema
We point out that neural networks are not black boxes, and their generalization stems from the ability to dynamically map a dataset to the local extrema of the model function. We further prove that the number of local extrema in a neural network is positively correlated with the number of its parameters, and on this basis, we give a new algorithm that is different from the back-propagation algorithm, which we call the extremum-increment algorithm. Some difficult situations, such as gradient vanishing and overfitting, can be reasonably explained and dealt with in this framework.
Relationship between H\"{o}lder Divergence and Functional Density Power Divergence: Intersection and Generalization
In this study, we discuss the relationship between two families of density-power-based divergences with functional degrees of freedom -- the H\"{o}lder divergence and the functional density power divergence (FDPD) -- based on their intersection and generalization. These divergence families include the density power divergence and the $\gamma$-divergence as special cases. First, we prove that the intersection of the H\"{o}lder divergence and the FDPD is limited to a general divergence family introduced by Jones et al. (Biometrika, 2001). Subsequently, motivated by the fact that H\"{o}lder's inequality is used in the proofs of nonnegativity for both the H\"{o}lder divergence and the FDPD, we define a generalized divergence family, referred to as the $\xi$-H\"{o}lder divergence. The nonnegativity of the $\xi$-H\"{o}lder divergence is established through a combination of the inequalities used to prove the nonnegativity of the H\"{o}lder divergence and the FDPD. Furthermore, we derive an inequality between the composite scoring rules corresponding to different FDPDs based on the $\xi$-H\"{o}lder divergence. Finally, we prove that imposing the mathematical structure of the H\"{o}lder score on a composite scoring rule results in the $\xi$-H\"{o}lder divergence.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Japan (0.04)
Is Bellman Equation Enough for Learning Control?
You, Haoxiang, Molu, Lekan, Abraham, Ian
The Bellman equation and its continuous-time counterpart, the Hamilton-Jacobi-Bellman (HJB) equation, serve as necessary conditions for optimality in reinforcement learning and optimal control. While the value function is known to be the unique solution to the Bellman equation in tabular settings, we demonstrate that this uniqueness fails to hold in continuous state spaces. Specifically, for linear dynamical systems, we prove the Bellman equation admits at least $\binom{2n}{n}$ solutions, where $n$ is the state dimension. Crucially, only one of these solutions yields both an optimal policy and a stable closed-loop system. We then demonstrate a common failure mode in value-based methods: convergence to unstable solutions due to the exponential imbalance between admissible and inadmissible solutions. Finally, we introduce a positive-definite neural architecture that guarantees convergence to the stable solution by construction to address this issue.
- North America > United States (0.28)
- Europe > France (0.14)
d6ef5f7fa914c19931a55bb262ec879c-Reviews.html
This paper gives minimax lower bound for the number of bits, B, needed to be communicated between m estimators, each of which has acessess to n iid samples, such that the estimation error is on par with the centralized case (with mn samples). This is a very interesting problem of theoretic and practical significance. In view of the fact that the minimax rate without communication constraints which does not admit a general solution, I do not expect a general solution for the minimal B. As the authors shows the conclusion seems problem-specific and can be quite pessimistic, meaning a large amount of communication overhead is necessary. While the proof of Thm 2 is quite trivial (a vanilla application of Fano's inequality), the proof of Thm 3 is quite interesting which relies on a type of strong data processing inequality for mutual information under some bounded density condition, which is reminiscient of differential privacy and Thm 1 in J. C. Duchi, M. I. Jordan and M. J. Wainwright (2013). Whether one need even bigger B in order to reach the rate at sample size mn - I wonder if there is any problem in multi-user information theory that is similar in spirit, where a joint decoder has access to rate-constrained coded messages and want to recover the original source with some fidelity.