 integral representation







Estimating properties of a homogeneous bounded soil using machine learning models

Kalimeris, Konstantinos, Mindrinos, Leonidas, Pallikarakis, Nikolaos

arXiv.org Artificial Intelligence

This work focuses on estimating soil properties from water moisture measurements. We consider simulated data generated by solving the initial-boundary value problem governing vertical infiltration in a homogeneous, bounded soil profile using the Fokas method. To address the parameter identification problem, which is formulated as a two-output regression task, we explore various machine learning models. The performance of each model is assessed under different data conditions: full, noisy, and limited. Overall, the prediction of diffusivity $D$ tends to be more accurate than that of hydraulic conductivity $K$. Among the models considered, Support Vector Machines (SVMs) and Neural Networks (NNs) demonstrate the highest robustness, achieving near-perfect accuracy and minimal errors.


Curse of Dimensionality in Neural Network Optimization

Na, Sanghoon, Yang, Haizhao

arXiv.org Machine Learning

The curse of dimensionality in neural network optimization under the mean-field regime is studied. It is demonstrated that when a shallow neural network with a Lipschitz continuous activation function is trained using either empirical or population risk to approximate a target function that is $r$ times continuously differentiable on $[0,1]^d$, the population risk may not decay at a rate faster than $t^{-\frac{4r}{d-2r}}$, where $t$ is an analog of the total number of optimization iterations. This result highlights the presence of the curse of dimensionality in the optimization computation required to achieve a desired accuracy. Instead of analyzing parameter evolution directly, the training dynamics are examined through the evolution of the parameter distribution under the 2-Wasserstein gradient flow. Furthermore, it is established that the curse of dimensionality persists when a locally Lipschitz continuous activation function is employed, where the Lipschitz constant in $[-x,x]$ is bounded by $O(x^\delta)$ for any $x \in \mathbb{R}$. In this scenario, the population risk is shown to decay at a rate no faster than $t^{-\frac{(4+2\delta)r}{d-2r}}$. To the best of our knowledge, this work is the first to analyze the impact of function smoothness on the curse of dimensionality in neural network optimization theory.
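As a rough reading of the first bound, suppose (as an assumption for illustration, with a hypothetical constant $C > 0$ and $d > 2r$) that the population risk $R(t)$ satisfies the lower bound below. Solving for $t$ shows how the number of iterations needed to reach a target risk $\varepsilon$ scales with the dimension:

$$
R(t) \ge C\, t^{-\frac{4r}{d-2r}}
\quad\Longrightarrow\quad
R(t) \le \varepsilon \ \text{ requires } \ t \ge \left(\frac{C}{\varepsilon}\right)^{\frac{d-2r}{4r}}.
$$

For example, with $r = 1$ the exponent $\frac{d-2r}{4r}$ equals $2$ when $d = 10$ and $25$ when $d = 102$, so under this assumption the required $t$ grows exponentially in $d$ for fixed smoothness $r$.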


Generalization Properties of Learning with Random Features

Alessandro Rudi, Lorenzo Rosasco

Neural Information Processing Systems

We study the generalization properties of ridge regression with random features in the statistical learning framework. We show for the first time that $O(1/\sqrt{n})$ learning bounds can be achieved with only $O(\sqrt{n} \log n)$ random features rather than $O(n)$ as suggested by previous results. Further, we prove faster learning rates and show that they might require more random features, unless they are sampled according to a possibly problem-dependent distribution. Our results shed light on the statistical computational trade-offs in large-scale kernelized learning, showing the potential effectiveness of random features in reducing the computational complexity while keeping optimal generalization properties.
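The setting in this abstract can be sketched in a few lines of NumPy. The block below is a minimal illustration, not the authors' code: it assumes a Gaussian kernel $\exp(-\gamma \|x - x'\|^2)$ approximated by random Fourier features, and the function names (`num_features_for`, `sample_rff`, `fit_rff_ridge`, `predict`), the constant 1 in the feature budget, and the values of `gamma` and `lam` are all hypothetical choices made for the example.

```python
import math
import numpy as np


def num_features_for(n):
    """Feature budget on the order of sqrt(n) * log(n).

    The leading constant (1 here) is an assumption for illustration.
    """
    return math.ceil(math.sqrt(n) * math.log(n))


def sample_rff(d, num_features, gamma, rng):
    """Sample random Fourier features for the kernel exp(-gamma * ||x - x'||^2).

    Frequencies are drawn from N(0, 2 * gamma * I) and phases uniformly
    from [0, 2 * pi).
    """
    W = rng.normal(scale=math.sqrt(2.0 * gamma), size=(d, num_features))
    b = rng.uniform(0.0, 2.0 * math.pi, size=num_features)
    return W, b


def featurize(X, W, b):
    """Map inputs of shape (n, d) to random features of shape (n, M)."""
    return math.sqrt(2.0 / W.shape[1]) * np.cos(X @ W + b)


def fit_rff_ridge(X, y, W, b, lam):
    """Ridge regression in feature space: solve (Z^T Z + n * lam * I) c = Z^T y."""
    Z = featurize(X, W, b)
    n, m = Z.shape
    A = Z.T @ Z + n * lam * np.eye(m)
    return np.linalg.solve(A, Z.T @ y)


def predict(X, W, b, coef):
    """Evaluate the fitted random-feature model on new inputs."""
    return featurize(X, W, b) @ coef
```

The linear system has size $M \times M$ rather than $n \times n$, which is where the computational saving comes from when $M$ is much smaller than $n$. For $n = 400$ samples, `num_features_for(400)` gives a budget of 120 features.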


Aspects of importance sampling in parameter selection for neural networks using ridgelet transform

Homma, Hikaru, Ohkubo, Jun

arXiv.org Artificial Intelligence

The choice of parameters in neural networks is crucial to their performance, and an oracle distribution derived from the ridgelet transform enables us to obtain suitable initial parameters. In other words, the distribution of parameters is connected to the integral representation of target functions. The oracle distribution allows us to avoid the conventional backpropagation learning process; a linear regression alone is enough to construct the neural network in simple cases. This study provides a new look at oracle distributions and ridgelet transforms, namely as an aspect of importance sampling. In addition, we propose extensions of the parameter sampling methods. We demonstrate the aspect of importance sampling and the proposed sampling algorithms via one-dimensional and high-dimensional examples; the results imply that the magnitude of weight parameters could be more crucial than the intercept parameters.
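The idea that sampled hidden parameters plus a linear regression can replace backpropagation can be sketched as follows. This is a simplified stand-in, not the method of the paper: the hidden parameters are drawn from a plain Gaussian (weights) and uniform (intercepts) distribution rather than from a ridgelet-derived oracle distribution, and the names `sample_hidden`, `hidden`, and `fit_output_layer`, as well as `weight_scale` and `bias_range`, are hypothetical.

```python
import numpy as np


def sample_hidden(num_units, d, weight_scale, bias_range, rng):
    """Sample hidden-layer parameters (a_j, b_j).

    Assumption: Gaussian weights and uniform intercepts stand in for the
    oracle distribution, which is not implemented here.
    """
    a = rng.normal(scale=weight_scale, size=(d, num_units))
    b = rng.uniform(-bias_range, bias_range, size=num_units)
    return a, b


def hidden(X, a, b):
    """Hidden activations tanh(a_j . x - b_j) for inputs of shape (n, d)."""
    return np.tanh(X @ a - b)


def fit_output_layer(X, y, a, b):
    """Fit only the output weights by least squares; hidden parameters stay fixed."""
    H = hidden(X, a, b)
    c, *_ = np.linalg.lstsq(H, y, rcond=None)
    return c


def predict(X, a, b, c):
    """Evaluate the shallow network sum_j c_j * tanh(a_j . x - b_j)."""
    return hidden(X, a, b) @ c
```

Varying `weight_scale` and `bias_range` separately in such a sketch is one simple way to probe how sensitive the fit is to the weight magnitudes versus the intercepts.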


Constructive Universal Approximation Theorems for Deep Joint-Equivariant Networks by Schur's Lemma

Sonoda, Sho, Hashimoto, Yuka, Ishikawa, Isao, Ikeda, Masahiro

arXiv.org Machine Learning

We present a unified constructive universal approximation theorem covering a wide range of learning machines, including both shallow and deep neural networks, based on group representation theory. Constructive here means that the distribution of parameters is given in a closed-form expression (called the ridgelet transform). Contrary to the case of shallow models, expressive power analysis of deep models has been conducted in a case-by-case manner. Recently, Sonoda et al. (2023a,b) developed a systematic method to show a constructive approximation theorem from scalar-valued joint-group-invariant feature maps, covering a formal deep network. However, each hidden layer was formalized as an abstract group action, so it was not possible to cover real deep networks defined by composites of nonlinear activation functions. In this study, we extend the method to vector-valued joint-group-equivariant feature maps, so as to cover such real networks.