
Broad stochastic configuration residual learning system for norm-convergent universal approximation

Su, Han, Li, Zhongyan, Liu, Wanquan

arXiv.org Artificial Intelligence

Universal approximation serves as the foundation of neural network learning algorithms. However, some networks establish their universal approximation property by showing that the iterative errors converge in probability measure rather than in the more rigorous sense of norm convergence, which makes the universal approximation property of randomized learning networks highly sensitive to random parameter selection. The broad residual learning system (BRLS), as a member of the family of randomized learning models, also encounters this issue. We theoretically demonstrate the limitation of its universal approximation property: the iterative errors do not satisfy norm convergence if the random parameters are selected inappropriately and the convergence rate meets certain conditions. To address this issue, we propose the broad stochastic configuration residual learning system (BSCRLS) algorithm, which features a novel supervisory mechanism that adaptively constrains the range settings of the random parameters on the basis of the BRLS framework. Furthermore, we prove the universal approximation theorem of BSCRLS under the more stringent norm convergence. Three versions of incremental BSCRLS algorithms are presented to satisfy the application requirements of various network updates. Solar panel dust detection experiments are performed on a publicly available dataset and compared with 13 deep and broad learning algorithms. Experimental results reveal the effectiveness and superiority of the BSCRLS algorithms.
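At the heart of the stochastic configuration approach is a supervisory inequality that a randomly drawn candidate node must satisfy before being added, with the sampling scope enlarged adaptively whenever no candidate qualifies. The numpy sketch below illustrates such a mechanism for single-output regression; the tolerance r, the scope schedule, and the tanh activation are illustrative assumptions, not the paper's exact BSCRLS rule.

import numpy as np

rng = np.random.default_rng(0)

def pick_admissible_node(X, e, r=0.99, scopes=(1.0, 5.0, 10.0, 50.0), tries=100):
    # X: (n, d) inputs; e: (n,) current residual of a single-output model.
    for lam in scopes:                         # adaptively enlarge the parameter scope
        for _ in range(tries):
            w = rng.uniform(-lam, lam, X.shape[1])
            b = rng.uniform(-lam, lam)
            h = np.tanh(X @ w + b)             # candidate hidden-node output
            # supervisory inequality: candidate must capture enough of the residual
            if (e @ h) ** 2 / (h @ h) >= (1.0 - r) * (e @ e):
                beta = (e @ h) / (h @ h)       # least-squares output weight for h
                return w, b, beta, e - beta * h
    return None                                # no admissible node: relax r or stop growing

Each accepted node provably shrinks the residual norm by a fixed factor, which is what underpins norm-convergent (rather than in-probability) universal approximation for stochastic configuration networks.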


Automatically Testing Functional Properties of Code Translation Models

Eniser, Hasan Ferit, Wüstholz, Valentin, Christakis, Maria

arXiv.org Artificial Intelligence

Large language models are becoming increasingly practical for translating code across programming languages, a process known as transpiling. Even though automated transpilation significantly boosts developer productivity, a key concern is whether the generated code is correct. Existing work initially used manually crafted test suites to test the translations of a small corpus of programs; these test suites were later automated. In contrast, we devise the first approach for automated, functional, property-based testing of code translation models. Our general, user-provided specifications about the transpiled code capture a range of properties, from purely syntactic to purely semantic ones. As shown by our experiments, this approach is very effective in detecting property violations in popular code translation models, and therefore, in evaluating model quality with respect to given properties. We also go a step further and explore the usage scenario where a user simply aims to obtain a correct translation of some code with respect to certain properties, without necessarily being concerned about the overall quality of the model. For this purpose, we develop the first property-guided search procedure for code translation models, where a model is repeatedly queried with slightly different parameters to produce alternative and potentially more correct translations. Our results show that this search procedure helps to obtain significantly better code translations.
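The simplest purely semantic instance of such a property is input/output equivalence: the translation must agree with the source program on randomly generated inputs. The sketch below illustrates that check with two Python callables standing in for the source and a (deliberately buggy) hypothetical translation; in the paper the two sides are programs in different languages and the property language is richer.

import random

def check_equivalence(f_src, f_dst, gen_input, trials=200, seed=42):
    """Property: the translation agrees with the source on random inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        args = gen_input(rng)
        expected, actual = f_src(*args), f_dst(*args)
        if expected != actual:
            return False, (args, expected, actual)   # counterexample found
    return True, None

# Hypothetical source function and a subtly wrong "translation" of it.
def src(xs): return len([x for x in xs if x > 0])    # count positive entries
def dst(xs): return len([x for x in xs if x >= 0])   # off-by-one guard also counts zeros

gen = lambda r: ([r.randint(-5, 5) for _ in range(r.randint(0, 8))],)
print(check_equivalence(src, dst, gen))              # reports a violating input

A property-guided search in the paper's sense would re-query the model with perturbed decoding parameters (for example, the sampling temperature) until some candidate translation passes checks like this one.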


Subject-specific Deep Neural Networks for Count Data with High-cardinality Categorical Features

Lee, Hangbin, Ha, Il Do, Hwang, Changha, Lee, Youngjo

arXiv.org Machine Learning

Deep neural networks (DNNs), which have been proposed to capture the nonlinear relationship between input and output variables (LeCun et al., 2015; Goodfellow et al., 2016), provide outstanding marginal predictions for independent outputs. However, in practical applications, it is common to encounter correlated data with high-cardinality categorical features, which can pose challenges for DNNs. While the traditional DNN framework overlooks such correlation, random effect models have emerged in statistics to make subject-specific predictions for correlated data. Lee and Nelder (1996) proposed hierarchical generalized linear models (HGLMs), which allow the incorporation of random effects from an arbitrary conjugate distribution of the generalized linear model (GLM) family. Both DNNs and random effect models have been successful in improving the prediction accuracy of linear models, but in different ways. Recently, there has been a rising interest in combining these two extensions. Simchoni and Rosset (2021, 2023) proposed the linear mixed model neural network for continuous (Gaussian) outputs with Gaussian random effects, which allows explicit expressions for the likelihood. Lee and Lee (2023) introduced the hierarchical likelihood (h-likelihood) approach, an extension of classical likelihood for Gaussian outputs, which provides an efficient likelihood-based procedure. For non-Gaussian (discrete) outputs, Tran et al. (2020) proposed a Bayesian approach for DNNs with normal random effects using the variational approximation method (Bishop and Nasrabadi, 2006; Blei et al., 2017).


Policy Gradient Methods for Discrete Time Linear Quadratic Regulator With Random Parameters

Li, Deyue

arXiv.org Artificial Intelligence

The Linear Quadratic (LQ) control problem for discrete time with random parameters, whose study goes back to Kalman [12], finds applications in a wide range of practical problems, such as random sampling of a diffusion process in digital control [17], sampling of a system with noise [6], and economic systems [1]. Consequently, extensive research has been carried out in this area [6, 2, 4, 15, 3]. However, the literature cited above assumes a priori knowledge of the model parameters, which is unrealistic in many practical scenarios. Therefore, solving such problems without statistical information about the model parameters is of great importance from both theoretical and practical perspectives. Recent years have witnessed huge growth in learning approaches, among which the reinforcement learning (RL) method has garnered a great deal of attention from researchers [8, 18, 9, 10, 7, 14]. There are two categories of RL methods: model-based RL and model-free RL. The model-based RL approach estimates the transition dynamics by observing or conducting experiments, and then designs the control policy using the estimated parameters [16, 5].
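A model-free policy gradient method for this setting can be sketched with a two-point zeroth-order estimate of the average LQ cost of a linear policy u = -Kx under randomly redrawn system matrices. The dynamics, noise scales, horizon, and step sizes below are illustrative assumptions rather than the paper's setup.

import numpy as np

rng = np.random.default_rng(1)

def lq_cost(K, T=30, rollouts=32):
    """Average finite-horizon LQ cost of u = -K x when (A_t, B_t) are random."""
    total = 0.0
    for _ in range(rollouts):
        x = rng.normal(size=2)
        for _ in range(T):
            A = np.array([[1.0, 0.1], [0.0, 1.0]]) + 0.05 * rng.normal(size=(2, 2))
            B = np.array([[0.0], [1.0]]) + 0.05 * rng.normal(size=(2, 1))
            u = -K @ x
            total += x @ x + u @ u               # state and control costs, Q = R = I
            x = A @ x + B @ u
    return total / rollouts

K = np.zeros((1, 2))                             # feedback gain to be learned
lr, sigma = 1e-3, 0.1
for _ in range(200):                             # two-point zeroth-order policy gradient
    U = rng.normal(size=K.shape)
    g = (lq_cost(K + sigma * U) - lq_cost(K - sigma * U)) / (2.0 * sigma) * U
    K -= lr * g
print("learned gain:", K, " cost:", lq_cost(K))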


Is the Number of Trainable Parameters All That Actually Matters?

Chatelain, Amélie, Djeghri, Amine, Hesslow, Daniel, Launay, Julien, Poli, Iacopo

arXiv.org Machine Learning

Recent work has identified simple empirical scaling laws for language models, linking compute budget, dataset size, model size, and autoregressive modeling loss. The validity of these simple power laws across orders of magnitude in model scale provides compelling evidence that larger models are also more capable models. However, scaling up models under the constraints of hardware and infrastructure is no easy feat, and rapidly becomes a hard and expensive engineering problem. We investigate ways to tentatively cheat scaling laws, and train larger models for cheaper. We emulate an increase in effective parameters, using efficient approximations: either by doping the models with frozen random parameters, or by using fast structured transforms in place of dense linear layers. We find that the scaling relationship between test loss and compute depends only on the actual number of trainable parameters; scaling laws cannot be deceived by spurious parameters.
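One way to "dope" a network with frozen random parameters is to fix a random subset of a layer's weights at their initial values: they still contribute to the forward pass, so the effective width grows, but only the remaining entries receive gradients. The PyTorch sketch below is a hypothetical illustration of this idea, not the authors' exact scheme.

import torch
import torch.nn as nn

class DopedLinear(nn.Module):
    """Linear layer in which a random fraction of the weights is frozen at
    initialization and never updated; the rest train normally."""
    def __init__(self, d_in, d_out, frozen_frac=0.75):
        super().__init__()
        init = torch.randn(d_out, d_in) / d_in ** 0.5
        self.weight = nn.Parameter(init.clone())
        # mask == 1 where the weight stays trainable, 0 where it is frozen
        self.register_buffer("mask", (torch.rand(d_out, d_in) >= frozen_frac).float())
        self.register_buffer("frozen", init)      # frozen copy used where mask == 0

    def forward(self, x):
        w = self.mask * self.weight + (1.0 - self.mask) * self.frozen
        return x @ w.t()                          # gradients reach only masked entries

layer = DopedLinear(512, 512)
print("effective:", layer.weight.numel(), "trainable:", int(layer.mask.sum()))

Under the paper's finding, increasing frozen_frac should leave the loss-versus-compute curve governed by the trainable count alone.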


RODE-Net: Learning Ordinary Differential Equations with Randomness from Data

Liu, Junyu, Long, Zichao, Wang, Ranran, Sun, Jie, Dong, Bin

arXiv.org Machine Learning

Random ordinary differential equations (RODEs), i.e., ODEs with random parameters, are often used to model complex dynamics. Most existing methods for identifying unknown governing RODEs from observed data rely on strong prior knowledge. Extracting the governing equations from data with less prior knowledge remains a great challenge. In this paper, we propose a deep neural network, called RODE-Net, to tackle this challenge by fitting a symbolic expression of the differential equation and the distribution of parameters simultaneously. To train the RODE-Net, we first estimate the parameters of the unknown RODE using symbolic networks (Long et al., 2019) by solving a set of deterministic inverse problems based on the measured data, and use a generative adversarial network (GAN) to estimate the true distribution of the RODE's parameters. Then, we use the trained GAN as a regularization to further improve the estimation of the ODE's parameters. The two steps are performed alternately. Numerical results show that the proposed RODE-Net can accurately estimate the distribution of model parameters using simulated data and can make reliable predictions. It is worth noting that the GAN serves as a data-driven regularization in RODE-Net and is more effective than the $\ell_1$-based regularization often used in system identification.
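The two-step structure can be illustrated end to end on a toy RODE. The sketch below simulates dx/dt = -a*x with Gaussian a, solves the per-trajectory deterministic inverse problems by least squares (standing in for the symbolic-network fit), and then fits the parameter distribution; a Gaussian fit replaces the GAN purely to keep the example self-contained.

import numpy as np

rng = np.random.default_rng(0)

# Simulate trajectories of the toy RODE dx/dt = -a * x with a ~ N(1.5, 0.3^2).
dt, T, n_traj = 0.01, 200, 100
a_true = rng.normal(1.5, 0.3, n_traj)
trajs = np.empty((n_traj, T))
trajs[:, 0] = 1.0
for t in range(T - 1):
    trajs[:, t + 1] = trajs[:, t] + dt * (-a_true * trajs[:, t])   # forward Euler

# Step 1: per-trajectory deterministic inverse problem via least squares on
# finite differences (the role played by the symbolic networks in RODE-Net).
dx = np.diff(trajs, axis=1) / dt
x = trajs[:, :-1]
a_hat = -(dx * x).sum(axis=1) / (x * x).sum(axis=1)

# Step 2: estimate the parameter distribution from the fitted values
# (RODE-Net trains a GAN here and feeds it back as a regularizer).
print("estimated a ~ N(%.3f, %.3f^2)" % (a_hat.mean(), a_hat.std()))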


Are Direct Links Necessary in RVFL NNs for Regression?

Dudek, Grzegorz

arXiv.org Machine Learning

A random vector functional link network (RVFL) is widely used as a universal approximator for classification and regression problems. The big advantage of RVFL is fast training without backpropagation, because the weights and biases of the hidden nodes are selected randomly and stay untrained. Recently, alternative architectures with randomized learning have been developed which differ from RVFL in that they have neither direct links nor a bias term in the output layer. In this study, we investigate the effect of direct links and output node bias on the regression performance of RVFL. For generating the random parameters of the hidden nodes we use the classical method and two new methods recently proposed in the literature. We test RVFL performance on several function approximation problems with target functions of different nature: nonlinear, nonlinear with strong fluctuations, nonlinear with a linear component, and linear. Surprisingly, we found that the direct links and output node bias do not play an important role in improving RVFL accuracy for typical nonlinear regression problems.
Keywords: Random vector functional link network · Neural networks with random hidden nodes · Randomized learning algorithms.
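For reference, an RVFL for regression reduces to a random feature expansion followed by a linear least-squares fit, with direct links implemented by concatenating the raw inputs to the hidden features. The numpy sketch below uses the classical fixed-interval parameter scheme; the toy target, interval bounds, and node count are illustrative.

import numpy as np

rng = np.random.default_rng(0)

def rvfl_fit_predict(X_tr, y_tr, X_te, n_hidden=100, direct_links=True):
    """Train an RVFL by least squares: random untrained hidden nodes, plus
    optional direct input-to-output links."""
    W = rng.uniform(-1, 1, (X_tr.shape[1], n_hidden))    # random hidden weights
    b = rng.uniform(-1, 1, n_hidden)                     # random hidden biases
    def features(X):
        H = np.tanh(X @ W + b)
        return np.hstack([H, X]) if direct_links else H  # direct links append raw inputs
    beta, *_ = np.linalg.lstsq(features(X_tr), y_tr, rcond=None)
    return features(X_te) @ beta

# Toy nonlinear regression mirroring the kind of comparison in the study.
X = rng.uniform(-1, 1, (300, 1))
y = np.sin(4 * X[:, 0]) + 0.05 * rng.normal(size=300)
X_tr, y_tr, X_te, y_te = X[:200], y[:200], X[200:], y[200:]
for dl in (True, False):
    mse = np.mean((rvfl_fit_predict(X_tr, y_tr, X_te, direct_links=dl) - y_te) ** 2)
    print("direct links:", dl, "test MSE: %.4f" % mse)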


A Constructive Approach for Data-Driven Randomized Learning of Feedforward Neural Networks

Dudek, Grzegorz

arXiv.org Machine Learning

Feedforward neural networks with random hidden nodes suffer from a problem with the generation of random weights and biases, as these are difficult to set optimally to obtain a good projection space. Typically, random parameters are drawn from an interval which is fixed beforehand or adapted during the learning process. Due to the different functions of the weights and biases, selecting them both from the same interval is not a good idea. Recently, more sophisticated methods of random parameter generation have been developed, such as the data-driven method proposed in [Anon19], where the sigmoids are placed in randomly selected regions of the input space and then their slopes are adjusted to the local fluctuations of the target function. In this work, we propose an extended version of this method, which constructs the network architecture iteratively. The method successively generates new hidden nodes and accepts them only if the training error decreases significantly. The threshold of acceptance is adapted to the current training stage: at the beginning of the training process, only those nodes which lead to the largest error reduction are accepted; then the threshold is reduced by half to accept those nodes which model the target function details more accurately. This leads to faster convergence and a more compact network architecture, as it includes only "significant" neurons. Several application examples are given which confirm this thesis.
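The acceptance-and-threshold-halving loop can be sketched as follows. The candidate parameters here are drawn from a plain fixed interval for brevity, whereas the paper pairs the acceptance rule with its data-driven sigmoid placement; the initial threshold and the repeated halving schedule are assumed values.

import numpy as np

rng = np.random.default_rng(0)

def grow_network(X, y, max_nodes=100, tries=50, min_threshold=1e-6):
    """Accept a candidate node only if it cuts the training MSE by more than
    the current threshold; halve the threshold when candidates stop passing."""
    nodes, e = [], y.astype(float).copy()
    threshold = 0.5 * np.mean(e ** 2)                    # initial acceptance bar (assumed)
    while len(nodes) < max_nodes and threshold > min_threshold:
        accepted = False
        for _ in range(tries):
            w = rng.uniform(-5, 5, X.shape[1]); b = rng.uniform(-5, 5)
            h = 1.0 / (1.0 + np.exp(-(X @ w + b)))       # candidate sigmoid node
            beta = (e @ h) / (h @ h)                     # optimal output weight
            if np.mean((e - beta * h) ** 2) < np.mean(e ** 2) - threshold:
                nodes.append((w, b, beta)); e -= beta * h; accepted = True
        if not accepted:
            threshold /= 2.0                             # admit finer detail later on
    return nodes, e

X = rng.uniform(0, 1, (200, 1))
y = np.sin(10 * X[:, 0])
nodes, resid = grow_network(X, y)
print(len(nodes), "nodes, final MSE: %.5f" % np.mean(resid ** 2))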


Improving Randomized Learning of Feedforward Neural Networks by Appropriate Generation of Random Parameters

Dudek, Grzegorz

arXiv.org Machine Learning

In this work, a method of random parameter generation for randomized learning of a single-hidden-layer feedforward neural network is proposed. The method first randomly selects the slope angles of the hidden neurons' activation functions from an interval adjusted to the target function, then randomly rotates the activation functions, and finally distributes them across the input space. For complex target functions the proposed method gives better results than the approach commonly used in practice, where the random parameters are selected from a fixed interval. This is because it introduces the steepest fragments of the activation functions into the input hypercube, avoiding their saturation fragments.
Keywords: Function approximation · Feedforward neural networks · Neural networks with random hidden nodes · Randomized learning algorithms.
1 Introduction
Feedforward neural networks (FNNs) learn from data by iteratively tuning their parameters, weights and biases, using some form of gradient descent method.
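A one-hidden-layer version of the construction reads: draw a slope angle, set the weight magnitude so that the sigmoid's steepest fragment has that slope, rotate it with a random unit direction, and anchor the inflection point at a randomly chosen training point. In the sketch below, the 4·tan(α) scaling follows from the logistic sigmoid's maximum slope of |w|/4, while the angle interval is an illustrative assumption.

import numpy as np

rng = np.random.default_rng(0)

def random_sigmoid_params(X, n_hidden, alpha_lo=np.pi / 6, alpha_hi=np.pi / 2.05):
    """Slope-angle-driven generation of hidden-node weights and biases."""
    n, d = X.shape
    W = np.empty((d, n_hidden)); b = np.empty(n_hidden)
    for j in range(n_hidden):
        alpha = rng.uniform(alpha_lo, alpha_hi)          # random slope angle
        u = rng.normal(size=d); u /= np.linalg.norm(u)   # random rotation (unit direction)
        W[:, j] = 4.0 * np.tan(alpha) * u                # logistic max slope is |w| / 4
        x0 = X[rng.integers(n)]                          # distribute across the input space
        b[j] = -W[:, j] @ x0                             # steepest fragment lands on x0
    return W, b

X = rng.uniform(0, 1, (500, 2))
W, b = random_sigmoid_params(X, n_hidden=50)
H = 1.0 / (1.0 + np.exp(-(X @ W + b)))                   # hidden-layer features
print(H.shape)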


Deep Stacked Stochastic Configuration Networks for Non-Stationary Data Streams

Pratama, Mahardhika, Wang, Dianhui

arXiv.org Machine Learning

The concept of stochastic configuration networks (SCNs) offers a solid framework for fast implementation of feedforward neural networks through randomized learning. Unlike conventional randomized approaches, SCNs provide an avenue to select an appropriate scope of random parameters to ensure the universal approximation property. In this paper, a deep version of stochastic configuration networks, namely the deep stacked stochastic configuration network (DSSCN), is proposed for modeling non-stationary data streams. As an extension of evolving stochastic configuration networks (eSCNs), this work contributes a way to grow and shrink the structure of deep stochastic configuration networks autonomously from data streams. The performance of DSSCN is evaluated on six benchmark datasets. Simulation results, compared with prominent data stream algorithms, show that the proposed method is capable of achieving comparable accuracy while evolving a compact and parsimonious deep stacked network architecture.
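A stream-driven grow-and-shrink rule of the kind the abstract describes can be sketched as follows; the moving-average drift test and the pruning threshold are illustrative assumptions, and a full DSSCN would configure each new layer's nodes with an SCN-style supervisory check like the one sketched above for BSCRLS.

import numpy as np

def update_stack(layers, recent_errors, grow_ratio=1.3, prune_norm=1e-2):
    """One structural update on a data stream: add a stacked layer when the
    prequential error drifts upward, drop layers whose output weights have
    decayed to irrelevance. Thresholds here are illustrative, not the paper's."""
    half = len(recent_errors) // 2
    drift = np.mean(recent_errors[half:]) > grow_ratio * np.mean(recent_errors[:half])
    if drift:
        layers.append({"out_weights": np.zeros(0)})   # fresh layer, to be configured by SCN
    kept = [L for L in layers if L["out_weights"].size == 0
            or np.linalg.norm(L["out_weights"]) > prune_norm]
    return kept if kept else layers[:1]

# Hypothetical usage: the error rose on the second half of the window, so we grow.
stack = [{"out_weights": np.array([0.8, -0.3])}]
errs = np.concatenate([0.1 * np.ones(20), 0.2 * np.ones(20)])
print(len(update_stack(stack, errs)))                 # -> 2 layers after growth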