b-spline basis function
Functional Bayesian Additive Regression Trees with Shape Constraints
Cao, Jiahao, He, Shiyuan, Zhang, Bohai
Motivated by the great success of Bayesian additive regression trees (BART) on regression, we propose a nonparametric Bayesian approach for the function-on-scalar regression problem, termed as Functional BART (FBART). Utilizing spline-based function representation and tree-based domain partition model, FBART offers great flexibility in characterizing the complex and heterogeneous relationship between the response curve and scalar covariates. We devise a tailored Bayesian backfitting algorithm for estimating the parameters in the FBART model. Furthermore, we introduce an FBART model with shape constraints on the response curve, enhancing estimation and prediction performance when prior shape information of response curves is available. By incorporating a shape-constrained prior, we ensure that the posterior samples of the response curve satisfy the required shape constraints (e.g., monotonicity and/or convexity). Our proposed FBART model and its shape-constrained version are the new advances of BART models for functional data. Under certain regularity conditions, we derive the posterior convergence results for both FBART and its shape-constrained version. Finally, the superiority of the proposed methods over other competitive counterparts is validated through simulation experiments under various settings and analyses of two real datasets.
Local Control Networks (LCNs): Optimizing Flexibility in Neural Network Data Pattern Capture
Nguyen, Hy, Pham, Duy Khoa, Thudumu, Srikanth, Du, Hung, Vasa, Rajesh, Mouzakis, Kon
The widespread use of Multi-layer perceptrons (MLPs) often relies on a fixed activation function (e.g., ReLU, Sigmoid, Tanh) for all nodes within the hidden layers. While effective in many scenarios, this uniformity may limit the networks ability to capture complex data patterns. We argue that employing the same activation function at every node is suboptimal and propose leveraging different activation functions at each node to increase flexibility and adaptability. To achieve this, we introduce Local Control Networks (LCNs), which leverage B-spline functions to enable distinct activation curves at each node. Our mathematical analysis demonstrates the properties and benefits of LCNs over conventional MLPs. In addition, we demonstrate that more complex architectures, such as Kolmogorov-Arnold Networks (KANs), are unnecessary in certain scenarios, and LCNs can be a more efficient alternative. Empirical experiments on various benchmarks and datasets validate our theoretical findings. In computer vision tasks, LCNs achieve marginal improvements over MLPs and outperform KANs by approximately 5\%, while also being more computationally efficient than KANs. In basic machine learning tasks, LCNs show a 1\% improvement over MLPs and a 0.6\% improvement over KANs. For symbolic formula representation tasks, LCNs perform on par with KANs, with both architectures outperforming MLPs. Our findings suggest that diverse activations at the node level can lead to improved performance and efficiency.
Enriched Functional Tree-Based Classifiers: A Novel Approach Leveraging Derivatives and Geometric Features
Maturo, Fabrizio, Porreca, Annamaria
The positioning of this research falls within the scalar-on-function classification literature, a field of significant interest across various domains, particularly in statistics, mathematics, and computer science. This study introduces an advanced methodology for supervised classification by integrating Functional Data Analysis (FDA) with tree-based ensemble techniques for classifying high-dimensional time series. The proposed framework, Enriched Functional Tree-Based Classifiers (EFTCs), leverages derivative and geometric features, benefiting from the diversity inherent in ensemble methods to further enhance predictive performance and reduce variance. While our approach has been tested on the enrichment of Functional Classification Trees (FCTs), Functional K-NN (FKNN), Functional Random Forest (FRF), Functional XGBoost (FXGB), and Functional LightGBM (FLGBM), it could be extended to other tree-based and non-tree-based classifiers, with appropriate considerations emerging from this investigation. Through extensive experimental evaluations on seven real-world datasets and six simulated scenarios, this proposal demonstrates fascinating improvements over traditional approaches, providing new insights into the application of FDA in complex, high-dimensional learning problems.
Mapping-to-Parameter Nonlinear Functional Regression with Novel B-spline Free Knot Placement Algorithm
Shi, Chengdong, Tseng, Ching-Hsun, Zhao, Wei, Zeng, Xiao-Jun
We propose a novel approach to nonlinear functional regression, called the Mapping-to-Parameter function model, which addresses complex and nonlinear functional regression problems in parameter space by employing any supervised learning technique. Central to this model is the mapping of function data from an infinite-dimensional function space to a finite-dimensional parameter space. This is accomplished by concurrently approximating multiple functions with a common set of B-spline basis functions by any chosen order, with their knot distribution determined by the Iterative Local Placement Algorithm, a newly proposed free knot placement algorithm. In contrast to the conventional equidistant knot placement strategy that uniformly distributes knot locations based on a predefined number of knots, our proposed algorithms determine knot location according to the local complexity of the input or output functions. The performance of our knot placement algorithms is shown to be robust in both single-function approximation and multiple-function approximation contexts. Furthermore, the effectiveness and advantage of the proposed prediction model in handling both function-on-scalar regression and function-on-function regression problems are demonstrated through several real data applications, in comparison with four groups of state-of-the-art methods.
Nonparametric Classification on Low Dimensional Manifolds using Overparameterized Convolutional Residual Networks
Zhang, Kaiqi, Zhang, Zixuan, Chen, Minshuo, Wang, Mengdi, Zhao, Tuo, Wang, Yu-Xiang
Convolutional residual neural networks (ConvResNets), though overparameterized, can achieve remarkable prediction performance in practice, which cannot be well explained by conventional wisdom. To bridge this gap, we study the performance of ConvResNeXts, which cover ConvResNets as a special case, trained with weight decay from the perspective of nonparametric classification. Our analysis allows for infinitely many building blocks in ConvResNeXts, and shows that weight decay implicitly enforces sparsity on these blocks. Specifically, we consider a smooth target function supported on a low-dimensional manifold, then prove that ConvResNeXts can adapt to the function smoothness and low-dimensional structures and efficiently learn the function without suffering from the curse of dimensionality. Our findings partially justify the advantage of overparameterized ConvResNeXts over conventional machine learning models.
Actually Sparse Variational Gaussian Processes
Cunningham, Harry Jake, de Souza, Daniel Augusto, Takao, So, van der Wilk, Mark, Deisenroth, Marc Peter
Gaussian processes (GPs) are typically criticised for their unfavourable scaling in both computational and memory requirements. For large datasets, sparse GPs reduce these demands by conditioning on a small set of inducing variables designed to summarise the data. In practice however, for large datasets requiring many inducing variables, such as low-lengthscale spatial data, even sparse GPs can become computationally expensive, limited by the number of inducing variables one can use. In this work, we propose a new class of inter-domain variational GP, constructed by projecting a GP onto a set of compactly supported B-spline basis functions. The key benefit of our approach is that the compact support of the B-spline basis functions admits the use of sparse linear algebra to significantly speed up matrix operations and drastically reduce the memory footprint. This allows us to very efficiently model fast-varying spatial phenomena with tens of thousands of inducing variables, where previous approaches failed.