Goto

Collaborating Authors

 basis coefficient


Reviews: Convergence rates of a partition based Bayesian multivariate density estimation method

Neural Information Processing Systems

Note: Below, I use [#M] for references in the main paper and [#S] for references in the supplement, since these are indexed differently. Summary: This paper proposes and analyzes a Bayesian approach to nonparametric density estimation. The proposed method is based on approximation by piecewise-constant functions over a binary partitioning of the unit cube, using a prior that decays with the size of the partition. The posterior distribution of the density is shown to concentrate around the true density f_0, at a rate depending on the smoothness r of f_0, a measure in terms of how well f_0 can be approximated by piecewise-constant functions over binary partitionings. Interestingly, the method automatically adapts to unknown r, and r can be related to more standard measures of smoothness, such as Holder continuity, bounded variation, and decay rate of Haar basis coefficients.


DAM: Towards A Foundation Model for Time Series Forecasting

arXiv.org Artificial Intelligence

It is challenging to scale time series forecasting models such that they forecast accurately for multiple distinct domains and datasets, all with potentially different underlying collection procedures (e.g., sample resolution), patterns (e.g., periodicity), and prediction requirements (e.g., reconstruction vs. forecasting). We call this general task universal forecasting. Existing methods usually assume that input data is regularly sampled, and they forecast to pre-determined horizons, resulting in failure to generalise outside of the scope of their training. We propose the DAM - a neural model that takes randomly sampled histories and outputs an adjustable basis composition as a continuous function of time for forecasting to non-fixed horizons. It involves three key components: (1) a flexible approach for using randomly sampled histories from a long-tail distribution, that enables an efficient global perspective of the underlying temporal dynamics while retaining focus on the recent history; (2) a transformer backbone that is trained on these actively sampled histories to produce, as representational output, (3) the basis coefficients of a continuous function of time. We show that a single univariate DAM, trained on 25 time series datasets, either outperformed or closely matched existing SoTA models at multivariate long-term forecasting across 18 datasets, including 8 held-out for zero-shot transfer, even though these models were trained to specialise for each dataset-horizon combination. This single DAM excels at zero-shot transfer and very-long-term forecasting, performs well at imputation, is interpretable via basis function composition and attention, can be tuned for different inference-cost requirements, is robust to missing and irregularly sampled data {by design}.


Neural Networks for Scalar Input and Functional Output

arXiv.org Artificial Intelligence

The regression of a functional response on a set of scalar predictors can be a challenging task, especially if there is a large number of predictors, or the relationship between those predictors and the response is nonlinear. In this work, we propose a solution to this problem: a feed-forward neural network (NN) designed to predict a functional response using scalar inputs. First, we transform the functional response to a finite-dimensional representation and construct an NN that outputs this representation. Then, we propose to modify the output of an NN via the objective function and introduce different objective functions for network training. The proposed models are suited for both regularly and irregularly spaced data, and a roughness penalty can be further applied to control the smoothness of the predicted curve. The difficulty in implementing both those features lies in the definition of objective functions that can be back-propagated. In our experiments, we demonstrate that our model outperforms the conventional function-on-scalar regression model in multiple scenarios while computationally scaling better with the dimension of the predictors.


Behind the Scenes of Gradient Descent: A Trajectory Analysis via Basis Function Decomposition

arXiv.org Artificial Intelligence

We show that, although solution trajectories of gradient-based algorithms may vary depending on the learning task, they behave almost monotonically when projected onto an appropriate orthonormal function basis. Such projection gives rise to a basis function decomposition of the solution trajectory. Theoretically, we use our proposed basis function decomposition to establish the convergence of gradient descent (GD) on several representative learning tasks. In particular, we improve the convergence of GD on symmetric matrix factorization and provide a completely new convergence result for the orthogonal symmetric tensor decomposition. Empirically, we illustrate the promise of our proposed framework on realistic deep neural networks (DNNs) across different architectures, gradient-based solvers, and datasets. Our key finding is that gradient-based algorithms monotonically learn the coefficients of a particular orthonormal function basis of DNNs defined as the eigenvectors of the conjugate kernel after training.


Compressive-Sensing Data Reconstruction for Structural Health Monitoring: A Machine-Learning Approach

arXiv.org Machine Learning

Compressive sensing (CS) has been studied and applied in structural health monitoring for wireless data acquisition and transmission, structural modal identification, and spare damage identification. The key issue in CS is finding the optimal solution for sparse optimization. In the past years, many algorithms have been proposed in the field of applied mathematics. In this paper, we propose a machine-learning-based approach to solve the CS data-reconstruction problem. By treating a computation process as a data flow, the process of CS-based data reconstruction is formalized into a standard supervised-learning task. The prior knowledge, i.e., the basis matrix and the CS-sampled signals, are used as the input and the target of the network; the basis coefficient matrix is embedded as the parameters of a certain layer; the objective function of conventional compressive sensing is set as the loss function of the network. Regularized by l1-norm, these basis coefficients are optimized to reduce the error between the original CS-sampled signals and the masked reconstructed signals with a common optimization algorithm. Also, the proposed network can handle complex bases, such as a Fourier basis. Benefiting from the nature of a multi-neuron layer, multiple signal channels can be reconstructed simultaneously. Meanwhile, the disassembled use of a large-scale basis makes the method memory-efficient. A numerical example of multiple sinusoidal waves and an example of field-test wireless data from a suspension bridge are carried out to illustrate the data-reconstruction ability of the proposed approach. The results show that high reconstruction accuracy can be obtained by the machine learning-based approach. Also, the parameters of the network have clear meanings; the inference of the mapping between input and output is fully transparent, making the CS data reconstruction neural network interpretable.


Belief Propagation for Continuous State Spaces: Stochastic Message-Passing with Quantitative Guarantees

arXiv.org Machine Learning

The sum-product or belief propagation (BP) algorithm is a widely used message-passing technique for computing approximate marginals in graphical models. We introduce a new technique, called stochastic orthogonal series message-passing (SOSMP), for computing the BP fixed point in models with continuous random variables. It is based on a deterministic approximation of the messages via orthogonal series expansion, and a stochastic approximation via Monte Carlo estimates of the integral updates of the basis coefficients. We prove that the SOSMP iterates converge to a \delta-neighborhood of the unique BP fixed point for any tree-structured graph, and for any graphs with cycles in which the BP updates satisfy a contractivity condition. In addition, we demonstrate how to choose the number of basis coefficients as a function of the desired approximation accuracy \delta and smoothness of the compatibility functions. We illustrate our theory with both simulated examples and in application to optical flow estimation.


Learning Nonlinear Overcomplete Representations for Efficient Coding

Neural Information Processing Systems

We derive a learning algorithm for inferring an overcomplete basis by viewing it as probabilistic model of the observed data. Overcomplete bases allow for better approximation of the underlying statistical density. Using a Laplacian prior on the basis coefficients removes redundancy and leads to representations that are sparse and are a nonlinear function of the data. This can be viewed as a generalization of the technique of independent component analysis and provides a method for blind source separation of fewer mixtures than sources. We demonstrate the utility of overcomplete representations on natural speech and show that compared to the traditional Fourier basis the inferred representations potentially have much greater coding efficiency.


Learning Nonlinear Overcomplete Representations for Efficient Coding

Neural Information Processing Systems

We derive a learning algorithm for inferring an overcomplete basis by viewing it as probabilistic model of the observed data. Overcomplete bases allow for better approximation of the underlying statistical density. Using a Laplacian prior on the basis coefficients removes redundancy and leads to representations that are sparse and are a nonlinear function of the data. This can be viewed as a generalization of the technique of independent component analysis and provides a method for blind source separation of fewer mixtures than sources. We demonstrate the utility of overcomplete representations on natural speech and show that compared to the traditional Fourier basis the inferred representations potentially have much greater coding efficiency.