We appreciate the valuable comments and positive feedback from the reviewers. We will revise the paper accordingly to incorporate the comments. Reviewer #1: (Stepsize and preset T.) Following the current analysis, for a general stepsize η, no convergence rate is available without averaging the iterates. In this paper we consider neural networks with one hidden layer. In particular, Proposition 4.7 shows that neural TD attains the global minimum of the MSBE. We will revise the "without loss of generality" claim and clarify this notation in the revision.
Inverse M-Kernels for Linear Universal Approximators of Non-Negative Functions
Kernel methods are widely used in machine learning to learn, from training data, a latent function in a reproducing kernel Hilbert space. The approximator thus obtained usually admits a linear representation, which brings various computational benefits while maintaining great representation power (i.e., universal approximation). However, when non-negativity constraints are imposed on the function's outputs, existing kernel-based approximators either keep the linear representation at the expense of limited model flexibility, or retain good representation power only by resorting to nonlinear forms. The main contribution of this paper is a sufficient condition on a positive definite kernel under which it constructs flexible and linear approximators of non-negative functions. We call a kernel function with these attributes an inverse M-kernel; it is a generalization of the inverse M-matrix. Furthermore, we show that for a one-dimensional input space, universal exponential/Abel kernels are inverse M-kernels and thus construct linear universal approximators of non-negative functions. To the best of our knowledge, this is the first time that the existence of linear universal approximators of non-negative functions has been elucidated. We confirm the effectiveness of our results with experiments on non-negativity-constrained regression, density estimation, and intensity estimation. Finally, we discuss issues and perspectives for multi-dimensional input settings.
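The estimator in question keeps the linear-in-kernel form f(x) = Σ_i α_i k(x, x_i). As a rough illustration only (a hypothetical baseline, not the paper's inverse M-kernel construction), the sketch below fits such a model on 1-D data with an exponential (Laplace) kernel and coefficients constrained to be non-negative via non-negative least squares; the kernel width, the toy data, and the NNLS fitting step are our assumptions.

```python
import numpy as np
from scipy.optimize import nnls

def laplace_kernel(x, z, sigma=0.5):
    """Exponential (Laplace/Abel-type) kernel on 1-D inputs: k(x, z) = exp(-|x - z| / sigma)."""
    return np.exp(-np.abs(x[:, None] - z[None, :]) / sigma)

# Toy 1-D regression data with a (roughly) non-negative target function.
rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0.0, 4.0, size=40))
y_train = np.maximum(np.sin(x_train) + 0.3, 0.0) + 0.05 * rng.standard_normal(40)

# Linear-in-kernel model f(x) = sum_i alpha_i * k(x, x_i) with alpha_i >= 0,
# fitted by non-negative least squares on the Gram matrix.
K = laplace_kernel(x_train, x_train)
alpha, _ = nnls(K, y_train)

# Predictions keep the linear representation; with a non-negative kernel and
# non-negative coefficients, the output is non-negative everywhere.
x_test = np.linspace(0.0, 4.0, 200)
f_test = laplace_kernel(x_test, x_train) @ alpha
assert np.all(f_test >= 0.0)
```

With a non-negative kernel, non-negative coefficients trivially keep the output non-negative; the paper's inverse M-kernel condition concerns the stronger question of when such a constrained linear model remains a universal approximator.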
Re: Coupling-based Invertible Neural Networks Are Universal Diffeomorphism Approximators (ID=1064)
We thank the reviewers for reviewing our work. We will update the paper based on the suggestions. (Reviewer question: on what occasions would the diffeomorphism universality results be useful, other than distribution approximation?) Thank you for pointing out the missing references.
Understanding the Representation Power of Graph Neural Networks in Learning Graph Topology
To deepen our understanding of graph neural networks, we investigate the representation power of Graph Convolutional Networks (GCN) through the looking glass of graph moments, a key property of graph topology encoding paths of various lengths. We find that GCNs are rather restrictive in learning graph moments. Without careful design, GCNs can fail miserably even with multiple layers and nonlinear activation functions. We theoretically analyze the expressiveness of GCNs, arriving at a modular GCN design that uses different propagation rules. Our modular design is capable of distinguishing graphs from different graph generation models for surprisingly small graphs, a notoriously difficult problem in network science. Our investigation suggests that depth is much more influential than width, and that deeper GCNs are more capable of learning higher-order graph moments. Additionally, combining GCN modules with different propagation rules is critical to the representation power of GCNs.
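For concreteness, here is a small sketch (our own illustration, with definitions chosen to match the abstract's description rather than the paper's exact formulas): node-level graph moments as row sums of powers of the adjacency matrix, and a single graph-convolution layer under three common propagation rules (plain adjacency, random-walk normalization, and symmetric normalization).

```python
import numpy as np

def graph_moments(A, order=3):
    """Node-level walk counts up to `order`: column p holds the row sums of A^(p+1).
    (One simple notion of graph moments; the paper's exact normalization may differ.)"""
    n = A.shape[0]
    moments, Ap = [], np.eye(n)
    for _ in range(order):
        Ap = Ap @ A
        moments.append(Ap.sum(axis=1))
    return np.stack(moments, axis=1)               # shape (n, order)

def propagation(A, rule="sym"):
    """Different propagation operators built from the adjacency matrix."""
    d = A.sum(axis=1)
    if rule == "adj":                               # plain adjacency: P = A
        return A
    if rule == "rw":                                # random-walk normalization: P = D^-1 A
        return A / d[:, None]
    if rule == "sym":                               # symmetric normalization: P = D^-1/2 A D^-1/2
        d_inv_sqrt = 1.0 / np.sqrt(d)
        return d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    raise ValueError(rule)

def gcn_layer(P, H, W, act=np.tanh):
    """One graph-convolution layer: H' = act(P H W)."""
    return act(P @ H @ W)

# Toy usage on a 5-node cycle graph with constant node features.
A = np.roll(np.eye(5), 1, axis=1) + np.roll(np.eye(5), -1, axis=1)
H = np.ones((5, 4))
W = np.random.default_rng(0).standard_normal((4, 4))
H1 = gcn_layer(propagation(A, "sym"), H, W)
print(graph_moments(A, order=3))
```

Stacking layers with a single propagation rule constrains which of these moments the network can express, which is the intuition behind combining modules with different rules.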
On the equivalence between graph isomorphism testing and function approximation with GNNs
Graph neural networks (GNNs) have achieved great success on graph-structured data. In light of this, there has been increasing interest in studying their representation power. One line of work focuses on the universal approximation of permutation-invariant functions by certain classes of GNNs, and another demonstrates the limitations of GNNs via graph isomorphism tests. Our work connects these two perspectives and proves their equivalence. We further develop a framework for the representation power of GNNs in the language of sigma-algebras, which incorporates both viewpoints. Using this framework, we compare the expressive power of different classes of GNNs as well as other methods on graphs. In particular, we prove that order-2 Graph G-invariant networks fail to distinguish non-isomorphic regular graphs with the same degree. We then extend them to a new architecture, Ring-GNN, which succeeds in distinguishing these graphs and also performs well on tasks with real-world datasets.
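As a concrete illustration of the isomorphism-test viewpoint, the sketch below runs the classical 1-WL color-refinement test (a standard proxy for message-passing GNN expressiveness, not the order-2 G-invariant networks or Ring-GNN studied in the paper) on two non-isomorphic 3-regular graphs, K_{3,3} and the triangular prism; the test cannot tell them apart.

```python
from collections import Counter

def wl_colors(adj, rounds=3):
    """1-dimensional Weisfeiler-Leman (color refinement) on an adjacency-list graph."""
    colors = {v: 0 for v in adj}                    # uniform initial coloring
    for _ in range(rounds):
        signatures = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v]))) for v in adj
        }
        relabel = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        colors = {v: relabel[signatures[v]] for v in adj}
    return Counter(colors.values())                 # color histogram used as the graph signature

# Two non-isomorphic 3-regular graphs on 6 nodes:
# K_{3,3} (bipartite, triangle-free) and the triangular prism (contains triangles).
k33 = {0: [3, 4, 5], 1: [3, 4, 5], 2: [3, 4, 5],
       3: [0, 1, 2], 4: [0, 1, 2], 5: [0, 1, 2]}
prism = {0: [1, 2, 3], 1: [0, 2, 4], 2: [0, 1, 5],
         3: [4, 5, 0], 4: [3, 5, 1], 5: [3, 4, 2]}

# 1-WL produces identical color histograms, so it cannot distinguish the two graphs.
print(wl_colors(k33) == wl_colors(prism))           # True
```

Any test that only aggregates neighbor colors assigns every node of a regular graph the same color, which is exactly the failure mode that motivates higher-order architectures such as Ring-GNN.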