learning and generalization
Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers
The fundamental learning theory behind neural networks remains largely open. What classes of functions can neural networks actually learn? Why doesn't the trained network overfit when it is overparameterized? In this work, we prove that overparameterized neural networks can learn some notable concept classes, including two and three-layer networks with fewer parameters and smooth activations. Moreover, the learning can be simply done by SGD (stochastic gradient descent) or its variants in polynomial time using polynomially many samples. The sample complexity can also be almost independent of the number of parameters in the network. On the technique side, our analysis goes beyond the so-called NTK (neural tangent kernel) linearization of neural networks in prior works. We establish a new notion of quadratic approximation of the neural network, and connect it to the SGD theory of escaping saddle points.
Learning and Generalization in RNNs
Simple recurrent neural networks (RNNs) and their more advanced cousins LSTMs etc. have been very successful in sequence modeling. Their theoretical understanding, however, is lacking and has not kept pace with the progress for feedforward networks, where a reasonably complete understanding in the special case of highly overparametrized one-hidden-layer networks has emerged. In this paper, we make progress towards remedying this situation by proving that RNNs can learn functions of sequences. In contrast to the previous work that could only deal with functions of sequences that are sums of functions of individual tokens in the sequence, we allow general functions. Conceptually and technically, we introduce new ideas which enable us to extract information from the hidden state of the RNN in our proofs---addressing a crucial weakness in previous work. We illustrate our results on some regular language recognition problems.
Learning and generalization of robotic dual-arm manipulation of boxes from demonstrations via Gaussian Mixture Models (GMMs)
Lee, Qian Ying, Kulkarni, Suhas Raghavendra, Wong, Kenzhi Iskandar, Yang, Lin, Noronha, Bernardo, Wee, Yongjun, Hung, Tzu-Yi, Campolo, Domenico
Learning from demonstration (LfD) is an effective method to teach robots to move and manipulate objects in a human-like manner. This is especially true when dealing with complex robotic systems, such as those with dual arms employed for their improved payload capacity and manipulability. However, a key challenge is in expanding the robotic movements beyond the learned scenarios to adapt to minor and major variations from the specific demonstrations. In this work, we propose a learning and novel generalization approach that adapts the learned Gaussian Mixture Model (GMM)-parameterized policy derived from human demonstrations. Our method requires only a small number of human demonstrations and eliminates the need for a robotic system during the demonstration phase, which can significantly reduce both cost and time. The generalization process takes place directly in the parameter space, leveraging the lower-dimensional representation of GMM parameters. With only three parameters per Gaussian component, this process is computationally efficient and yields immediate results upon request. We validate our approach through real-world experiments involving a dual-arm robotic manipulation of boxes. Starting with just five demonstrations for a single task, our approach successfully generalizes to new unseen scenarios, including new target locations, orientations, and box sizes. These results highlight the practical applicability and scalability of our method for complex manipulations.
Reviews: Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers
I thank the authors for their response. I understand that generalization is not the major contribution in this paper -- thanks for the note. I also appreciate the plot showing the numerical values of the weight norms for varying width. It is reassuring to know that these quantities do vary inversely with width for this setting. I think adding these sorts of plots to the appendix of the paper (with a bit more detailed experimentation and discussion) would be useful for the paper.
Learning and Generalization in RNNs
Simple recurrent neural networks (RNNs) and their more advanced cousins LSTMs etc. have been very successful in sequence modeling. Their theoretical understanding, however, is lacking and has not kept pace with the progress for feedforward networks, where a reasonably complete understanding in the special case of highly overparametrized one-hidden-layer networks has emerged. In this paper, we make progress towards remedying this situation by proving that RNNs can learn functions of sequences. In contrast to the previous work that could only deal with functions of sequences that are sums of functions of individual tokens in the sequence, we allow general functions. Conceptually and technically, we introduce new ideas which enable us to extract information from the hidden state of the RNN in our proofs---addressing a crucial weakness in previous work.
Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers
The fundamental learning theory behind neural networks remains largely open. What classes of functions can neural networks actually learn? Why doesn't the trained network overfit when it is overparameterized? In this work, we prove that overparameterized neural networks can learn some notable concept classes, including two and three-layer networks with fewer parameters and smooth activations. Moreover, the learning can be simply done by SGD (stochastic gradient descent) or its variants in polynomial time using polynomially many samples.
On the Generalization of Learned Structured Representations
Despite tremendous progress over the past decade, deep learning methods generally fall short of human-level systematic generalization. It has been argued that explicitly capturing the underlying structure of data should allow connectionist systems to generalize in a more predictable and systematic manner. Indeed, evidence in humans suggests that interpreting the world in terms of symbol-like compositional entities may be crucial for intelligent behavior and high-level reasoning. Another common limitation of deep learning systems is that they require large amounts of training data, which can be expensive to obtain. In representation learning, large datasets are leveraged to learn generic data representations that may be useful for efficient learning of arbitrary downstream tasks. This thesis is about structured representation learning. We study methods that learn, with little or no supervision, representations of unstructured data that capture its hidden structure. In the first part of the thesis, we focus on representations that disentangle the explanatory factors of variation of the data. We scale up disentangled representation learning to a novel robotic dataset, and perform a systematic large-scale study on the role of pretrained representations for out-of-distribution generalization in downstream robotic tasks. The second part of this thesis focuses on object-centric representations, which capture the compositional structure of the input in terms of symbol-like entities, such as objects in visual scenes. Object-centric learning methods learn to form meaningful entities from unstructured input, enabling symbolic information processing on a connectionist substrate. In this study, we train a selection of methods on several common datasets, and investigate their usefulness for downstream tasks and their ability to generalize out of distribution.
- Asia (0.92)
- Europe > Germany > Baden-Württemberg (0.27)
- Europe > Denmark > Capital Region (0.27)
- North America > United States > Minnesota (0.27)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Overview (1.00)
- Energy > Oil & Gas (0.92)
- Leisure & Entertainment > Games > Computer Games (0.92)
- Education (0.67)
- (2 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)
Learning and generalization of one-hidden-layer neural networks, going beyond standard Gaussian data
Li, Hongkang, Zhang, Shuai, Wang, Meng
This paper analyzes the convergence and generalization of training a one-hidden-layer neural network when the input features follow the Gaussian mixture model consisting of a finite number of Gaussian distributions. Assuming the labels are generated from a teacher model with an unknown ground truth weight, the learning problem is to estimate the underlying teacher model by minimizing a non-convex risk function over a student neural network. With a finite number of training samples, referred to the sample complexity, the iterations are proved to converge linearly to a critical point with guaranteed generalization error. In addition, for the first time, this paper characterizes the impact of the input distributions on the sample complexity and the learning rate.
- North America > United States > New York > Rensselaer County > Troy (0.05)
- Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers
Allen-Zhu, Zeyuan, Li, Yuanzhi, Liang, Yingyu
The fundamental learning theory behind neural networks remains largely open. What classes of functions can neural networks actually learn? Why doesn't the trained network overfit when it is overparameterized? In this work, we prove that overparameterized neural networks can learn some notable concept classes, including two and three-layer networks with fewer parameters and smooth activations. Moreover, the learning can be simply done by SGD (stochastic gradient descent) or its variants in polynomial time using polynomially many samples. The sample complexity can also be almost independent of the number of parameters in the network.