




Towards Understanding Hierarchical Learning: Benefits of Neural Representations

Neural Information Processing Systems

Deep neural networks can empirically perform efficient hierarchical learning, in which the layers learn useful representations of the data. However, how they make use of these intermediate representations is not explained by recent theories that relate them to "shallow learners" such as kernels. In this work, we demonstrate that intermediate neural representations add more flexibility to neural networks and can be advantageous over raw inputs. We consider a fixed, randomly initialized neural network as a representation function fed into another trainable network. When the trainable network is the quadratic Taylor model of a wide two-layer network, we show that neural representations can achieve improved sample complexities compared with the raw input: for learning a low-rank degree-$p$ polynomial ($p \geq 4$) in $d$ dimensions, a neural representation requires only $\widetilde{O}(d^{\lceil p/2 \rceil})$ samples, while the best-known sample complexity upper bound for the raw input is $\widetilde{O}(d^{p-1})$. We contrast this result with a lower bound showing that, when the trainable network is instead a neural tangent kernel, neural representations do not improve over the raw input (in the infinite-width limit). Our results characterize when neural representations are beneficial and may provide a new perspective on why depth is important in deep learning.
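The setup described in this abstract can be sketched in a few lines. The following is a minimal, hypothetical illustration, not the authors' construction: a fixed random layer $h(x) = \tanh(Vx)$ plays the role of the neural representation, and on top of it an exact two-layer network $f(h) = \frac{1}{\sqrt{m}}\sum_r a_r \tanh(w_r^\top h)$ is compared with its second-order (quadratic) Taylor expansion in the weights around an initialization $W_0$. The tanh activation (chosen because it is twice differentiable), the widths, and the $1/\sqrt{m}$ scaling are assumptions for illustration; the paper's quadratic model may keep different terms and use a different scaling.

```python
import math
import random


def tanh_derivs(z):
    """Return tanh(z) together with its first and second derivatives."""
    t = math.tanh(z)
    d1 = 1.0 - t * t      # d/dz tanh(z)
    d2 = -2.0 * t * d1    # d^2/dz^2 tanh(z)
    return t, d1, d2


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def random_representation(x, V):
    """Fixed random layer h(x) = tanh(V x); V is sampled once and never trained."""
    return [math.tanh(dot(row, x)) for row in V]


def two_layer(h, a, W):
    """Exact network f(h) = (1/sqrt(m)) * sum_r a_r * tanh(w_r . h)."""
    return sum(a_r * math.tanh(dot(w_r, h)) for a_r, w_r in zip(a, W)) / math.sqrt(len(a))


def quadratic_taylor(h, a, W0, Delta):
    """Second-order Taylor expansion of two_layer in W around W0, evaluated at W0 + Delta."""
    total = 0.0
    for a_r, w0_r, d_r in zip(a, W0, Delta):
        t, d1, d2 = tanh_derivs(dot(w0_r, h))
        s = dot(d_r, h)  # change in this neuron's pre-activation caused by Delta
        total += a_r * (t + d1 * s + 0.5 * d2 * s * s)
    return total / math.sqrt(len(a))


# Hypothetical sizes: input dimension d, representation width k, trainable width m.
rng = random.Random(0)
d, k, m = 5, 8, 16
V = [[rng.gauss(0, 1 / math.sqrt(d)) for _ in range(d)] for _ in range(k)]
a = [rng.choice((-1.0, 1.0)) for _ in range(m)]
W0 = [[rng.gauss(0, 1 / math.sqrt(k)) for _ in range(k)] for _ in range(m)]
Delta = [[1e-2 * rng.gauss(0, 1) for _ in range(k)] for _ in range(m)]

x = [rng.gauss(0, 1) for _ in range(d)]
h = random_representation(x, V)
W = [[w + dw for w, dw in zip(w_r, d_r)] for w_r, d_r in zip(W0, Delta)]
exact = two_layer(h, a, W)
approx = quadratic_taylor(h, a, W0, Delta)
print(f"exact={exact:.6f}  quadratic={approx:.6f}  gap={abs(exact - approx):.2e}")
```

Because the expansion matches the network up to second order, the gap between `exact` and `approx` shrinks roughly cubically as `Delta` shrinks. In this sketch only the trainable weights `W0 + Delta` would be updated during learning, while `V` stays frozen.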



MAViL: Masked Audio-Video Learners, Po-Yao Huang

Neural Information Processing Systems

Empirically, MAViL achieves state-of-the-art audio-video classification performance on AudioSet (53.3 mAP) and VGGSound (67.1% accuracy), surpassing recent self-supervised models and supervised models that utilize external labeled data.



Reviewer #1: Q: Regarding the notation B

Neural Information Processing Systems

We appreciate all the reviewers' valuable comments. Here is our response to the major questions raised by the reviewers. […] (as in Theorem 2). We used $\widetilde{O}(\cdot)$ to hide dependence on log factors in the theorem statement.
Q: How does whitening affect the proof in Section 4.1?
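The same $\widetilde{O}(\cdot)$ convention appears in the hierarchical-learning abstract earlier in this listing. Since it suppresses log factors, comparing that abstract's two upper bounds reduces to comparing the exponents of $d$: $\lceil p/2 \rceil$ with a neural representation versus $p-1$ for the raw input. The short sketch below tabulates those exponents and the resulting ratio at a hypothetical $d = 100$; constants and the hidden log factors are ignored, so the ratios only indicate scaling, not actual sample counts.

```python
import math


def neural_rep_exponent(p: int) -> int:
    """Exponent k in the O~(d^k) bound with a neural representation: ceil(p/2)."""
    return math.ceil(p / 2)


def raw_input_exponent(p: int) -> int:
    """Exponent k in the best-known O~(d^k) bound for the raw input: p - 1."""
    return p - 1


def sample_ratio(d: int, p: int) -> int:
    """d^(p-1) / d^ceil(p/2), ignoring constants and the log factors hidden by O~."""
    return d ** (raw_input_exponent(p) - neural_rep_exponent(p))


# The abstract states the improvement for p >= 4.
for p in range(4, 9):
    print(p, neural_rep_exponent(p), raw_input_exponent(p), sample_ratio(100, p))
```

The gap in exponents is $\lfloor p/2 \rfloor - 1$, so the advantage grows with the polynomial degree: one factor of $d$ for $p = 4, 5$, two for $p = 6, 7$, and so on.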




Reviews: Superposition of many models into one

Neural Information Processing Systems

I appreciate that the authors re-implemented the CIFAR benchmark I had requested. However, I'm still unconvinced of the significance or the originality of the proposed approach. For me, two fundamental issues remain: 1) The proposed approach is conceptually very similar to the masking proposed in Masse et al. (2018) that I mentioned in my review. The only difference is essentially masking with a {1,-1} vector vs. masking with a {0,1} vector. For sufficiently sparse masks (as used in Masse et al.), the latter approach will also produce largely non-overlapping feature subsets for different tasks, so I don't see this as a huge difference.
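The distinction the reviewer draws can be made concrete with a quick simulation. The sketch below is a hypothetical illustration, not code from either paper: it draws independent random masks for two tasks and measures the fraction of units that are active under both. For a {0,1} mask that keeps each unit with probability q, the expected shared fraction is q^2, so sparse masks give largely non-overlapping subsets; a {1,-1} mask keeps every unit active and differs between tasks only in sign. The densities, the layer size, and the assumption that masks are drawn independently are illustrative choices.

```python
import random


def binary_mask(n, density, rng):
    """{0,1} mask: each unit is kept independently with probability `density`."""
    return [1 if rng.random() < density else 0 for _ in range(n)]


def sign_mask(n, rng):
    """{1,-1} mask: every unit stays active; only its sign is flipped at random."""
    return [rng.choice((1, -1)) for _ in range(n)]


def shared_active_fraction(m1, m2):
    """Fraction of units that are non-zero under both masks."""
    return sum(1 for u, v in zip(m1, m2) if u != 0 and v != 0) / len(m1)


def apply_mask(features, mask):
    """Element-wise product of a feature vector with a task mask."""
    return [f * s for f, s in zip(features, mask)]


rng = random.Random(0)
n = 10_000  # hypothetical layer size
for density in (0.5, 0.2, 0.05):
    a, b = binary_mask(n, density, rng), binary_mask(n, density, rng)
    print(f"{{0,1}} density={density}: shared={shared_active_fraction(a, b):.4f} "
          f"(expected ~{density ** 2:.4f})")
s1, s2 = sign_mask(n, rng), sign_mask(n, rng)
print(f"{{1,-1}}: shared={shared_active_fraction(s1, s2):.4f}")
print(apply_mask([0.5, -1.0, 2.0], [1, -1, 1]))  # → [0.5, 1.0, 2.0]
```

Under these assumptions, a sparse {0,1} mask (e.g. density 0.05) leaves only about 0.25% of units shared between two tasks, whereas two {1,-1} masks share every unit and separate tasks only through sign patterns.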