Teacher-Student


Excess Risk of Two-Layer ReLU Neural Networks in Teacher-Student Settings and its Superiority to Kernel Methods

Akiyama, Shunta, Suzuki, Taiji

arXiv.org Machine Learning

While deep learning has outperformed other methods on various tasks, theoretical frameworks that explain why have not been fully established. To address this issue, we investigate the excess risk of two-layer ReLU neural networks in a teacher-student regression model, in which a student network learns an unknown teacher network through its outputs. In particular, we consider a student network that has the same width as the teacher network and is trained in two phases: first by noisy gradient descent and then by vanilla gradient descent. Our result shows that the student network provably reaches a near-globally optimal solution and outperforms any kernel method estimator (more generally, any linear estimator), including the neural tangent kernel approach, the random feature model, and other kernel methods, in the sense of the minimax optimal rate. The key property underlying this superiority is the non-convexity of the neural network model: even though the loss landscape is highly non-convex, the student network adaptively learns the teacher neurons.
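The two-phase training scheme described in the abstract can be illustrated with a minimal numpy sketch (not the paper's actual algorithm or analysis): a same-width student first runs noisy (Langevin-style) gradient descent, then switches to vanilla gradient descent. The fixed signed second layer, the noise temperature, and all problem sizes are simplifying assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda z: np.maximum(z, 0.0)

d, m, n = 5, 3, 200  # input dim, width (student width = teacher width), samples

# Unknown teacher network: f(x) = sum_j a_j * relu(<w_j, x>)
W_teacher = rng.normal(size=(m, d))
a = rng.choice([-1.0, 1.0], size=m)  # second layer fixed and shared (simplifying assumption)

X = rng.normal(size=(n, d))
y = relu(X @ W_teacher.T) @ a  # noiseless teacher outputs

W = rng.normal(size=(m, d)) * 0.5  # student first-layer weights

def loss_and_grad(W):
    err = relu(X @ W.T) @ a - y
    # d(loss)/d(w_j) = mean_i err_i * a_j * 1[<x_i, w_j> > 0] * x_i
    G = ((err[:, None] * (X @ W.T > 0)) * a).T @ X / n
    return 0.5 * np.mean(err ** 2), G

init_loss, _ = loss_and_grad(W)

# Phase 1: noisy gradient descent (Langevin-style exploration of the landscape)
eta, temp = 0.01, 1e-4
for _ in range(300):
    _, G = loss_and_grad(W)
    W = W - eta * G + np.sqrt(2 * eta * temp) * rng.normal(size=W.shape)

# Phase 2: vanilla gradient descent (local convergence within the reached basin)
for _ in range(1500):
    _, G = loss_and_grad(W)
    W = W - eta * G

final_loss, _ = loss_and_grad(W)
```

The injected Gaussian noise in phase 1 helps the iterates escape poor regions of the non-convex landscape; phase 2 then refines the solution locally.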


On Learnability via Gradient Method for Two-Layer ReLU Neural Networks in Teacher-Student Setting

Akiyama, Shunta, Suzuki, Taiji

arXiv.org Machine Learning

Deep learning empirically achieves high performance in many applications, but its training dynamics have not been fully understood theoretically. In this paper, we present a theoretical analysis of training two-layer ReLU neural networks in a teacher-student regression model, in which a student network learns an unknown teacher network through its outputs. We show that, with a specific regularization and sufficient over-parameterization, the student network can identify the parameters of the teacher network with high probability via gradient descent with a norm-dependent stepsize, even though the objective function is highly non-convex. The key theoretical tools are a measure representation of neural networks and a novel application of a dual certificate argument from sparse estimation on a measure space. We analyze the global minima and the global convergence property in the measure space.
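The setting can be sketched as follows, assuming an over-parameterized student, a weight-norm regularizer, and a per-neuron stepsize scaled by the neuron's norm. This is an illustrative stand-in, not the paper's specific regularization or stepsize rule; the unit output weights and all constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
relu = lambda z: np.maximum(z, 0.0)

d, m_teacher, m_student, n = 4, 2, 20, 300  # student is over-parameterized

W_teacher = rng.normal(size=(m_teacher, d))
X = rng.normal(size=(n, d))
y = relu(X @ W_teacher.T).sum(axis=1)  # teacher with unit output weights (assumption)

W = rng.normal(size=(m_student, d)) * 0.1  # small initialization
lam = 1e-3  # weight-norm regularization strength (illustrative value)

def loss(W):
    return 0.5 * np.mean((relu(X @ W.T).sum(axis=1) - y) ** 2)

init_loss = loss(W)
for _ in range(2000):
    err = relu(X @ W.T).sum(axis=1) - y
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    # Data-fit gradient plus the gradient of lam * ||w_j|| per neuron
    G = (err[:, None] * (X @ W.T > 0)).T @ X / n + lam * W / (norms + 1e-12)
    # Norm-dependent stepsize: each neuron moves at a rate proportional to its norm
    W = W - 0.05 * norms * G
final_loss = loss(W)
```

The norm-proportional step makes the dynamics multiplicative in each neuron's scale: small, unhelpful neurons barely move while the norm penalty shrinks them, which loosely mirrors the sparse-estimation flavor of the analysis.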


Understanding Robustness in Teacher-Student Setting: A New Perspective

Yang, Zhuolin, Chen, Zhaoxi, Cai, Tiffany, Chen, Xinyun, Li, Bo, Tian, Yuandong

arXiv.org Artificial Intelligence

Adversarial examples have emerged as a ubiquitous property of machine learning models: a bounded adversarial perturbation can mislead a model into making arbitrarily incorrect predictions. Such examples provide a way to assess the robustness of machine learning models as well as a proxy for understanding the model training process. Many studies have tried to explain the existence of adversarial examples and to provide ways to improve model robustness (e.g., adversarial training). While they mostly focus on models trained on datasets with predefined labels, we leverage the teacher-student framework and assume a teacher model, or oracle, that provides the labels for given instances. We extend Tian (2019) to the case of low-rank input data and show that student specialization (a trained student neuron being highly correlated with some teacher neuron at the same layer) still happens within the input subspace, but the teacher and student nodes can differ wildly outside the data subspace, which we conjecture leads to adversarial examples. Extensive experiments show that student specialization correlates strongly with model robustness in different scenarios, including students trained via standard training, adversarial training, confidence-calibrated adversarial training, and training with a robust feature dataset. Our study sheds light on future exploration of adversarial examples and on enhancing model robustness via principled data augmentation.
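The in-subspace versus out-of-subspace comparison can be made concrete with a small sketch: project teacher and student weights onto the data subspace and its orthogonal complement, and measure per-neuron cosine similarity in each. The "specialized" student below is a hypothetical construction (it matches the teacher inside the subspace by design), used only to illustrate the measurement, not the paper's trained networks.

```python
import numpy as np

rng = np.random.default_rng(2)
d, k, m = 10, 3, 4  # ambient input dim, data-subspace dim, neurons per layer

# Orthonormal basis U for the low-rank input subspace (inputs lie in span(U))
U, _ = np.linalg.qr(rng.normal(size=(d, k)))
P_in = U @ U.T               # projector onto the data subspace
P_out = np.eye(d) - P_in     # projector onto its orthogonal complement

W_teacher = rng.normal(size=(m, d))
# Hypothetical "specialized" student: agrees with the teacher inside the data
# subspace but is arbitrary outside it
W_student = W_teacher @ P_in + rng.normal(size=(m, d)) @ P_out

def cosines(A, B):
    return np.sum(A * B, axis=1) / (
        np.linalg.norm(A, axis=1) * np.linalg.norm(B, axis=1) + 1e-12)

corr_in = cosines(W_student @ P_in, W_teacher @ P_in)    # near 1: specialization
corr_out = cosines(W_student @ P_out, W_teacher @ P_out)  # scattered: mismatch
```

Since the inputs never probe the complement of the data subspace, training provides no signal there, and it is precisely these unconstrained out-of-subspace components that the abstract conjectures give rise to adversarial directions.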


An Advising Framework for Multiagent Reinforcement Learning Systems

Silva, Felipe Leno da (Escola Politécnica da Universidade de São Paulo) | Glatt, Ruben (Escola Politécnica da Universidade de São Paulo) | Costa, Anna Helena Reali (Escola Politécnica da Universidade de São Paulo)

AAAI Conferences

Reinforcement Learning has long been employed to solve sequential decision-making problems with minimal input data. However, the classical approach requires a long time to learn a suitable policy, especially in Multiagent Systems. The teacher-student framework mitigates this problem by integrating an advising procedure into the learning process, in which an experienced agent (human or not) advises a student to guide her exploration. However, the teacher is usually assumed to be an expert in the learning task. Here we propose an advising framework in which multiple agents advise each other while learning in a shared environment, and the advisor is not necessarily expected to act optimally. Our experiments in a simulated Robot Soccer environment show that the learning process is improved by incorporating this kind of advice.
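The advising idea can be sketched with two Q-learning agents on a toy chain MDP (a stand-in for the paper's Robot Soccer domain). A student asks a peer for advice in a state only when the peer is markedly more experienced there, judged by visit counts; the peer is itself still learning, so its advice need not be optimal. The environment, visit-count heuristic, and all constants are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n_states, n_actions, goal = 6, 2, 5  # chain MDP: action 0 = left, 1 = right

def step(s, a):
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s2, float(s2 == goal), s2 == goal  # next state, reward, done

def greedy(q_row):
    # argmax with random tie-breaking, so untrained agents explore
    return int(rng.choice(np.flatnonzero(q_row == q_row.max())))

class Agent:
    def __init__(self):
        self.Q = np.zeros((n_states, n_actions))
        self.visits = np.zeros(n_states)

alpha, gamma, eps = 0.5, 0.9, 0.3
agents = [Agent(), Agent()]

def run_episode(me, peer=None):
    s = 0
    for _ in range(30):
        me.visits[s] += 1
        if peer is not None and peer.visits[s] > 2 * me.visits[s]:
            a = greedy(peer.Q[s])   # advice from a more experienced, possibly suboptimal peer
        elif rng.random() < eps:
            a = int(rng.integers(n_actions))
        else:
            a = greedy(me.Q[s])
        s2, r, done = step(s, a)
        me.Q[s, a] += alpha * (r + gamma * me.Q[s2].max() * (not done) - me.Q[s, a])
        s = s2
        if done:
            break

for _ in range(200):   # agent 0 gathers experience on its own first
    run_episode(agents[0])
for _ in range(100):   # agent 1 learns while asking agent 0 for advice
    run_episode(agents[1], peer=agents[0])

# Greedy rollout with the advised student's learned policy
s = 0
for _ in range(n_states):
    s, _, done = step(s, int(np.argmax(agents[1].Q[s])))
    if done:
        break
```

Gating advice on relative visit counts means an agent only defers where its peer has substantially more experience, which keeps imperfect advisors from overriding a student in states the student already knows well.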