Goto

Collaborating Authors

 validation data


L2T-DLN: Learning to Teach with Dynamic Loss Network

Neural Information Processing Systems

With the concept of teaching being introduced to the machine learning community, a teacher model start using dynamic loss functions to teach the training of a student model. The dynamic intends to set adaptive loss functions to different phases of student model learning. In existing works, the teacher model 1) merely determines the loss function based on the present states of the student model, i.e., disregards the experience of the teacher; 2) only utilizes the states of the student model, e.g., training iteration number and loss/accuracy from training/validation sets, while ignoring the states of the loss function. In this paper, we first formulate the loss adjustment as a temporal task by designing a teacher model with memory units, and, therefore, enables the student learning to be guided by the experience of the teacher model. Then, with a dynamic loss network, we can additionally use the states of the loss to assist the teacher learning in enhancing the interactions between the teacher and the student model. Extensive experiments demonstrate our approach can enhance student learning and improve the performance of various deep models on real-world tasks, including classification, objective detection, and semantic segmentation scenarios.



Stability and Generalization of Bilevel Programming in Hyperparameter Optimization

Neural Information Processing Systems

The (gradient-based) bilevel programming framework is widely used in hyperparameter optimization and has achieved excellent performance empirically. Previous theoretical work mainly focuses on its optimization properties, while leaving the analysis on generalization largely open. This paper attempts to address the issue by presenting an expectation bound w.r.t. the validation set based on uniform stability. Our results can explain some mysterious behaviours of the bilevel programming in practice, for instance, overfitting to the validation set. We also present an expectation bound for the classical cross-validation algorithm. Our results suggest that gradient-based algorithms can be better than cross-validation under certain conditions in a theoretical perspective. Furthermore, we prove that regularization terms in both the outer and inner levels can relieve the overfitting problem in gradient-based algorithms. In experiments on feature learning and data reweighting for noisy labels, we corroborate our theoretical findings.



Autoencoder-Based Parameter Estimation for Superposed Multi-Component Damped Sinusoidal Signals

arXiv.org Machine Learning

Damped sinusoidal oscillations are widely observed in many physical systems, and their analysis provides access to underlying physical properties. However, parameter estimation becomes difficult when the signal decays rapidly, multiple components are superposed, and observational noise is present. In this study, we develop an autoencoder-based method that uses the latent space to estimate the frequency, phase, decay time, and amplitude of each component in noisy multi-component damped sinusoidal signals. We investigate multi-component cases under Gaussian-distribution training and further examine the effect of the training-data distribution through comparisons between Gaussian and uniform training. The performance is evaluated through waveform reconstruction and parameter-estimation accuracy. We find that the proposed method can estimate the parameters with high accuracy even in challenging setups, such as those involving a subdominant component or nearly opposite-phase components, while remaining reasonably robust when the training distribution is less informative. This demonstrates its potential as a tool for analyzing short-duration, noisy signals.


An Empirical Bayes Approach to Optimizing Machine Learning Algorithms

Neural Information Processing Systems

There is rapidly growing interest in using Bayesian optimization to tune model and inference hyperparameters for machine learning algorithms that take a long time to run. For example, Spearmint is a popular software package for selecting the optimal number of layers and learning rate in neural networks. But given that there is uncertainty about which hyperparameters give the best predictive performance, and given that fitting a model for each choice of hyperparameters is costly, it is arguably wasteful to throw away all but the best result, as per Bayesian optimization. A related issue is the danger of overfitting the validation data when optimizing many hyperparameters. In this paper, we consider an alternative approach that uses more samples from the hyperparameter selection procedure to average over the uncertainty in model hyperparameters. The resulting approach, empirical Bayes for hyperparameter averaging (EB-Hyp) predicts held-out data better than Bayesian optimization in two experiments on latent Dirichlet allocation and deep latent Gaussian models. EB-Hyp suggests a simpler approach to evaluating and deploying machine learning algorithms that does not require a separate validation data set and hyperparameter selection procedure.





Supplementary Material

Neural Information Processing Systems

The supplementary material is organized as follows. We give details of the definitions and notation in Section B.1 . Then, we provide the technical details of the lower bound (Lemma 3.3). In Section D.4 we provide insights into auto-labeling using This suggests, in these settings auto-labeling using active learning followed by selective classification is expected to work well. This idea is captured by the Chow's excess risk [ Nevertheless, it would be interesting future work to explore the connections between auto-labeling and active learning with abstention.