Collaborating Authors: Wang, Ruinan


Grouped Sequential Optimization Strategy -- the Application of Hyperparameter Importance Assessment in Deep Learning

arXiv.org Artificial Intelligence

In recent years, the rapid advancement of deep learning has led to significant breakthroughs across a wide range of applications, from computer vision to natural language processing, and hyperparameter optimization (HPO) has become increasingly vital to constructing models that achieve optimal performance. As demand for HPO has grown, its computational and time costs have become a significant bottleneck [1]. In this context, Hyperparameter Importance Assessment (HIA) has emerged as a promising solution. By evaluating the importance weights of individual hyperparameters and their combinations within specific models, HIA provides valuable insight into which hyperparameters most significantly impact model performance [2]. With this understanding, deep learning practitioners can focus their optimization effort on the hyperparameters with the most pronounced effect on performance. For less critical hyperparameters, users can shrink the search space or even fix them at set values, saving time in the model optimization process [3]. Although HIA has been explored extensively, most existing studies focus on introducing new HIA methods or on ranking hyperparameter importance for specific models in particular application scenarios; little work examines how these insights can be strategically applied to make the optimization process itself more efficient. To address this gap, this paper uses Convolutional Neural Networks (CNNs) as a research case to introduce HIA into the deep learning pipeline, demonstrating that the insights gained from HIA can effectively enhance the efficiency of hyperparameter optimization.

Second Conference on Parsimony and Learning (CPAL 2025).
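The core idea — use importance weights to decide which hyperparameters to search and which to fix — can be sketched in a few lines. The importance values, search space, defaults, and the 0.1 threshold below are all illustrative assumptions, not numbers from the paper:

```python
import random

# Hypothetical importance weights (illustrative values only, not from the paper)
importance = {"learning_rate": 0.35, "dropout": 0.25, "batch_size": 0.08, "weight_decay": 0.05}

# Full search space for each hyperparameter
space = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "dropout": [0.0, 0.3, 0.5],
    "batch_size": [32, 64, 128],
    "weight_decay": [0.0, 1e-4],
}

# Default values used when a low-importance hyperparameter is fixed
defaults = {"learning_rate": 1e-3, "dropout": 0.3, "batch_size": 64, "weight_decay": 0.0}

def reduced_space(space, importance, defaults, threshold=0.1):
    """Keep the full range for important hyperparameters; pin the rest to a default."""
    return {
        name: (values if importance[name] >= threshold else [defaults[name]])
        for name, values in space.items()
    }

def sample_config(space, rng):
    """Draw one configuration uniformly from the (possibly reduced) space."""
    return {name: rng.choice(values) for name, values in space.items()}

rng = random.Random(0)
narrow = reduced_space(space, importance, defaults)
config = sample_config(narrow, rng)  # random search now only varies the important dimensions
```

With the threshold above, `batch_size` and `weight_decay` are frozen at their defaults, so a random-search budget is spent entirely on `learning_rate` and `dropout` — the kind of saving the paper argues HIA enables.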


Efficient Hyperparameter Importance Assessment for CNNs

arXiv.org Artificial Intelligence

Hyperparameter selection is an essential aspect of the machine learning pipeline, profoundly impacting models' robustness, stability, and generalization capabilities. Given the complex hyperparameter spaces associated with neural networks and the constraints of computational resources and time, optimizing all hyperparameters becomes impractical. In this context, leveraging hyperparameter importance assessment (HIA) can provide valuable guidance by narrowing down the search space. This enables machine learning practitioners to focus their optimization efforts on the hyperparameters with the most significant impact on model performance while conserving time and resources. This paper aims to quantify the importance weights of a set of hyperparameters in Convolutional Neural Networks (CNNs) with an algorithm called N-RReliefF, laying the groundwork for applying HIA methodologies in the deep learning field. We conduct an extensive study by training over ten thousand CNN models across ten popular image classification datasets, thereby acquiring a comprehensive dataset containing hyperparameter configuration instances and their corresponding performance metrics. It is demonstrated that, among the investigated hyperparameters, the five most important hyperparameters of the CNN model are the number of convolutional layers, learning rate, dropout rate, optimizer, and number of epochs.
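RReliefF-style methods estimate an attribute's importance from how often a difference in that attribute coincides with a difference in the target among near neighbours. The sketch below is a simplified, illustrative variant of that idea applied to hyperparameter configurations, not the paper's exact N-RReliefF (which, among other things, normalises weights across runs); the toy data and `k` are made up:

```python
import math

def rrelieff(X, y, k=3):
    """Simplified RReliefF-style importance estimate.

    X: hyperparameter configurations as lists of numeric values, normalised to [0, 1]
    y: corresponding performance scores, normalised to [0, 1]
    Returns one importance weight per hyperparameter (higher = more important).
    """
    m, n = len(X), len(X[0])
    n_dc = 0.0            # accumulated difference in performance
    n_da = [0.0] * n      # accumulated difference in each hyperparameter
    n_dcda = [0.0] * n    # accumulated joint difference (performance AND hyperparameter)
    for i in range(m):
        # k nearest neighbours of configuration i (Euclidean distance, excluding i)
        nearest = sorted((math.dist(X[i], X[j]), j) for j in range(m) if j != i)[:k]
        for _, j in nearest:
            d_c = abs(y[i] - y[j])            # difference in performance
            n_dc += d_c
            for a in range(n):
                d_a = abs(X[i][a] - X[j][a])  # difference in hyperparameter a
                n_da[a] += d_a
                n_dcda[a] += d_c * d_a
    total = m * k
    return [n_dcda[a] / n_dc - (n_da[a] - n_dcda[a]) / (total - n_dc) for a in range(n)]

# Toy data: hyperparameter 0 fully determines performance, hyperparameter 1 is held constant
X = [[0.0, 0.5], [0.2, 0.5], [0.5, 0.5], [0.9, 0.5], [1.0, 0.5]]
y = [0.0, 0.2, 0.5, 0.9, 1.0]
weights = rrelieff(X, y, k=2)  # weights[0] > weights[1]
```

On the toy data the constant hyperparameter contributes no attribute differences, so its weight is exactly zero, while the performance-driving one receives a positive weight — the ranking behaviour the paper relies on, if not its exact numbers.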


Masked Imitation Learning: Discovering Environment-Invariant Modalities in Multimodal Demonstrations

arXiv.org Artificial Intelligence

Multimodal demonstrations provide robots with an abundance of information to make sense of the world. However, such abundance may not always lead to good performance when it comes to learning sensorimotor control policies from human demonstrations. Extraneous data modalities can lead to state over-specification, where the state contains modalities that are not only useless for decision-making but can also change data distribution across environments. State over-specification leads to issues such as the learned policy failing to generalize outside the training data distribution. In this work, we propose Masked Imitation Learning (MIL) to address state over-specification by selectively using informative modalities. Specifically, we design a masked policy network with a binary mask to block certain modalities. We develop a bi-level optimization algorithm that learns this mask to accurately filter over-specified modalities. We demonstrate empirically that MIL outperforms baseline algorithms in simulated domains, including MuJoCo and a robot arm environment using the Robomimic dataset, and effectively recovers the environment-invariant modalities on a multimodal dataset collected on a real robot. Supplemental details and videos of our results are available on our project website: https://tinyurl.com/masked-il
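The masked-policy idea can be illustrated with a toy bi-level loop: an inner loop fits policy weights for a fixed binary mask, and an outer loop adjusts the mask based on held-out loss. As a minimal sketch, the outer loop below is a greedy one-flip search standing in for MIL's learned mask optimization, and the linear policy, the data, and all hyperparameters are illustrative assumptions:

```python
def masked_forward(weights, mask, obs):
    """Linear policy: modalities with mask 0 are zeroed before the dot product."""
    return sum(w * m * x for w, m, x in zip(weights, mask, obs))

def policy_loss(weights, mask, data):
    """Mean squared error of the masked policy on (observation, action) pairs."""
    return sum((masked_forward(weights, mask, obs) - act) ** 2 for obs, act in data) / len(data)

def fit_weights(mask, data, lr=0.1, steps=200):
    """Inner loop: fit policy weights by gradient descent for a fixed mask."""
    n = len(data[0][0])
    w = [0.0] * n
    for _ in range(steps):
        grad = [0.0] * n
        for obs, act in data:
            err = masked_forward(w, mask, obs) - act
            for i in range(n):
                grad[i] += 2 * err * mask[i] * obs[i] / len(data)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

def search_mask(train, val, n):
    """Outer loop (greedy stand-in for MIL's bi-level optimization):
    try dropping each modality and keep the mask if held-out loss improves."""
    mask = [1] * n
    best = policy_loss(fit_weights(mask, train), mask, val)
    for i in range(n):
        trial = mask[:]
        trial[i] = 0
        loss = policy_loss(fit_weights(trial, train), trial, val)
        if loss < best:
            mask, best = trial, loss
    return mask

# Modality 0 causes the action; modality 1 is spurious: it mirrors modality 0 in
# training but shifts at evaluation time (the over-specification failure mode).
train = [([x, x], x) for x in (0.1, 0.3, 0.5, 0.7, 0.9)]
val = [([x, 0.0], x) for x in (0.2, 0.4, 0.6, 0.8)]
mask = search_mask(train, val, n=2)  # keeps modality 0, drops the spurious modality 1
```

With the full mask, training splits weight across the two identical-in-training modalities and fails on the shifted held-out data; masking the spurious modality restores held-out performance, so the outer loop settles on `[1, 0]` — a small-scale analogue of MIL recovering the environment-invariant modalities.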