
Collaborating Authors: Watanabe, Shuhei


Derivation of Output Correlation Inferences for Multi-Output (aka Multi-Task) Gaussian Process

arXiv.org Machine Learning

The Gaussian process (GP) is arguably one of the most widely used machine learning algorithms in practice, and one of its prominent applications is Bayesian optimization (BO). Although the vanilla GP is already a powerful tool for BO, it is often beneficial to model the dependencies among multiple outputs. The multi-task GP (MTGP) was formulated for this purpose, but it is not trivial to reconstruct the derivations of its formulations and their gradients from the existing literature. This paper provides friendly derivations of the MTGP formulations and their gradients.
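
For orientation (a minimal sketch of one common MTGP formulation, not code from the paper): in the intrinsic coregionalization model (ICM), the covariance between output t at x and output s at x' factors as B[t, s] * k(x, x'), so when all tasks are observed on a shared input set, the joint Gram matrix is a Kronecker product of the task covariance B and the input kernel's Gram matrix.

```python
import numpy as np

def rbf_kernel(X1, X2, length_scale=1.0):
    """Squared-exponential kernel between two sets of inputs."""
    sq_dists = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dists / length_scale ** 2)

def icm_covariance(X, B, length_scale=1.0):
    """Joint ICM covariance: cov(f_t(x), f_s(x')) = B[t, s] * k(x, x'),
    i.e., a Kronecker product of the task covariance and the Gram matrix."""
    K = rbf_kernel(X, X, length_scale)
    return np.kron(B, K)

# Toy usage: 2 tasks observed at the same 5 inputs.
rng = np.random.default_rng(0)
X = rng.random((5, 3))
L = rng.random((2, 2))
B = L @ L.T + 1e-6 * np.eye(2)   # positive semi-definite task covariance
Sigma = icm_covariance(X, B)     # shape (10, 10)
```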


Derivation of Closed Form of Expected Improvement for Gaussian Process Trained on Log-Transformed Objective

arXiv.org Machine Learning

Expected Improvement (EI) is arguably the most widely used acquisition function in Bayesian optimization. However, it is often challenging to obtain strong performance with EI due to its sensitivity to numerical precision. Hutter et al. (2009) tackled this problem by using a Gaussian process trained on the log-transformed objective function, and reported that this trick improves the predictive accuracy of the GP, leading to substantially better optimization performance. Although Hutter et al. (2009) gave the closed form of their EI, the intermediate steps of its derivation have not been published so far. In this paper, we give a friendly derivation of their proposition.
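
For reference, the closed form in question follows from the moments of a lognormal distribution. The sketch below is my own transcription under the convention of minimization with an incumbent f_min on the original scale; whether this matches Hutter et al.'s exact convention (e.g., maximization or noise handling) should be checked against the paper.

```python
import numpy as np
from scipy.stats import norm

def log_ei(f_min, mean_log, std_log):
    """EI on the original scale for a GP fit to log(f), minimization.
    With f = exp(g) and g ~ N(mean_log, std_log^2), f is lognormal and
    E[max(f_min - f, 0)] has a Black-Scholes-like closed form."""
    u = (np.log(f_min) - mean_log) / std_log
    return (f_min * norm.cdf(u)
            - np.exp(mean_log + 0.5 * std_log ** 2) * norm.cdf(u - std_log))

# Sanity check against Monte Carlo.
rng = np.random.default_rng(0)
g = rng.normal(0.2, 0.5, size=10 ** 6)
print(log_ei(1.0, 0.2, 0.5), np.maximum(1.0 - np.exp(g), 0.0).mean())
```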


Fast Benchmarking of Asynchronous Multi-Fidelity Optimization on Zero-Cost Benchmarks

arXiv.org Artificial Intelligence

While deep learning has celebrated many successes, its results often hinge on the meticulous selection of hyperparameters (HPs). However, the time-consuming nature of deep learning training makes HP optimization (HPO) a costly endeavor, slowing down the development of efficient HPO tools. While zero-cost benchmarks, which provide performance and runtime without actual training, offer a solution for non-parallel setups, they fall short in parallel setups because each worker must communicate its queried runtime so that evaluations are returned in the correct order. This work addresses this challenge by introducing a user-friendly Python package that facilitates efficient parallel HPO with zero-cost benchmarks. Our approach calculates the exact return order based on the information stored in the file system, eliminating the need for long waiting times and enabling much faster HPO evaluations. We first verify the correctness of our approach through extensive testing, and experiments with six popular HPO libraries show its applicability to diverse libraries and its ability to achieve over 1000x speedup compared to a traditional approach. Our package can be installed via pip install mfhpo-simulator.
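
The package's actual API is documented in its repository; to illustrate only the scheduling idea, here is a minimal sketch of how the exact return order of asynchronous workers can be computed from queried runtimes without any real waiting. In a real simulation the optimizer samples jobs online based on earlier results, which is exactly what the package coordinates via file-system bookkeeping; the names below are illustrative.

```python
import heapq

def simulate_return_order(runtimes, n_workers):
    """Given queried runtimes of jobs consumed in submission order by
    whichever worker frees up first, return the job indices in the
    order they would finish in a real asynchronous experiment."""
    # Heap of (simulated finish time, worker id); all workers idle at t=0.
    workers = [(0.0, w) for w in range(n_workers)]
    heapq.heapify(workers)
    finish_events = []
    for job, rt in enumerate(runtimes):
        t_free, w = heapq.heappop(workers)   # next worker to become free
        t_done = t_free + rt
        finish_events.append((t_done, job))
        heapq.heappush(workers, (t_done, w))
    return [job for _, job in sorted(finish_events)]

print(simulate_return_order([3.0, 1.0, 2.0, 0.5], n_workers=2))
# -> [1, 0, 2, 3]: job 1 finishes at t=1.0, jobs 0 and 2 at t=3.0,
#    job 3 (started only when a worker freed at t=3.0) at t=3.5.
```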


Python Wrapper for Simulating Multi-Fidelity Optimization on HPO Benchmarks without Any Wait

arXiv.org Artificial Intelligence

Hyperparameter (HP) optimization of deep learning (DL) is essential for high performance. As DL training often takes several hours to days, HP optimization (HPO) of DL is often prohibitively expensive. This has spurred the emergence of tabular and surrogate benchmarks, which enable querying the (predictive) performance of DL with a specific HP configuration in a fraction of a second. However, since the actual runtime of a DL training run differs significantly from its query response time, simulators of asynchronous HPO, e.g. multi-fidelity optimization, must in a naïve implementation wait for the actual runtime at each iteration; otherwise, the evaluation order during simulation does not match that of the real experiment. To ease this issue, we developed a Python wrapper and describe its usage. This wrapper forces each worker to wait so that we yield exactly the same evaluation order as in the real experiment with only about $10^{-2}$ seconds of waiting instead of several hours. Our implementation is available at https://github.com/nabenabe0928/mfhpo-simulator/.
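
To convey the worker-side mechanism only (a toy of my own, not the wrapper's actual interface): a worker may return its result only once its simulated finish time is the earliest among all running evaluations, so real waits shrink to millisecond-scale polling. This toy ignores the race where a just-freed worker's next, shorter evaluation should return first; handling such cases robustly across processes is what the wrapper's file-system bookkeeping is for.

```python
import threading

class ReturnOrderKeeper:
    """Toy version of the ordering rule; the real wrapper shares this
    bookkeeping across worker processes via the file system."""

    def __init__(self):
        self._cv = threading.Condition()
        self._finish = {}  # worker id -> simulated finish time

    def register(self, worker, start, simulated_runtime):
        with self._cv:
            self._finish[worker] = start + simulated_runtime

    def wait_until_my_turn(self, worker):
        # Block (in ~10 ms slices) until no other running evaluation
        # would finish earlier in the simulated timeline.
        with self._cv:
            while self._finish[worker] > min(self._finish.values()):
                self._cv.wait(timeout=0.01)
            t = self._finish.pop(worker)
            self._cv.notify_all()
            return t  # the worker's simulated clock after this evaluation
```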


Speeding Up Multi-Objective Hyperparameter Optimization by Task Similarity-Based Meta-Learning for the Tree-Structured Parzen Estimator

arXiv.org Artificial Intelligence

Hyperparameter optimization (HPO) is a vital step in improving performance in deep learning (DL). Practitioners often face trade-offs between multiple criteria, such as accuracy and latency. Given the high computational cost of DL and the growing demand for efficient HPO, the acceleration of multi-objective (MO) optimization becomes ever more important. Despite the significant body of work on meta-learning for HPO, existing methods are inapplicable to the MO tree-structured Parzen estimator (MO-TPE), a simple yet powerful MO-HPO algorithm. In this paper, we extend TPE's acquisition function to the meta-learning setting using a task similarity defined by the overlap of the top domains between tasks. We also theoretically analyze and address the limitations of our task similarity. In the experiments, we demonstrate that our method speeds up MO-TPE on tabular HPO benchmarks and attains state-of-the-art performance. Our method was also externally validated by winning the AutoML 2022 competition on "Multiobjective Hyperparameter Optimization for Transformers".
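
As a rough illustration of the similarity ingredient (my own simplification, not the paper's estimator): approximate each task's top domain by the grid cells containing its top-gamma observations in the unit hypercube, and measure overlap as intersection-over-union of the resulting masks.

```python
import numpy as np

def top_domain_mask(X, y, gamma=0.1, n_bins=10):
    """Mark grid cells (in the unit hypercube) containing the
    top-gamma configurations of one task (minimization)."""
    n_top = max(1, int(np.ceil(gamma * len(y))))
    top = X[np.argsort(y)[:n_top]]
    cells = np.floor(top * n_bins).astype(int).clip(0, n_bins - 1)
    mask = np.zeros((n_bins,) * X.shape[1], dtype=bool)
    mask[tuple(cells.T)] = True
    return mask

def task_similarity(mask_a, mask_b):
    """Intersection-over-union of the two approximate top domains."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union else 0.0
```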


PED-ANOVA: Efficiently Quantifying Hyperparameter Importance in Arbitrary Subspaces

arXiv.org Artificial Intelligence

The recent rise in popularity of Hyperparameter Optimization (HPO) for deep learning has highlighted the role that good hyperparameter (HP) space design can play in training strong models. In turn, designing a good HP space is critically dependent on understanding the role of different HPs. This motivates research on HP Importance (HPI), e.g., with the popular method of functional ANOVA (f-ANOVA). However, the original f-ANOVA formulation is inapplicable to the subspaces most relevant to algorithm designers, such as those defined by top performance. To overcome this issue, we derive a novel formulation of f-ANOVA for arbitrary subspaces and propose an algorithm that uses Pearson divergence (PED) to enable a closed-form calculation of HPI. We demonstrate that this new algorithm, dubbed PED-ANOVA, is able to successfully identify important HPs in different subspaces while also being extremely computationally efficient.
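
For a feel of the quantity involved (a hedged histogram-based sketch, not the paper's closed-form estimator): the importance of an HP within a subspace is driven by the Pearson divergence between that HP's marginal over the top-gamma observations and its baseline marginal.

```python
import numpy as np

def pearson_divergence(x_top, x_all, n_bins=20, eps=1e-12):
    """Histogram estimate of the Pearson divergence between the
    marginal of one HP over top observations and over all observations:
    PE(p || q) = sum_b q_b * (p_b / q_b - 1)^2 over histogram bins."""
    edges = np.histogram_bin_edges(x_all, bins=n_bins)
    p, _ = np.histogram(x_top, bins=edges)
    q, _ = np.histogram(x_all, bins=edges)
    p = p / p.sum()
    q = q / q.sum()
    ratio = p / np.maximum(q, eps)
    return float((q * (ratio - 1.0) ** 2).sum())

# An HP whose top-10% marginal deviates strongly from its overall
# marginal receives a high score, i.e., it matters in that subspace.
```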


c-TPE: Tree-structured Parzen Estimator with Inequality Constraints for Expensive Hyperparameter Optimization

arXiv.org Artificial Intelligence

Hyperparameter optimization (HPO) is crucial for the strong performance of deep learning algorithms, and real-world applications often impose constraints, such as memory usage or latency, on top of the performance requirement. In this work, we propose constrained TPE (c-TPE), an extension of the tree-structured Parzen estimator (TPE), a widely used and versatile Bayesian optimization method, to handle these constraints. Our proposed extension goes beyond a simple combination of an existing acquisition function and the original TPE, and instead includes modifications that address issues causing poor performance. We thoroughly analyze these modifications both empirically and theoretically, providing insights into how they effectively overcome these challenges. In the experiments, we demonstrate that c-TPE exhibits the best average rank performance among existing methods, with statistical significance, on 81 expensive HPO problems with inequality constraints. Due to the lack of baselines, we discuss the applicability of our method to hard-constrained optimization only in Appendix D.
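
Schematically (a sketch of the basic idea only, not the paper's exact acquisition, which includes the further modifications the abstract alludes to): vanilla TPE ranks candidates by the density ratio l(x)/g(x) of good to bad observations; a constrained variant forms an analogous ratio per constraint from feasible vs. infeasible observations and combines them, e.g., multiplicatively. The toy below assumes enough observations in each group for the KDEs to be well defined.

```python
import numpy as np
from scipy.stats import gaussian_kde

def density_ratio(good, bad, candidates, eps=1e-12):
    """KDE-based ratio l(x)/g(x) evaluated at candidate points."""
    l = gaussian_kde(good.T)(candidates.T)
    g = gaussian_kde(bad.T)(candidates.T)
    return l / np.maximum(g, eps)

def ctpe_style_score(X, y, feas, candidates, gamma=0.15):
    """Schematic c-TPE-style score: objective density ratio times one
    ratio per constraint (here a single feasibility indicator `feas`)."""
    n_good = max(2, int(gamma * len(y)))
    order = np.argsort(y)  # minimization
    score = density_ratio(X[order[:n_good]], X[order[n_good:]], candidates)
    if feas.any() and (~feas).any():
        score = score * density_ratio(X[feas], X[~feas], candidates)
    return score
```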


Tree-Structured Parzen Estimator: Understanding Its Algorithm Components and Their Roles for Better Empirical Performance

arXiv.org Artificial Intelligence

Recent advances in many domains require increasingly complicated experiment design. Such complicated experiments often have many parameters, which necessitates parameter tuning. The tree-structured Parzen estimator (TPE), a Bayesian optimization method, is widely used in recent parameter tuning frameworks. Despite its popularity, the role of each control parameter and the intuition behind the algorithm have not been discussed so far. In this tutorial, we identify the role of each control parameter and its impact on hyperparameter optimization using a diverse set of benchmarks. We compare the recommended setting drawn from our ablation study with baseline methods and demonstrate that it improves the performance of TPE.
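
To make the control parameters concrete (a minimal one-iteration sketch of the standard TPE recipe, not the tutorial's implementation): observations are split by a quantile gamma into good and bad groups, kernel density estimates l and g are fit to each, candidates are sampled from l, and the one maximizing l(x)/g(x) is suggested.

```python
import numpy as np
from scipy.stats import gaussian_kde

def tpe_suggest(X, y, gamma=0.1, n_candidates=24, bw_method=None):
    """One TPE iteration with its main control parameters exposed:
    gamma        -- quantile splitting observations into good/bad,
    n_candidates -- number of samples drawn from the good KDE,
    bw_method    -- KDE bandwidth rule.
    Returns the candidate maximizing the density ratio l(x)/g(x)."""
    order = np.argsort(y)                              # minimization
    n_good = max(2, int(np.ceil(gamma * len(y))))
    l = gaussian_kde(X[order[:n_good]].T, bw_method=bw_method)
    g = gaussian_kde(X[order[n_good:]].T, bw_method=bw_method)
    cands = l.resample(n_candidates).T                 # sample where l is high
    ratio = l(cands.T) / np.maximum(g(cands.T), 1e-12)
    return cands[np.argmax(ratio)]

rng = np.random.default_rng(1)
X = rng.random((50, 2))
y = ((X - 0.3) ** 2).sum(axis=1)
print(tpe_suggest(X, y))  # should drift toward the optimum at (0.3, 0.3)
```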


Python Tool for Visualizing Variability of Pareto Fronts over Multiple Runs

arXiv.org Artificial Intelligence

Hyperparameter optimization is crucial to achieving high performance in deep learning. On top of performance, other criteria such as inference time or memory requirements often need to be optimized for practical reasons. This motivates research on multi-objective optimization (MOO). However, Pareto fronts of MOO methods are often shown without considering the variability caused by random seeds, which makes it difficult to evaluate performance stability. Although the empirical attainment surface is an established concept for visualizing uncertainty over multiple runs, no major Python package implements it. We therefore developed a Python package for this purpose and describe its usage. The package is available at https://github.com/nabenabe0928/empirical-attainment-func.
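
To explain what such a plot shows (an illustrative sketch, not the package's code): for two minimization objectives, the k-th empirical attainment surface bounds the region of objective space attained, i.e., weakly dominated, by at least k of the n runs, so drawing it for several k summarizes variability across seeds.

```python
import numpy as np

def attainment_counts(fronts, grid_x, grid_y):
    """For each grid point (x, y), count how many runs contain a
    solution weakly dominating it (2-objective minimization).
    `fronts` is a list of (n_points, 2) arrays, one per run."""
    counts = np.zeros((len(grid_y), len(grid_x)), dtype=int)
    for front in fronts:
        dominated = np.zeros_like(counts, dtype=bool)
        for f1, f2 in front:
            dominated |= (grid_x[None, :] >= f1) & (grid_y[:, None] >= f2)
        counts += dominated
    return counts

# The 50% attainment surface is the boundary of {counts >= n_runs / 2};
# plotting the boundaries for several thresholds visualizes variability.
```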


Multiobjective Tree-Structured Parzen Estimator

Journal of Artificial Intelligence Research

Practitioners often encounter challenging real-world problems that involve the simultaneous optimization of multiple objectives in a complex search space. To address these problems, we propose a practical multiobjective Bayesian optimization algorithm: the Multiobjective Tree-structured Parzen Estimator (MOTPE), an extension of the widely used Tree-structured Parzen Estimator (TPE). Our numerical results demonstrate that MOTPE approximates the Pareto fronts of a variety of benchmark problems, as well as a convolutional neural network design problem, better than existing methods. Based on empirical results, we also investigate how the configuration of MOTPE affects its behavior and performance, and the effectiveness of its asynchronous parallelization.
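
Schematically (a sketch of the flavor only; the paper's actual splitting rule additionally involves hypervolume-based selection), MOTPE replaces TPE's single-objective quantile split with a dominance-based split before forming the usual density ratio.

```python
import numpy as np

def dominates(a, b):
    """a Pareto-dominates b (minimization of all objectives)."""
    return np.all(a <= b) and np.any(a < b)

def motpe_style_split(X, Y):
    """Split observations into 'good' (nondominated) and 'bad' (rest);
    the two groups then play the roles of l(x) and g(x) in TPE."""
    nondom = np.array([
        not any(dominates(Y[j], Y[i]) for j in range(len(Y)) if j != i)
        for i in range(len(Y))
    ])
    return X[nondom], X[~nondom]
```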