Goto

Collaborating Authors

 hyperparameter range



Foraprobabilityspace (โ„ฆBox,E,PBox),withโ„ฆBox Rd,theGaussian-boxprocessisgeneratedas ยตi โ„ฆBox, ฯƒi Rd+, r Rd+ Ci N(ยตi,ฯƒi), Xi, =Ci+ri, Xi, =Ci ri, Box(Xi) = dY

Neural Information Processing Systems

All coordinates will be modeled by independent Gumbel distributions, and thus it is enough to calculate the expected side-length of a box as the expected volume will simply be the product of the expected side-lengths. To properly restrict the Gumbel distributions to[0,1], we can either formcensoredortruncated distributions. Thetruncateddistribution,ontheotherhand,multipliesthe densities with the indicator function for[0,1]and renormalizes them to integrate to 1. The higher the temperature of the boxes, the more the true integral will tend to provide larger conditional probabilities. Monte Carlo experiments support this conclusion.



Reviews: Learning search spaces for Bayesian optimization: Another view of hyperparameter transfer learning

Neural Information Processing Systems

My concern about generalization still remains, and I hope the authors can devote maybe a sentence or two to it in the final draft - even something to the effect of "it is a concern; experimental evidence suggests it is not a great concern."] Summary: For any given ML algorithm, e.g., random forests, the paper proposes a transfer-learning approach for selection of hyperparameters (limited to those parameters that can be ordered) wherein a bounding space is constructed from previous evaluations of that algorithm on other datasets. Two types of bounding spaces are described. The box space is the tightest bounding box covering the best known hyperparameter settings for previous datasets. The ellipsoid is found as the smallest-volume ellipsoid covering the best known settings (via convex optimization).


Expressing Multivariate Time Series as Graphs with Time Series Attention Transformer

arXiv.org Artificial Intelligence

A reliable and efficient representation of multivariate time series is crucial in various downstream machine learning tasks. In multivariate time series forecasting, each variable depends on its historical values and there are inter-dependencies among variables as well. Models have to be designed to capture both intra- and inter-relationships among the time series. To move towards this goal, we propose the Time Series Attention Transformer (TSAT) for multivariate time series representation learning. Using TSAT, we represent both temporal information and inter-dependencies of multivariate time series in terms of edge-enhanced dynamic graphs. The intra-series correlations are represented by nodes in a dynamic graph; a self-attention mechanism is modified to capture the inter-series correlations by using the super-empirical mode decomposition (SMD) module. We applied the embedded dynamic graphs to times series forecasting problems, including two real-world datasets and two benchmark datasets. Extensive experiments show that TSAT clearly outerperforms six state-of-the-art baseline methods in various forecasting horizons. We further visualize the embedded dynamic graphs to illustrate the graph representation power of TSAT. We share our code at https://github.com/RadiantResearch/TSAT.


Sherpa: Robust Hyperparameter Optimization for Machine Learning

arXiv.org Machine Learning

Sherpa is a hyperparameter optimization library for machine learning models. It is specifically designed for problems with computationally expensive, iterative function evaluations, such as the hyperparameter tuning of deep neural networks. With Sherpa, scientists can quickly optimize hyperparameters using a variety of powerful and interchangeable algorithms. Sherpa can be run on either a single machine or in parallel on a cluster. Finally, an interactive dashboard enables users to view the progress of models as they are trained, cancel trials, and explore which hyperparameter combinations are working best. Sherpa empowers machine learning practitioners by automating the more tedious aspects of model tuning. Its source code and documentation are available at https://github.com/sherpa-ai/sherpa.


Causality and Bayesian network PDEs for multiscale representations of porous media

arXiv.org Machine Learning

Microscopic (pore-scale) properties of porous media affect and often determine their macroscopic (continuum- or Darcy-scale) counterparts. Understanding the relationship between processes on these two scales is essential to both the derivation of macroscopic models of, e.g., transport phenomena in natural porous media, and the design of novel materials, e.g., for energy storage. Most microscopic properties exhibit complex statistical correlations and geometric constraints, which presents challenges for the estimation of macroscopic quantities of interest (QoIs), e.g., in the context of global sensitivity analysis (GSA) of macroscopic QoIs with respect to microscopic material properties. We present a systematic way of building correlations into stochastic multiscale models through Bayesian networks. This allows us to construct the joint probability density function (PDF) of model parameters through causal relationships that emulate engineering processes, e.g., the design of hierarchical nanoporous materials. Such PDFs also serve as input for the forward propagation of parametric uncertainty; our findings indicate that the inclusion of causal relationships impacts predictions of macroscopic QoIs. To assess the impact of correlations and causal relationships between microscopic parameters on macroscopic material properties, we use a moment-independent GSA based on the differential mutual information. Our GSA accounts for the correlated inputs and complex non-Gaussian QoIs. The global sensitivity indices are used to rank the effect of uncertainty in microscopic parameters on macroscopic QoIs, to quantify the impact of causality on the multiscale model's predictions, and to provide physical interpretations of these results for hierarchical nanoporous materials.


Hyperparameter Tuning the Random Forest in Python โ€“ Towards Data Science

#artificialintelligence

I have included Python code in this article where it is most instructive. Full code and data to follow along can be found on the project Github page. The best way to think about hyperparameters is like the settings of an algorithm that can be adjusted to optimize performance, just as we might turn the knobs of an AM radio to get a clear signal (or your parents might have!). While model parameters are learned during training -- such as the slope and intercept in a linear regression -- hyperparameters must be set by the data scientist before training. In the case of a random forest, hyperparameters include the number of decision trees in the forest and the number of features considered by each tree when splitting a node.