Optimization
Rule Covering for Interpretation and Boosting
Birbil, S. Ilker, Edali, Mert, Yuceoglu, Birol
We propose two algorithms for interpretation and boosting of tree-based ensemble methods. Both algorithms make use of mathematical programming models that are constructed with a set of rules extracted from an ensemble of decision trees. The objective is to obtain the minimum total impurity with the least number of rules that cover all the samples. The first algorithm uses the collection of decision trees obtained from a trained random forest model. Our numerical results show that the proposed rule covering approach selects only a few rules that could be used for interpreting the random forest model. Moreover, the resulting set of rules closely matches the accuracy level of the random forest model. Inspired by the column generation algorithm in linear programming, our second algorithm uses a rule generation scheme for boosting decision trees. We use the dual optimal solutions of the linear programming models as sample weights to obtain only those rules that would improve the accuracy. With a computational study, we observe that our second algorithm performs competitively with the other well-known boosting methods. Our implementations also demonstrate that both algorithms can be trivially coupled with the existing random forest and decision tree packages.
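The covering objective described above can be illustrated with a small sketch. It is not the authors' mathematical programming model: it swaps in a plain greedy heuristic for weighted set covering, and the rule costs (assumed here to be 1 plus an impurity value) and the toy rules are hypothetical.

```python
def greedy_rule_cover(rules, n_samples):
    """Pick rules until every sample is covered.

    rules: list of (covered_sample_indices, cost) pairs, where cost is a
    hypothetical "1 + impurity" score for the rule.
    Returns the indices of the selected rules, in selection order.
    """
    uncovered = set(range(n_samples))
    chosen = []
    while uncovered:
        best, best_ratio = None, float("inf")
        for idx, (covered, cost) in enumerate(rules):
            gain = len(uncovered & covered)  # newly covered samples
            if gain == 0:
                continue
            ratio = cost / gain  # cost per newly covered sample
            if ratio < best_ratio:
                best, best_ratio = idx, ratio
        if best is None:
            raise ValueError("some samples are not covered by any rule")
        chosen.append(best)
        uncovered -= rules[best][0]
    return chosen


# Toy data: four rules over six samples (illustrative values only).
rules = [
    ({0, 1, 2}, 1.0),
    ({2, 3}, 1.4),
    ({3, 4, 5}, 1.1),
    ({0, 5}, 1.5),
]
selected = greedy_rule_cover(rules, n_samples=6)
```

A greedy pass like this gives no optimality guarantee for the combined impurity and rule-count objective; the abstract's approach relies on mathematical programming models instead.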
Automated Machine Learning (AutoML) Libraries for Python - AnalyticsWeek
AutoML provides tools to automatically discover good machine learning model pipelines for a dataset with very little user intervention. It is ideal for domain experts new to machine learning or machine learning practitioners looking to get good results quickly for a predictive modeling task. Open-source libraries are available for using AutoML methods with popular machine learning libraries in Python, such as the scikit-learn machine learning library. In this tutorial, you will discover how to use top open-source AutoML libraries for scikit-learn in Python.
Cross-Entropy Method Variants for Optimization
The cross-entropy (CE) method is a popular stochastic method for optimization due to its simplicity and effectiveness. Designed for rare-event simulations where the probability of a target event occurring is relatively small, the CE-method relies on enough objective function calls to accurately estimate the optimal parameters of the underlying distribution. Certain objective functions may be computationally expensive to evaluate, and the CE-method could potentially get stuck in local minima. This is compounded by the need to have an initial covariance wide enough to cover the design space of interest. We introduce novel variants of the CE-method to address these concerns. To mitigate expensive function calls, during optimization we use every sample to build a surrogate model to approximate the objective function. The surrogate model augments the belief of the objective function with less expensive evaluations. We use a Gaussian process for our surrogate model to incorporate uncertainty in the predictions, which is especially helpful when dealing with sparse data. To address local minima convergence, we use Gaussian mixture models to encourage exploration of the design space. We experiment with evaluation scheduling techniques to reallocate true objective function calls earlier in the optimization when the covariance is the largest. To test our approach, we created a parameterized test objective function with many local minima and a single global minimum. Our test function can be adjusted to control the spread and distinction of the minima. Experiments were run to stress the cross-entropy method variants and results indicate that the surrogate model-based approach reduces local minima convergence using the same number of function evaluations.
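For orientation, the baseline CE-method loop (sample, keep the elite, refit the distribution) might look like the sketch below. It covers only the plain one-dimensional Gaussian case and does not include the surrogate, mixture-model, or scheduling variants from the abstract. The function name, the toy objective, and all parameter values are assumptions for illustration.

```python
import random
import statistics


def cross_entropy_minimize(f, mean, std, n_samples=50, n_elite=10,
                           n_iters=20, seed=0):
    """Baseline 1-D cross-entropy method with a Gaussian proposal."""
    rng = random.Random(seed)
    for _ in range(n_iters):
        # Draw candidates from the current proposal distribution.
        samples = [rng.gauss(mean, std) for _ in range(n_samples)]
        # Keep the candidates with the lowest objective values.
        elite = sorted(samples, key=f)[:n_elite]
        # Refit the proposal to the elite set; floor std to avoid zero.
        mean = statistics.fmean(elite)
        std = max(statistics.pstdev(elite), 1e-12)
    return mean


def objective(x):
    # Hypothetical test objective with a single minimum at x = 3.
    return (x - 3.0) ** 2


best = cross_entropy_minimize(objective, mean=0.0, std=5.0)
```

On a multimodal objective the same loop can settle into a local minimum once the standard deviation shrinks, which is the failure mode the proposed variants are meant to reduce.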
Accelerated Large Batch Optimization of BERT Pretraining in 54 minutes
Zheng, Shuai, Lin, Haibin, Zha, Sheng, Li, Mu
BERT has recently attracted a lot of attention in natural language understanding (NLU) and achieved state-of-the-art results in various NLU tasks. However, its success requires large deep neural networks and a huge amount of data, which result in long training times and impede development progress. Using stochastic gradient methods with large mini-batches has been advocated as an efficient tool to reduce the training time. Along this line of research, LAMB is a prominent example that reduces the training time of BERT from 3 days to 76 minutes on a TPUv3 Pod. In this paper, we propose an accelerated gradient method called LANS to improve the efficiency of using large mini-batches for training. As the learning rate is theoretically upper bounded by the inverse of the Lipschitz constant of the function, one cannot always reduce the number of optimization iterations by selecting a larger learning rate. In order to use larger mini-batch sizes without accuracy loss, we develop a new learning rate scheduler that overcomes the difficulty of using a large learning rate. Using the proposed LANS method and the learning rate scheme, we scaled up the mini-batch sizes to 96K and 33K in phases 1 and 2 of BERT pretraining, respectively. It takes 54 minutes on 192 AWS EC2 P3dn.24xlarge instances to achieve a target F1 score of 90.5 or higher on SQuAD v1.1, achieving the fastest BERT training time in the cloud.
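The learning-rate limit mentioned above can be seen on a toy problem. The sketch below runs plain gradient descent (not LANS or LAMB) on the hypothetical quadratic f(x) = (L/2) x^2, whose gradient has Lipschitz constant L. The values of L, the starting point, and the step count are illustrative assumptions.

```python
def gradient_descent(lr, L=4.0, x0=1.0, steps=20):
    """Plain gradient descent on f(x) = (L / 2) * x**2."""
    x = x0
    for _ in range(steps):
        grad = L * x       # gradient of the quadratic
        x -= lr * grad     # each step multiplies x by (1 - lr * L)
    return x


stable = gradient_descent(lr=1.0 / 4.0)    # lr = 1/L: reaches the minimum
unstable = gradient_descent(lr=2.5 / 4.0)  # lr > 2/L: iterates grow
```

With lr = 1/L the iterate lands on the minimizer, while any lr above 2/L makes the magnitude of x grow at every step, so simply raising the learning rate does not keep reducing the iteration count.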
Learnable Strategies for Bilateral Agent Negotiation over Multiple Issues
Bagga, Pallavi, Paoletti, Nicola, Stathis, Kostas
We present a novel bilateral negotiation model that allows a self-interested agent to learn how to negotiate over multiple issues in the presence of user preference uncertainty. The model relies upon interpretable strategy templates representing the tactics the agent should employ during the negotiation and learns template parameters to maximize the average utility received over multiple negotiations, thus resulting in optimal bid acceptance and generation. Our model also uses deep reinforcement learning to evaluate threshold utility values, for those tactics that require them, thereby deriving optimal utilities for every environment state. To handle user preference uncertainty, the model relies on a stochastic search to find the user model that best agrees with a given partial preference profile. Multi-objective optimization and multi-criteria decision-making methods are applied at negotiation time to generate Pareto-optimal outcomes, thereby increasing the number of successful (win-win) negotiations. Rigorous experimental evaluations show that the agent employing our model outperforms the winning agents of the 10th Automated Negotiating Agents Competition (ANAC'19) in terms of individual as well as social-welfare utilities.
Dimension Reduction in Contextual Online Learning via Nonparametric Variable Selection
Li, Wenhao, Chen, Ningyuan, Hong, L. Jeff
We consider a contextual online learning (multi-armed bandit) problem with high-dimensional covariate $\mathbf{x}$ and decision $\mathbf{y}$. The reward function to learn, $f(\mathbf{x},\mathbf{y})$, does not have a particular parametric form. The literature has shown that the optimal regret is $\tilde{O}(T^{(d_x+d_y+1)/(d_x+d_y+2)})$, where $d_x$ and $d_y$ are the dimensions of $\mathbf x$ and $\mathbf y$, and thus it suffers from the curse of dimensionality. In many applications, only a small subset of variables in the covariate affect the value of $f$, which is referred to as \textit{sparsity} in statistics. To take advantage of the sparsity structure of the covariate, we propose a variable selection algorithm called \textit{BV-LASSO}, which incorporates novel ideas such as binning and voting to apply LASSO to nonparametric settings. Our algorithm achieves the regret $\tilde{O}(T^{(d_x^*+d_y+1)/(d_x^*+d_y+2)})$, where $d_x^*$ is the effective covariate dimension. The regret matches the optimal regret when the covariate is $d^*_x$-dimensional and thus cannot be improved. Our algorithm may serve as a general recipe to achieve dimension reduction via variable selection in nonparametric settings.
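To make the effect of the effective dimension concrete, the regret exponent from the abstract can be evaluated for a pair of dimensions. The dimension values below are hypothetical and chosen only for illustration; logarithmic factors hidden by the $\tilde{O}$ notation are ignored.

```python
def regret_exponent(d_x, d_y):
    """Exponent of T in the regret rate T^((d_x + d_y + 1) / (d_x + d_y + 2))."""
    d = d_x + d_y
    return (d + 1) / (d + 2)


# Hypothetical setting: 20 covariates, only 2 of which matter, 1-D decision.
full = regret_exponent(d_x=20, d_y=1)    # uses every covariate: 22/23
sparse = regret_exponent(d_x=2, d_y=1)   # uses the effective dimension: 4/5
```

The smaller exponent in the sparse case is the gain that variable selection is meant to deliver: regret grows more slowly in T when only the relevant covariates are used.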
Multi-objective dynamic programming with limited precision
Mandow, L., de la Cruz, J. L. Pérez, Pozas, N.
Markov decision processes (MDPs) are a well-known conceptual tool useful for modelling sequential decision processes and have been widely used in real-world applications such as adaptive production control (see e.g. Kuhnle et al. (2020)), equipment maintenance (see Barde et al. (2019), Liu et al. (2019)) or robot planning (see Veeramani et al. (2020)), to name a few. Usual optimization procedures take into account just a scalar value to be maximized. However, in many cases the objective function is more accurately described by a vector (see e.g. Gen and Lin (2014), Zhang and Xu (2017)) and multi-objective optimization must be applied.
Mean-Variance Analysis in Bayesian Optimization under Uncertainty
Iwazaki, Shogo, Inatsu, Yu, Takeuchi, Ichiro
Decision making in an uncertain environment has been studied in various domains. For example, in financial engineering, the mean-variance analysis [1, 2, 3] has been introduced as a framework for making investment decisions, taking into account the tradeoff between the return (mean) and the risk (variance) of the investment. In this paper we study active learning (AL) in an uncertain environment. In many practical AL problems, there are two types of parameters called design parameters and environmental parameters. For example, in a product design, while the design parameters are fully controllable, the environmental parameters vary depending on the environment in which the product is used. In this paper, we examine AL problems under such an uncertain environment, where the goal is to efficiently find the optimal design parameters by properly taking into account the uncertainty of the environmental parameters. Concretely, let f(x, w) be a blackbox function indicating the performance of a product, where x ∈ X is the set of controllable design parameters and w ∈ Ω is the set of uncontrollable environmental parameters whose uncertainty is characterized by a probability distribution p(w).
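The two quantities traded off in a mean-variance analysis, the mean and the variance of f(x, w) over p(w), can be estimated by simple Monte Carlo sampling. The sketch below is not the method of the paper: the performance function, the choice of a standard normal p(w), and the sample size are all assumptions made for illustration.

```python
import random
import statistics


def performance(x, w):
    # Hypothetical blackbox f(x, w): the noise is scaled by the design x.
    return -(x - 1.0) ** 2 + x * w


def mean_and_variance(f, x, n_samples=20000, seed=0):
    """Monte Carlo estimates of E_w[f(x, w)] and Var_w[f(x, w)]."""
    rng = random.Random(seed)
    # Assumed environment distribution p(w): standard normal.
    values = [f(x, rng.gauss(0.0, 1.0)) for _ in range(n_samples)]
    return statistics.fmean(values), statistics.pvariance(values)


mean, variance = mean_and_variance(performance, x=2.0)
```

For this toy function the exact mean is -(x - 1)^2 and the exact variance is x^2, so a design with a better mean can carry a larger variance, which is the tradeoff the abstract refers to.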
Deep Generative Modeling for Mechanistic-based Learning and Design of Metamaterial Systems
Wang, Liwei, Chan, Yu-Chin, Ahmed, Faez, Liu, Zhao, Zhu, Ping, Chen, Wei
Metamaterials are emerging as a new paradigmatic material system to render unprecedented and tailorable properties for a wide variety of engineering applications. However, the inverse design of metamaterials and their multiscale systems is challenging due to the high-dimensional topological design space, multiple local optima, and high computational cost. To address these hurdles, we propose a novel data-driven metamaterial design framework based on deep generative modeling. A variational autoencoder (VAE) and a regressor for property prediction are simultaneously trained on a large metamaterial database to map complex microstructures into a low-dimensional, continuous, and organized latent space. We show in this study that the latent space of the VAE provides a distance metric to measure shape similarity, enables interpolation between microstructures, and encodes meaningful patterns of variation in geometries and properties. Based on these insights, systematic data-driven methods are proposed for the design of microstructures, graded families, and multiscale systems. For microstructure design, the tuning of mechanical properties and complex manipulations of microstructures are easily achieved by simple vector operations in the latent space. The vector operation is further extended to generate metamaterial families with a controlled gradation of mechanical properties by searching on a constructed graph model. For multiscale metamaterial systems design, a diverse set of microstructures can be rapidly generated using the VAE for target properties at different locations and then assembled by an efficient graph-based optimization method to ensure compatibility between adjacent microstructures. We demonstrate our framework by designing both functionally graded and heterogeneous metamaterial systems that achieve desired distortion behaviors.