In order to learn effective control policies for dynamical systems, policy search methods must be able to discover successful executions of the desired task. While random exploration can work well in simple domains, complex and high-dimensional tasks present a serious challenge, particularly when combined with high-dimensional policies that make parameter-space exploration infeasible. We present a method that uses trajectory optimization as a powerful exploration strategy that guides the policy search. A variational decomposition of a maximum likelihood policy objective allows us to use standard trajectory optimization algorithms such as differential dynamic programming, interleaved with standard supervised learning for the policy itself. We demonstrate that the resulting algorithm can outperform prior methods on two challenging locomotion tasks.
Many real-world problems have complicated objective functions. To optimize such functions, humans utilize sophisticated sequential decision-making strategies. Many optimization algorithms have also been developed for this same purpose, but how do they compare to humans in terms of both performance and behavior? We try to unravel the general underlying algorithm people may be using while searching for the maximum of an invisible 1D function. Subjects click on a blank screen and are shown the ordinate of the function at each clicked abscissa location.
Machine Learning (ML) development is an iterative process in which the accuracy of predictions made by the models is continuously improved by repeating the training and evaluation phases. In each of these iterations, certain parameters are tweaked continuously by developers. Any parameter manually selected based on learning from previous experiments qualify to be called a model hyper-parameter. These parameters represent intuitive decisions whose value cannot be estimated from data or from ML theory. The hyper-parameters are knobs that you tweak during each iteration of training a model to improve the accuracy in the predictions made by the model.
In this paper, an evolutionary many-objective optimization algorithm based on corner solution search (MaOEA-CS) was proposed. MaOEA-CS implicitly contains two phases: the exploitative search for the most important boundary optimal solutions - corner solutions, at the first phase, and the use of angle-based selection  with the explorative search for the extension of PF approximation at the second phase. Due to its high efficiency and robustness to the shapes of PFs, it has won the CEC'2017 Competition on Evolutionary Many-Objective Optimization. In addition, MaOEA-CS has also been applied on two real-world engineering optimization problems with very irregular PFs. The experimental results show that MaOEA-CS outperforms other six state-of-the-art compared algorithms, which indicates it has the ability to handle real-world complex optimization problems with irregular PFs.
It is already reported in the literature that the performance of a machine learning algorithm is greatly impacted by performing proper Hyper-Parameter optimization. One of the ways to perform Hyper-Parameter optimization is by manual search but that is time consuming. Some of the common approaches for performing Hyper-Parameter optimization are Grid search Random search and Bayesian optimization using Hyperopt. In this paper, we propose a brand new approach for hyperparameter improvement i.e. Randomized-Hyperopt and then tune the hyperparameters of the XGBoost i.e. the Extreme Gradient Boosting algorithm on ten datasets by applying Random search, Randomized-Hyperopt, Hyperopt and Grid Search. The performances of each of these four techniques were compared by taking both the prediction accuracy and the execution time into consideration. We find that the Randomized-Hyperopt performs better than the other three conventional methods for hyper-paramter optimization of XGBoost.