In recent years, more and more high-dimensional data sets, in which the number of parameters $p$ is large relative to, or even exceeds, the number of observations $n$, have become available to applied researchers. Boosting algorithms represent one of the major advances in machine learning and statistics in recent years and are well suited to the analysis of such data sets. While the Lasso has been applied very successfully to high-dimensional data sets in economics, boosting has been underutilized in this field, although it has proven very powerful in fields such as biostatistics and pattern recognition. We attribute this to missing theoretical results for boosting. The goal of this paper is to fill this gap and show that boosting is a competitive method for inference on a treatment effect or instrumental variable (IV) estimation in a high-dimensional setting. First, we present $L_2$Boosting with componentwise least squares and variants tailored to regression problems, which are the workhorse for most econometric problems. Then we show how $L_2$Boosting can be used for the estimation of treatment effects and for IV estimation. We highlight the methods and illustrate them with simulations and empirical examples. For further results and technical details, we refer to Luo and Spindler (2016, 2017) and to the online supplement of the paper.
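To make the componentwise least squares idea concrete, the following is a minimal sketch of $L_2$Boosting under stated assumptions, not the implementation used in the paper. The function name `l2_boost`, the step size `nu`, the fixed number of steps, and the simulated data are all illustrative choices. In each step, every regressor is fitted to the current residuals by univariate least squares, the one that reduces the residual sum of squares most is selected, and its coefficient is updated by a shrunken step.

```python
import numpy as np

def l2_boost(X, y, n_steps=200, nu=0.1):
    """Componentwise least-squares L2Boosting (illustrative sketch).

    Assumes the columns of X are centered and not identically zero.
    n_steps and nu are hypothetical tuning choices, not values from the paper.
    """
    n, p = X.shape
    intercept = y.mean()
    beta = np.zeros(p)
    resid = y - intercept                 # current residuals
    col_ss = (X ** 2).sum(axis=0)         # sum of squares of each column
    for _ in range(n_steps):
        coef = X.T @ resid / col_ss       # univariate LS coefficient per column
        gain = coef ** 2 * col_ss         # RSS reduction of each candidate fit
        j = int(np.argmax(gain))          # best-fitting single regressor
        beta[j] += nu * coef[j]           # shrunken coefficient update
        resid -= nu * coef[j] * X[:, j]   # update residuals
    return intercept, beta

# Simulated high-dimensional example (p > n), assumed for illustration only.
rng = np.random.default_rng(0)
n, p = 50, 200
X = rng.standard_normal((n, p))
X -= X.mean(axis=0)                       # center the regressors
true_beta = np.zeros(p)
true_beta[:3] = [3.0, -2.0, 1.5]
y = X @ true_beta + 0.1 * rng.standard_normal(n)

intercept, beta = l2_boost(X, y)
rss_before = float(np.sum((y - y.mean()) ** 2))
rss_after = float(np.sum((y - intercept - X @ beta) ** 2))
print("nonzero coefficients:", np.count_nonzero(beta))
print("RSS before / after:", rss_before, rss_after)
```

In practice the number of steps acts as the regularization parameter and would be chosen by a stopping rule rather than fixed in advance as it is here.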
Q-value estimates for 1-step Q-learning) are maintained and dynamically updated as information comes to hand during the learning process. Excessive variance of these estimators can be problematic, resulting in uneven or unstable learning, or even making effective learning impossible. Estimator variance is usually managed only indirectly, by selecting global learning algorithm parameters (e.g. λ for TD(λ) based methods) that are a compromise between an acceptable level of estimator perturbation and other desirable system attributes, such as reduced estimator bias. In this paper, we argue that this approach may not always be adequate, particularly for noisy and non-Markovian domains, and present a direct approach to managing estimator variance, the new ccBeta algorithm. Empirical results in an autonomous robotics domain are also presented showing improved performance using the ccBeta method.
As a robust nonlinear similarity measure in kernel space, correntropy has received increasing attention in domains of machine learning and signal processing. In particular, the maximum correntropy criterion (MCC) has recently been successfully applied in robust regression and filtering. The default kernel function in correntropy is the Gaussian kernel, which is, of course, not always the best choice. In this work, we propose a generalized correntropy that adopts the generalized Gaussian density (GGD) function as the kernel (not necessarily a Mercer kernel), and present some important properties. We further propose the generalized maximum correntropy criterion (GMCC), and apply it to adaptive filtering. An adaptive algorithm, called the GMCC algorithm, is derived, and the mean square convergence performance is studied. We show that the proposed algorithm is very stable and can achieve zero probability of divergence (POD). Simulation results confirm the theoretical expectations and demonstrate the desirable performance of the new algorithm.
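As an illustration of how a GMCC-type adaptive filter might look, the sketch below applies stochastic gradient ascent to the instantaneous cost $\exp(-\lambda |e|^{\alpha})$, where $e$ is the prediction error. The gradient with respect to the weights is proportional to $\exp(-\lambda |e|^{\alpha})\,|e|^{\alpha-1}\,\mathrm{sign}(e)\,x$, with constant factors absorbed into the step size. The function name `gmcc_filter`, the parameter values, and the simulated system are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def gmcc_filter(X, d, alpha=2.0, lam=1.0, eta=0.05):
    """Stochastic-gradient adaptive filter for the cost exp(-lam * |e|**alpha).

    Illustrative sketch: alpha, lam and eta are hypothetical settings.
    With alpha = 2 the kernel is Gaussian-shaped.
    """
    n, m = X.shape
    w = np.zeros(m)
    for k in range(n):
        e = d[k] - w @ X[k]               # a priori prediction error
        a = abs(e)
        if a == 0.0:                      # gradient is zero; also avoids 0**negative
            continue
        # Large errors are down-weighted by the exponential factor,
        # which is the source of robustness to impulsive noise.
        g = np.exp(-lam * a ** alpha) * a ** (alpha - 1) * np.sign(e)
        w += eta * g * X[k]
    return w

# System identification with impulsive noise (simulated, for illustration).
rng = np.random.default_rng(1)
w_true = np.array([0.5, -0.3, 0.2])
n = 2000
X = rng.standard_normal((n, 3))
noise = 0.05 * rng.standard_normal(n)
outliers = rng.random(n) < 0.05           # about 5% impulsive samples
noise[outliers] += 10.0 * rng.standard_normal(outliers.sum())
d = X @ w_true + noise

w_hat = gmcc_filter(X, d)
err = float(np.linalg.norm(w_hat - w_true))
print("estimated weights:", w_hat)
print("weight error norm:", err)
```

Note that the exponential factor also slows adaptation when the initial error is large, so the kernel parameters and the step size would need to be tuned to the signal scale.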
Scientists may finally be a step closer to understanding how massive black holes first sprang into existence in the early universe. While the light from these distant black holes is intense enough to reach telescopes from more than 13 billion light-years away, just how they formed is still a mystery. Using data from the 70-terabyte Renaissance Simulation suite on the Blue Waters supercomputer, researchers have found that massive black holes can form in fast-growing regions that are devoid of stars. The simulation shows young galaxies that generate radiation (white) and metals (green) while heating the surrounding gas. 'In this study, we have uncovered a totally new mechanism that sparks the formation of massive black holes in particular dark matter halos,' said John Wise, an associate professor in the Center for Relativistic Astrophysics at Georgia Tech.
The Estimation of Distribution Algorithm (EDA) is a new class of population-based search methods in which a probabilistic model of individuals is estimated from the high-quality individuals and used to generate new individuals. In this paper we compute (1) some upper bounds on the number of iterations required for global convergence of EDA and (2) the exact number of iterations needed for EDA to converge to global optima.
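The estimate-and-sample loop described above can be sketched with one simple member of the EDA family, the univariate marginal distribution algorithm (UMDA), applied to the OneMax problem. This choice of model and test problem is an assumption made for illustration, since the abstract does not specify either. The function name `umda_onemax` and all parameter values are hypothetical. The function returns the iteration at which the global optimum is first sampled, which is the quantity whose bounds the paper studies.

```python
import numpy as np

def umda_onemax(n_bits=20, pop_size=100, n_select=50, max_iters=200, seed=0):
    """UMDA on OneMax (maximize the number of ones), illustrative sketch.

    Returns (iteration at which the all-ones string was first sampled,
    final marginal probabilities), or (None, probs) if it was not found.
    """
    rng = np.random.default_rng(seed)
    probs = np.full(n_bits, 0.5)          # independent Bernoulli model per bit
    for it in range(1, max_iters + 1):
        # Generate new individuals from the current probabilistic model.
        pop = (rng.random((pop_size, n_bits)) < probs).astype(int)
        fitness = pop.sum(axis=1)
        if fitness.max() == n_bits:
            return it, probs
        # Estimate the model from the high-quality individuals.
        best = pop[np.argsort(fitness)[-n_select:]]
        probs = best.mean(axis=0)
        # Margins keep every bit value reachable (a common safeguard,
        # assumed here; not taken from the paper).
        probs = np.clip(probs, 1.0 / n_bits, 1.0 - 1.0 / n_bits)
    return None, probs

iters, probs = umda_onemax()
print("iterations until optimum was sampled:", iters)
```

More expressive EDAs replace the independent per-bit model with one that captures dependencies between variables, while the surrounding loop stays the same.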