oth
Incrementality Bidding via Reinforcement Learning under Mixed and Delayed Rewards Appendix AFormal Definition of Inhomogeneous Poisson Process
The inhomogeneous Poisson (point) process is a Poisson point process with a Poisson parameter set as some time-dependent function r(τ). Let N(a,b) represent the number of points of inhomogeneous Poisson process with intensity function r(t) occurring in the interval [a,b], then the probability of n points existing in the interval [a,b] is given by, P(N(a,b) = n) Λ(a,b)n n! In this paper, the points mean the conversions and the time-dependent intensity function r() is defined in Eq. (2) and it depends on the realization of the conversions and parameter θ. Suppose X1, Xn are independent, mean-zero, subexponential random variables, and a = (a1,,an) is an ndimensional constanst vector. We first introduce the main idea of the the PAMM algorithm.
Different thresholding methods on Nearest Shrunken Centroid algorithm
Sahtout, Mohammad Omar, Wang, Haiyan, Ghimire, Santosh
This article considers the impact of different thresholding methods to the Nearest Shrunken Centroid algorithm, which is popularly referred as the Prediction Analysis of Microarrays (PAM) for high-dimensional classification. PAM uses soft thresholding to achieve high computational efficiency and high classification accuracy but in the price of retaining too many features. When applied to microarray human cancers, PAM selected 2611 features on average from 10 multi-class datasets. Such a large number of features make it difficult to perform follow up study. One reason behind this problem is the soft thresholding, which is known to produce biased parameter estimate in regression analysis. In this article, we extend the PAM algorithm with two other thresholding methods, hard and order thresholding, and a deep search algorithm to achieve better thresholding parameter estimate. The modified algorithms are extensively tested and compared to the original one based on real data and Monte Carlo studies. In general, the modification not only gave better cancer status prediction accuracy, but also resulted in more parsimonious models with significantly smaller number of features.
Model-based metrics: Sample-efficient estimates of predictive model subpopulation performance
Miller, Andrew C., Gatys, Leon A., Futoma, Joseph, Fox, Emily B.
Machine learning models $-$ now commonly developed to screen, diagnose, or predict health conditions $-$ are evaluated with a variety of performance metrics. An important first step in assessing the practical utility of a model is to evaluate its average performance over an entire population of interest. In many settings, it is also critical that the model makes good predictions within predefined subpopulations. For instance, showing that a model is fair or equitable requires evaluating the model's performance in different demographic subgroups. However, subpopulation performance metrics are typically computed using only data from that subgroup, resulting in higher variance estimates for smaller groups. We devise a procedure to measure subpopulation performance that can be more sample-efficient than the typical subsample estimates. We propose using an evaluation model $-$ a model that describes the conditional distribution of the predictive model score $-$ to form model-based metric (MBM) estimates. Our procedure incorporates model checking and validation, and we propose a computationally efficient approximation of the traditional nonparametric bootstrap to form confidence intervals. We evaluate MBMs on two main tasks: a semi-synthetic setting where ground truth metrics are available and a real-world hospital readmission prediction task. We find that MBMs consistently produce more accurate and lower variance estimates of model performance for small subpopulations.