Bayesian Cross Validation and WAIC for Predictive Prior Design in Regular Asymptotic Theory

arXiv.org Machine Learning

Prior design is one of the most important problems in both statistics and machine learning. The cross validation (CV) and the widely applicable information criterion (WAIC) are predictive measures of the Bayesian estimation, however, it has been difficult to apply them to find the optimal prior because their mathematical properties in prior evaluation have been unknown and the region of the hyperparameters is too wide to be examined. In this paper, we derive a new formula by which the theoretical relation among CV, WAIC, and the generalization loss is clarified and the optimal hyperparameter can be directly found. By the formula, three facts are clarified about predictive prior design. Firstly, CV and WAIC have the same second order asymptotic expansion, hence they are asymptotically equivalent to each other as the optimizer of the hyperparameter. Secondly, the hyperparameter which minimizes CV or WAIC makes the average generalization loss to be minimized asymptotically but does not the random generalization loss. And lastly, by using the mathematical relation between priors, the variances of the optimized hyperparameters by CV and WAIC are made smaller with small computational costs. Also we show that the optimized hyperparameter by DIC or the marginal likelihood does not minimize the average or random generalization loss in general.


Discriminant Analysis with Adaptively Pooled Covariance

arXiv.org Machine Learning

Linear and Quadratic Discriminant analysis (LDA/QDA) are common tools for classification problems. For these methods we assume observations are normally distributed within group. We estimate a mean and covariance matrix for each group and classify using Bayes theorem. With LDA, we estimate a single, pooled covariance matrix, while for QDA we estimate a separate covariance matrix for each group. Rarely do we believe in a homogeneous covariance structure between groups, but often there is insufficient data to separately estimate covariance matrices. We propose L1- PDA, a regularized model which adaptively pools elements of the precision matrices. Adaptively pooling these matrices decreases the variance of our estimates (as in LDA), without overly biasing them. In this paper, we propose and discuss this method, give an efficient algorithm to fit it for moderate sized problems, and show its efficacy on real and simulated datasets.


Audits as Evidence: Experiments, Ensembles, and Enforcement

arXiv.org Machine Learning

We develop tools for utilizing correspondence experiments to detect illegal discrimination by individual employers. Employers violate US employment law if their propensity to contact applicants depends on protected characteristics such as race or sex. We establish identification of higher moments of the causal effects of protected characteristics on callback rates as a function of the number of fictitious applications sent to each job ad. These moments are used to bound the fraction of jobs that illegally discriminate. Applying our results to three experimental datasets, we find evidence of significant employer heterogeneity in discriminatory behavior, with the standard deviation of gaps in job-specific callback probabilities across protected groups averaging roughly twice the mean gap. In a recent experiment manipulating racially distinctive names, we estimate that at least 85% of jobs that contact both of two white applications and neither of two black applications are engaged in illegal discrimination. To assess the tradeoff between type I and II errors presented by these patterns, we consider the performance of a series of decision rules for investigating suspicious callback behavior under a simple two-type model that rationalizes the experimental data. Though, in our preferred specification, only 17% of employers are estimated to discriminate on the basis of race, we find that an experiment sending 10 applications to each job would enable accurate detection of 7-10% of discriminators while falsely accusing fewer than 0.2% of non-discriminators. A minimax decision rule acknowledging partial identification of the joint distribution of callback rates yields higher error rates but more investigations than our baseline two-type model. Our results suggest illegal labor market discrimination can be reliably monitored with relatively small modifications to existing audit designs.


Real-Time Inference for a Gamma Process Model of Neural Spiking

Neural Information Processing Systems

With simultaneous measurements from ever increasing populations of neurons, there is a growing need for sophisticated tools to recover signals from individual neurons. In electrophysiology experiments, this classically proceeds in a two-step process: (i) threshold the waveforms to detect putative spikes and (ii) cluster the waveforms into single units (neurons). We extend previous Bayesian nonparamet- ric models of neural spiking to jointly detect and cluster neurons using a Gamma process model. Importantly, we develop an online approximate inference scheme enabling real-time analysis, with performance exceeding the previous state-of-the- art. Via exploratory data analysis—using data with partial ground truth as well as two novel data sets—we find several features of our model collectively contribute to our improved performance including: (i) accounting for colored noise, (ii) de- tecting overlapping spikes, (iii) tracking waveform dynamics, and (iv) using mul- tiple channels. We hope to enable novel experiments simultaneously measuring many thousands of neurons and possibly adapting stimuli dynamically to probe ever deeper into the mysteries of the brain.


WS06-05-006.pdf

AAAI Conferences

This paper describes an effort to measure the effectiveness of tutor help in an intelligent tutoring system. Although conventional pre-and post-test experiments can determine whether tutor help is effective, they are expensive to conduct. Furthermore, pre-and post-test experiments often do not model student knowledge explicitly and thus are ignoring a source of information: students often request help about words they do not know. Therefore, we construct a dynamic Bayes net (which we call the Help model) that models tutor help and student knowledge in one coherent framework. The Help model distinguishes two different effects of help: scaffolding immediate performance vs. teaching persistent knowledge that improves long term performance. We train the Help model to fit student performance data gathered from usage of the Reading Tutor (Mostow & Aist, 2001). The parameters of the trained model suggest that students benefit from both the scaffolding and teaching effects of help. That is, students are more likely to perform correctly on the current attempt and learn persistent knowledge if tutor help is provided. Thus, our framework is able to distinguish two types of influence that tutor help has on the student, and can determine whether help helps learning without an explicit controlled study.