Testing Properties of Multiple Distributions with Few Samples

Aliakbarpour, Maryam; Silwal, Sandeep

arXiv.org Machine Learning 

Statistical tests are a crucial tool in scientific endeavors to analyze data: we routinely model data as a set of samples from an unknown distribution, and use statistical tests to infer or verify the properties of the underlying distribution. While these tests typically operate under the assumption that data points are drawn from a single underlying distribution, in applications the data is usually gathered from multiple sources. Furthermore, in many situations the dataset contains only a few data points from each source. For example, an online shop may have the purchase history of thousands of customers, while each customer may shop at the store only a small number of times. Alternatively, a medical dataset might record the lifestyle behaviors of patients with a particular disease while having only a few data points from any specific demographic group (such as an age group). On the other hand, data that comes from multiple sources may result in a dataset consisting of a collection of unconnected and unrelated data points.
