Optimal Representative Sample Weighting
Shane Barratt, Guillermo Angeris, Stephen Boyd
We consider a setting where we have a set of data samples that were not uniformly sampled from a population, or that were sampled from a different population than the one about which we wish to draw conclusions. A common approach is to assign weights to the samples, so that the resulting weighted distribution is representative of the population we wish to study. Here, representative means that, with the weights, certain expected values or probabilities match or are close to known values for the population we wish to study. As a very simple example, consider a data set where each sample is associated with a person. Our data set is 70% female, whereas we'd like to draw conclusions about a population that is 50% female. A simple solution is to down-weight the female samples and up-weight the male samples in our data set, so that the weighted fraction of females is 50%. As a more sophisticated example, suppose we have multiple groups, for example various combinations of sex, age group, income level, and education, and our goal is to find weights for our samples so that the fractions of all these groups match or approximate known fractions in the population we wish to study. In this case, there will be many possible assignments of weights that match the given fractions, and we need to choose a reasonable one. One approach is to maximize the entropy of the weights, subject to matching the given fractions.
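The two examples above can be illustrated with a minimal Python sketch. It uses iterative proportional fitting (raking), a standard technique swapped in here for illustration and not necessarily the authors' algorithm. When started from uniform weights with attainable targets, raking is known to converge to the maximum-entropy weights that match the given marginal fractions. The function names `rake` and `weighted_fraction`, the toy samples, and the age-group targets are assumptions made for this sketch; only the 70%/50% female figures come from the text.

```python
def rake(samples, targets, n_iter=200):
    """Reweight samples so weighted category fractions match targets.

    samples: list of dicts mapping attribute name -> category (hypothetical layout).
    targets: dict mapping attribute name -> {category: desired fraction}.
    Assumes every target category occurs in at least one sample.
    """
    n = len(samples)
    w = [1.0 / n] * n  # start from uniform weights
    for _ in range(n_iter):
        for attr, fractions in targets.items():
            # Current weighted mass of each category of this attribute.
            mass = {c: 0.0 for c in fractions}
            for wi, s in zip(w, samples):
                mass[s[attr]] += wi
            # Rescale each sample so its category hits the target fraction.
            w = [wi * fractions[s[attr]] / mass[s[attr]]
                 for wi, s in zip(w, samples)]
    return w


def weighted_fraction(samples, w, attr, category):
    """Weighted fraction of samples whose attribute equals category."""
    return sum(wi for wi, s in zip(w, samples) if s[attr] == category)


# Simple example: data set is 70% female, target population is 50% female.
simple = [{"sex": "F"} for _ in range(7)] + [{"sex": "M"} for _ in range(3)]
w_simple = rake(simple, {"sex": {"F": 0.5, "M": 0.5}})

# Multi-group example: match sex and age-group fractions at the same time.
# The counts and the age targets below are made up for illustration.
cells = [("F", "young", 4), ("F", "old", 3), ("M", "young", 1), ("M", "old", 2)]
multi = [{"sex": sx, "age": ag} for sx, ag, k in cells for _ in range(k)]
multi_targets = {
    "sex": {"F": 0.5, "M": 0.5},
    "age": {"young": 0.4, "old": 0.6},
}
w_multi = rake(multi, multi_targets)

print("female weight:", w_simple[0], "male weight:", w_simple[-1])
print("weighted female fraction:", weighted_fraction(multi, w_multi, "sex", "F"))
print("weighted young fraction:", weighted_fraction(multi, w_multi, "age", "young"))
```

In the simple case each female sample ends up with weight 0.5/7 and each male sample with weight 0.5/3, which is exactly the down-weighting and up-weighting described above. In the multi-group case the sketch matches only the one-way marginals. Matching fractions only approximately, or adding other constraints on the weights, would call for a more general optimization formulation than this sketch provides.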
May-18-2020