Measurement error adds noise to predictions, increases uncertainty in parameter estimates, and makes it more difficult to discover new phenomena or to distinguish among competing theories. A common view is that any study finding an effect under noisy conditions provides evidence that the underlying effect is particularly strong and robust. Yet, statistical significance conveys very little information when measurements are noisy. In noisy research settings, poor measurement can contribute to exaggerated estimates of effect size. This problem and related misunderstandings are key components in a feedback loop that perpetuates the replication crisis in science.
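The claim that noisy measurement can exaggerate statistically significant estimates can be illustrated with a small simulation. The sketch below is not taken from the original study: the true slope (0.15), sample size (50), noise levels, number of simulated studies, and the function names `ols_slope` and `exaggeration_ratio` are all assumptions chosen for illustration, and significance is judged with a normal approximation (|t| > 1.96). Measurement error is added to the outcome only.

```python
import math
import random
import statistics


def ols_slope(x, y):
    """Return the least-squares slope of y on x and its standard error."""
    n = len(x)
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se


def exaggeration_ratio(noise_sd, true_slope=0.15, n=50, n_sims=2000, seed=1):
    """Mean |estimate| / true slope among 'significant' simulated studies.

    All default values are illustrative assumptions, not figures from the text.
    """
    rng = random.Random(seed)
    significant = []
    for _ in range(n_sims):
        x = [rng.gauss(0, 1) for _ in range(n)]
        # Outcome = true signal + unit residual noise + extra measurement error.
        y = [true_slope * xi + rng.gauss(0, 1) + rng.gauss(0, noise_sd) for xi in x]
        slope, se = ols_slope(x, y)
        if abs(slope / se) > 1.96:  # normal approximation to p < 0.05
            significant.append(abs(slope))
    return statistics.fmean(significant) / true_slope


clean = exaggeration_ratio(noise_sd=0.0)
noisy = exaggeration_ratio(noise_sd=1.0)
print(f"exaggeration ratio, precise outcome: {clean:.2f}")
print(f"exaggeration ratio, noisy outcome:   {noisy:.2f}")
```

Under these assumptions, an estimate has to exceed roughly twice its standard error to count as significant, and added measurement noise inflates that standard error. The estimates that survive the significance filter therefore overshoot the true slope, and they overshoot it by more when the outcome is measured with more noise.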
Fair inference in supervised learning is an important and active area of research, yielding a range of useful methods to assess and account for fairness criteria when predicting ground truth targets. As shown in recent work, however, when target labels are error-prone, potential prediction unfairness can arise from measurement error. In this paper, we show that, when an error-prone proxy target is used, existing methods to assess and calibrate fairness criteria do not extend to the true target variable of interest. To remedy this problem, we suggest a framework that combines two existing literatures: on the one hand, fair ML methods such as those found in the counterfactual fairness literature and, on the other, measurement models found in the statistical literature. We discuss these approaches and how connecting them yields our framework. In a healthcare decision problem, we find that using a latent variable model to account for measurement error removes the unfairness detected previously.
Imagine what you could do here. At Apple, new ideas have a way of becoming extraordinary products, services, and customer experiences very quickly. Bring passion and dedication to your job and there's no telling what you could accomplish. Dynamic, smart people and inspiring, innovative technologies are the norm here. The people who work here have reinvented entire industries with all Apple Hardware products.
Quantum mechanics, you have certainly heard, is infamously difficult to understand. It is well on its way to taking the place of rocket science as the go-to metaphor for unintelligible math. Popular science accounts inevitably refer to it as "strange," "weird," "mind-boggling," or all of the above. Yet quantum mechanics is perfectly comprehensible. It's just that physicists abandoned the only way to make sense of it half a century ago. Fast forward to today and progress in the foundations of physics has all but stalled. The big questions that were open then are still open today.
Many of the commenters have interesting things to say, and I recommend you read the entire discussion. The one point that I think many of the discussants are missing, though, is the importance of design and measurement. For example, Benjamin et al. write, "Compared to using the old 0.05 threshold, maintaining the same level of statistical power requires increasing sample sizes by about 70%." Larger sample sizes might enable researchers to more easily reach those otherwise elusive low p-values, but I don't see this increasing our reproducible scientific knowledge. Along those lines, Kiley Hamlin recommends going straight to full replications, which would have the advantage of giving researchers a predictive target to aim at. I like the idea of replication, rather than p-values, being a goal.
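The "about 70%" figure in the Benjamin et al. quote can be roughly checked with the standard normal-approximation formula, in which the required sample size is proportional to (z_{1-α/2} + z_{power})². The sketch below rests on assumptions the quoted passage does not state: a two-sided test, 80% power, and a new threshold of 0.005. The function name `sample_size_ratio` is my own.

```python
from statistics import NormalDist


def sample_size_ratio(alpha_old, alpha_new, power):
    """Ratio of required sample sizes (new / old) for a two-sided z-test.

    Uses n proportional to (z_{1 - alpha/2} + z_{power})^2, holding the
    effect size and the power fixed. This is a normal approximation.
    """
    z = NormalDist().inv_cdf  # standard normal quantile function
    z_power = z(power)
    n_old = (z(1 - alpha_old / 2) + z_power) ** 2
    n_new = (z(1 - alpha_new / 2) + z_power) ** 2
    return n_new / n_old


# Assumed values: old threshold 0.05, new threshold 0.005, 80% power.
ratio = sample_size_ratio(alpha_old=0.05, alpha_new=0.005, power=0.80)
print(f"required sample size grows by about {100 * (ratio - 1):.0f}%")
```

Under these assumptions the ratio comes out near 1.7, in line with the quoted figure. The calculation only addresses how many observations are needed to clear the threshold. It says nothing about the quality of the measurements being collected, which is the point at issue here.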