ppi
- South America > Uruguay > Maldonado > Maldonado (0.04)
- North America > United States > Minnesota > Olmsted County > Rochester (0.04)
- Europe > Czechia > South Moravian Region > Brno (0.04)
- Asia > China (0.04)
Demystifying Prediction Powered Inference
Song, Yilin, Kluger, Dan M., Parikh, Harsh, Gu, Tian
Machine learning predictions are increasingly used to supplement incomplete or costly-to-measure outcomes in fields such as biomedical research, environmental science, and social science. However, treating predictions as ground truth introduces bias while ignoring them wastes valuable information. Prediction-Powered Inference (PPI) offers a principled framework that leverages predictions from large unlabeled datasets to improve statistical efficiency while maintaining valid inference through explicit bias correction using a smaller labeled subset. Despite its potential, the growing PPI variants and the subtle distinctions between them have made it challenging for practitioners to determine when and how to apply these methods responsibly. This paper demystifies PPI by synthesizing its theoretical foundations, methodological extensions, connections to existing statistics literature, and diagnostic tools into a unified practical workflow. Using the Mosaiks housing price data, we show that PPI variants produce tighter confidence intervals than complete-case analysis, but that double-dipping, i.e. reusing training data for inference, leads to anti-conservative confidence intervals and coverages. Under missing-not-at-random mechanisms, all methods, including classical inference using only labeled data, yield biased estimates. We provide a decision flowchart linking assumption violations to appropriate PPI variants, a summary table of selective methods, and practical diagnostic strategies for evaluating core assumptions. By framing PPI as a general recipe rather than a single estimator, this work bridges methodological innovation and applied practice, helping researchers responsibly integrate predictions into valid inference.
- Oceania > New Zealand (0.04)
- North America > United States > New York (0.04)
- North America > United States > Massachusetts (0.04)
- (2 more...)
- Workflow (1.00)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.93)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.46)
- Banking & Finance > Real Estate (0.35)
How to Choose a Computer Monitor (2025): Everything You Need to Know
How to Choose a Computer Monitor You Won't Hate in a Few Years PC monitors are cheaper, faster, and more beautiful than ever. Here's how to pick one that will suit your needs and budget. Most people treat their monitor like a printer. They just want it to work without having to think about it. But if you work from home or spend hours gaming every night, it's worth an upgrade. And that's where things can get complicated. Do you pay extra for more ports?
- North America > United States > California (0.04)
- Europe > Slovakia (0.04)
- Europe > Czechia (0.04)
- Information Technology > Communications > Mobile (0.94)
- Information Technology > Artificial Intelligence (0.61)
- South America > Uruguay > Maldonado > Maldonado (0.04)
- North America > United States > Minnesota > Olmsted County > Rochester (0.04)
- Europe > Czechia > South Moravian Region > Brno (0.04)
- Asia > China (0.04)
Benchmarking Debiasing Methods for LLM-based Parameter Estimates
de Pieuchon, Nicolas Audinet, Daoud, Adel, Jerzak, Connor T., Johansson, Moa, Johansson, Richard
Large language models (LLMs) offer an inexpensive yet powerful way to annotate text, but are often inconsistent when compared with experts. These errors can bias downstream estimates of population parameters such as regression coefficients and causal effects. To mitigate this bias, researchers have developed debiasing methods such as Design-based Supervised Learning (DSL) and Prediction-Powered Inference (PPI), which promise valid estimation by combining LLM annotations with a limited number of expensive expert annotations. Although these methods produce consistent estimates under theoretical assumptions, it is unknown how they compare in finite samples of sizes encountered in applied research. We make two contributions. First, we study how each methods performance scales with the number of expert annotations, highlighting regimes where LLM bias or limited expert labels significantly affect results. Second, we compare DSL and PPI across a range of tasks, finding that although both achieve low bias with large datasets, DSL often outperforms PPI on bias reduction and empirical efficiency, but its performance is less consistent across datasets. Our findings indicate that there is a bias-variance tradeoff at the level of debiasing methods, calling for more research on developing metrics for quantifying their efficiency in finite samples.
- North America > United States > Washington > King County > Seattle (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Austria > Vienna (0.14)
- (7 more...)
Democracy-in-Silico: Institutional Design as Alignment in AI-Governed Polities
Srinivasan, Trisanth, Patapati, Santosh
This paper introduces Democracy-in-Silico, an agent-based simulation where societies of advanced AI agents, imbued with complex psychological personas, govern themselves under different institutional frameworks. We explore what it means to be human in an age of AI by tasking Large Language Models (LLMs) to embody agents with traumatic memories, hidden agendas, and psychological triggers. These agents engage in deliberation, legislation, and elections under various stressors, such as budget crises and resource scarcity. We present a novel metric, the Power-Preservation Index (PPI), to quantify misaligned behavior where agents prioritize their own power over public welfare. Our findings demonstrate that institutional design, specifically the combination of a Constitutional AI (CAI) charter and a mediated deliberation protocol, serves as a potent alignment mechanism. These structures significantly reduce corrupt power-seeking behavior, improve policy stability, and enhance citizen welfare compared to less constrained democratic models. The simulation reveals that an institutional design may offer a framework for aligning the complex, emergent behaviors of future artificial agent societies, forcing us to reconsider what human rituals and responsibilities are essential in an age of shared authorship with non-human entities.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Africa > Mali (0.04)
- (4 more...)
- Government (1.00)
- Health & Medicine > Therapeutic Area (0.69)
- Banking & Finance > Economy (0.46)
- Law > Civil Rights & Constitutional Law (0.46)
Prediction-Powered Inference with Inverse Probability Weighting
Datta, Jyotishka, Polson, Nicholas G.
Prediction-powered inference (PPI) is a recent framework for valid statistical inference with partially labeled data, combining model-based predictions on a large unlabeled set with bias correction from a smaller labeled subset. We show that PPI can be extended to handle informative labeling by replacing its unweighted bias-correction term with an inverse probability weighted (IPW) version, using the classical Horvitz--Thompson or Hájek forms. This connection unites design-based survey sampling ideas with modern prediction-assisted inference, yielding estimators that remain valid when labeling probabilities vary across units. We consider the common setting where the inclusion probabilities are not known but estimated from a correctly specified model. In simulations, the performance of IPW-adjusted PPI with estimated propensities closely matches the known-probability case, retaining both nominal coverage and the variance-reduction benefits of PPI.
- North America > United States > Virginia (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- (2 more...)
Towards a rigorous evaluation of RAG systems: the challenge of due diligence
Martinon, Grégoire, de Brionne, Alexandra Lorenzo, Bohard, Jérôme, Lojou, Antoine, Hervault, Damien, Brunel, Nicolas J-B.
The rise of generative AI, has driven significant advancements in high-risk sectors like healthcare and finance. The Retrieval-Augmented Generation (RAG) architecture, combining language models (LLMs) with search engines, is particularly notable for its ability to generate responses from document corpora. Despite its potential, the reliability of RAG systems in critical contexts remains a concern, with issues such as hallucinations persisting. This study evaluates a RAG system used in due diligence for an investment fund. We propose a robust evaluation protocol combining human annotations and LLM-Judge annotations to identify system failures, like hallucinations, off-topic, failed citations, and abstentions. Inspired by the Prediction Powered Inference (PPI) method, we achieve precise performance measurements with statistical guarantees. We provide a comprehensive dataset for further analysis. Our contributions aim to enhance the reliability and scalability of RAG systems evaluation protocols in industrial applications.
- Europe > France (0.05)
- North America > Mexico > Mexico City > Mexico City (0.04)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- (2 more...)
Gripper Keypose and Object Pointflow as Interfaces for Bimanual Robotic Manipulation
Yang, Yuyin, Cai, Zetao, Tian, Yang, Zeng, Jia, Pang, Jiangmiao
Bimanual manipulation is a challenging yet crucial robotic capability, demanding precise spatial localization and versatile motion trajectories, which pose significant challenges to existing approaches. Existing approaches fall into two categories: keyframe-based strategies, which predict gripper poses in keyframes and execute them via motion planners, and continuous control methods, which estimate actions sequentially at each timestep. The keyframe-based method lacks inter-frame supervision, struggling to perform consistently or execute curved motions, while the continuous method suffers from weaker spatial perception. To address these issues, this paper introduces an end-to-end framework PPI (keyPose and Pointflow Interface), which integrates the prediction of target gripper poses and object pointflow with the continuous actions estimation. These interfaces enable the model to effectively attend to the target manipulation area, while the overall framework guides diverse and collision-free trajectories. By combining interface predictions with continuous actions estimation, PPI demonstrates superior performance in diverse bimanual manipulation tasks, providing enhanced spatial localization and satisfying flexibility in handling movement restrictions. In extensive evaluations, PPI significantly outperforms prior methods in both simulated and real-world experiments, achieving state-of-the-art performance with a +16.1% improvement on the RLBench2 simulation benchmark and an average of +27.5% gain across four challenging real-world tasks. Notably, PPI exhibits strong stability, high precision, and remarkable generalization capabilities in real-world scenarios. Project page: https://yuyinyang3y.github.io/PPI/
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Vision (0.93)
- Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (0.34)
Prediction-Powered Adaptive Shrinkage Estimation
Li, Sida, Ignatiadis, Nikolaos
Prediction-Powered Inference (PPI) is a powerful framework for enhancing statistical estimates by combining limited gold-standard data with machine learning (ML) predictions. While prior work has demonstrated PPI's benefits for individual statistical tasks, modern applications require answering numerous parallel statistical questions. We introduce Prediction-Powered Adaptive Shrinkage (PAS), a method that bridges PPI with empirical Bayes shrinkage to improve the estimation of multiple means. PAS debiases noisy ML predictions within each task and then borrows strength across tasks by using those same predictions as a reference point for shrinkage. The amount of shrinkage is determined by minimizing an unbiased estimate of risk, and we prove that this tuning strategy is asymptotically optimal. Experiments on both synthetic and real-world datasets show that PAS adapts to the reliability of the ML predictions and outperforms traditional and modern baselines in large-scale applications.
- North America > United States > California (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > New York (0.04)
- (2 more...)