paq
Adaptive Budget Allocation in LLM-Augmented Surveys
Ye, Zikun, Lyu, Jiameng, Tao, Rui
Large language models (LLMs) can generate survey responses at low cost, but their reliability varies substantially across questions and is unknown before data collection. Deploying LLMs in surveys still requires costly human responses for verification and correction. How should a limited human-labeling budget be allocated across questions in real time? We propose an adaptive allocation algorithm that learns which questions are hardest for the LLM while simultaneously collecting human responses. Each human label serves a dual role: it improves the estimate for that question and reveals how well the LLM predicts human responses on it. The algorithm directs more budget to questions where the LLM is least reliable, without requiring any prior knowledge of question-level LLM accuracy. We prove that the allocation gap relative to the best possible allocation vanishes as the budget grows, and validate the approach on both synthetic data and a real survey dataset with 68 questions and over 2000 respondents. On real survey data, the standard practice of allocating human labels uniformly across questions wastes 10--12% of the budget relative to the optimal; our algorithm reduces this waste to 2--6%, and the advantage grows as questions become more heterogeneous in LLM prediction quality. The algorithm achieves the same estimation quality as traditional uniform sampling with fewer human samples, requires no pilot study, and is backed by formal performance guarantees validated on real survey data. More broadly, the framework applies whenever scarce human oversight must be allocated across tasks where LLM reliability is unknown.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- Questionnaire & Opinion Survey (1.00)
- Research Report (0.81)
- Information Technology (0.45)
- Education (0.45)
- Europe > Denmark > Capital Region > Copenhagen (0.04)
- North America > Canada > Quebec > Montreal (0.04)
Perceptual adjustment queries and an inverted measurement paradigm for low-rank metric learning
We introduce a new type of query mechanism for collecting human feedback, called the perceptual adjustment query (PAQ). Being both informative and cognitively lightweight, the PAQ adopts an inverted measurement scheme, and combines advantages from both cardinal and ordinal queries. We showcase the PAQ in the metric learning problem, where we collect PAQ measurements to learn an unknown Mahalanobis distance. This gives rise to a high-dimensional, low-rank matrix estimation problem to which standard matrix estimators cannot be applied. Consequently, we develop a two-stage estimator for metric learning from PAQs, and provide sample complexity guarantees for this estimator.
- Europe > Denmark > Capital Region > Copenhagen (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Information Technology (0.45)
- Education (0.45)
- North America > United States (0.14)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Reinforcement Learning with Action-Triggered Observations
Ryabchenko, Alexander, Mou, Wenlong
We study reinforcement learning problems where state observations are stochastically triggered by actions, a constraint common in many real-world applications. This framework is formulated as Action-Triggered Sporadically Traceable Markov Decision Processes (ATST-MDPs), where each action has a specified probability of triggering a state observation. We derive tailored Bellman optimality equations for this framework and introduce the action-sequence learning paradigm in which agents commit to executing a sequence of actions until the next observation arrives. Under the linear MDP assumption, value-functions are shown to admit linear representations in an induced action-sequence feature map. Leveraging this structure, we propose off-policy estimators with statistical error guarantees for such feature maps and introduce ST-LSVI-UCB, a variant of LSVI-UCB adapted for action-triggered settings. ST-LSVI-UCB achieves regret $\widetilde O(\sqrt{Kd^3(1-γ)^{-3}})$, where $K$ is the number of episodes, $d$ the feature dimension, and $γ$ the discount factor (per-step episode non-termination probability). Crucially, this work establishes the theoretical foundation for learning with sporadic, action-triggered observations while demonstrating that efficient learning remains feasible under such observation constraints.
- North America > United States (0.14)
- North America > Canada > Ontario > Toronto (0.14)
- Europe > France > Hauts-de-France > Nord > Lille (0.04)
- (2 more...)
- Research Report (0.81)
- Workflow (0.55)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.92)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.88)
- North America > United States (0.28)
- Europe (0.28)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.71)
- Energy > Oil & Gas > Upstream (0.46)
- North America > United States > Michigan (0.14)
- Europe > France (0.14)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.51)
- Energy > Oil & Gas > Upstream (0.46)