contextual
Statistical Inference with M-Estimators on Adaptively Collected Data
Bandit algorithms are increasingly used in real-world sequential decision-making problems. With this comes a growing desire to use the resulting datasets to answer scientific questions like: Did one type of ad lead to more purchases? In which contexts is a mobile health intervention effective? However, classical statistical approaches fail to provide valid confidence intervals when used with data collected with bandit algorithms. Alternative methods have recently been developed for simple models (e.g., comparison of means). Yet there is a lack of general methods for conducting statistical inference using more complex models on data collected with (contextual) bandit algorithms; for example, current methods cannot be used for valid inference on parameters in a logistic regression model for a binary reward. In this work, we develop theory justifying the use of M-estimators (which include estimators based on empirical risk minimization as well as maximum likelihood) on data collected with adaptive algorithms, including (contextual) bandit algorithms. Specifically, we show that M-estimators, modified with particular adaptive weights, can be used to construct asymptotically valid confidence regions for a variety of inferential targets.
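To make the idea of an adaptively weighted M-estimator concrete, here is a minimal sketch for the simplest case, a weighted least-squares estimate of one arm's mean reward. The abstract does not say what the "particular adaptive weights" are; the square-root importance weight used here, sqrt(pi_stable / pi_actual), is an assumption made for illustration, and the names `sqrt_importance_weight`, `weighted_arm_mean`, `pi_stable` and `action_probs` are hypothetical.

```python
import math


def sqrt_importance_weight(pi_stable, pi_actual):
    """Square-root importance weight for one decision time.

    ASSUMPTION: the weight form sqrt(pi_stable / pi_actual) is illustrative;
    the abstract does not specify the weights.
    """
    if not (0.0 < pi_actual <= 1.0) or not (0.0 < pi_stable <= 1.0):
        raise ValueError("probabilities must lie in (0, 1]")
    return math.sqrt(pi_stable / pi_actual)


def weighted_arm_mean(rewards, actions, action_probs, arm, pi_stable=0.5):
    """Closed-form minimizer of sum_t W_t * 1[A_t = arm] * (Y_t - theta)^2.

    action_probs[t] is the probability with which the bandit algorithm
    selected the action it actually took at time t.
    """
    num = 0.0
    den = 0.0
    for y, a, p in zip(rewards, actions, action_probs):
        if a != arm:
            continue  # only time steps where this arm was pulled contribute
        w = sqrt_importance_weight(pi_stable, p)
        num += w * y
        den += w
    if den == 0.0:
        raise ValueError("arm was never selected")
    return num / den
```

When every action probability is the same, the weights cancel and the estimate reduces to the ordinary sample mean for that arm. The weights only change the estimate when the algorithm's selection probabilities vary over time.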
Appendices
A Additional Experiments
Task 1 - Grouping In addition to grouping clue words using token embeddings (discussed in Section 4 of the main paper), we also grouped the words by clustering on 'contextual' embeddings. We experimentally induce 'context' by joining the sixteen (16) word tokens (in a random order) into a single pseudo-sentence. The embeddings for each token were different based on the ordering of the tokens. We repeat the random ordering sixteen times and report the mean and variance of the results obtained in Table 6. Mean ± standard deviation over 16 random seeds is shown.
Task 2 - Connections In addition to prompting-based results on GPT-4 (discussed in Section 4), we ran experiments on additional LLMs like LLaMA [67] (7B, 13B) using pre-trained configuration weights obtained by permission from Meta AI. However, without additional fine-tuning on the specific task, these LLMs were unable to solve the task in a meaningful manner.
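The pseudo-sentence construction described for Task 1 can be sketched as follows. This is only an illustration of the random-ordering step, not the authors' code: the function name `make_pseudo_sentences`, the space separator, and the seeding scheme are assumptions, and the embedding and clustering steps are left out.

```python
import random


def make_pseudo_sentences(words, n_orderings=16, seed=0):
    """Join the clue words into pseudo-sentences, one per random ordering.

    ASSUMPTION: words are joined with single spaces and all orderings come
    from one seeded generator; the source does not specify either detail.
    """
    rng = random.Random(seed)  # local generator keeps the orderings reproducible
    sentences = []
    for _ in range(n_orderings):
        order = list(words)  # copy so the caller's list is not modified
        rng.shuffle(order)
        sentences.append(" ".join(order))
    return sentences
```

Each returned string would then be passed to a contextual embedding model, and the per-token embeddings clustered into groups. Results would be aggregated over the orderings to give the mean and spread reported in the text.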
Unified Precision-Guaranteed Stopping Rules for Contextual Learning
Ding, Mingrui, Zhao, Qiuhong, Gao, Siyang, Dong, Jing
Contextual learning seeks to learn a decision policy that maps an individual's characteristics to an action through data collection. In operations management, such data may come from various sources, and a central question is when data collection can stop while still guaranteeing that the learned policy is sufficiently accurate. We study this question under two precision criteria: a context-wise criterion and an aggregate policy-value criterion. We develop unified stopping rules for contextual learning with unknown sampling variances in both unstructured and structured linear settings. Our approach is based on generalized likelihood ratio (GLR) statistics for pairwise action comparisons. To calibrate the corresponding sequential boundaries, we derive new time-uniform deviation inequalities that directly control the self-normalized GLR evidence and thus avoid the conservativeness caused by decoupling mean and variance uncertainty. Under the Gaussian sampling model, we establish finite-sample precision guarantees for both criteria. Numerical experiments on synthetic instances and two case studies demonstrate that the proposed stopping rules achieve the target precision with substantially fewer samples than benchmark methods. The proposed framework provides a practical way to determine when enough information has been collected in personalized decision problems. It applies across multiple data-collection environments, including historical datasets, simulation models, and real systems, enabling practitioners to reduce unnecessary sampling while maintaining a desired level of decision quality.
Locally Differentially Private (Contextual) Bandits Learning
Further, we extend our (ε,δ)-LDP algorithm to Generalized Linear Bandits, which enjoys a sub-linear regret O(T^{3/4}/ε) and is conjectured to be nearly optimal. Note that given the existing Ω(T) lower bound for DP contextual linear bandits [35], our result shows a fundamental difference between LDP and DP contextual bandits learning.
Appendices: A Additional Experiments
Table 6: Results of selected models on Task 1 (Grouping) using contextual embeddings.
In this section, we provide additional t-SNE projections of embeddings from the various methods used.
Figure 7: Solved wall for Task 1 (Grouping) using GloVe. Left: ("Suspension" is "a term used in musical harmony" in this context. Grief" in the embedding space, which matches the "Good ___!" connection.
Figure 8: Solved wall for Task 1 (Grouping) using FastText (Crawl). Left: contextual embedding solved 3/4 groups. Here the clue "Rembrandt" is placed near other Dutch painters. Right: static embedding solved 0/4 groups.
The following section provides answers to questions listed in datasheets for datasets.
For what purpose was the dataset created? Was there a specific task in mind?
Who created this dataset (e.g., which team, research group) and on behalf of which entity (e.g., The dataset has been collectively curated by the authors of this paper.
What support was needed to make this dataset?