Goto

Collaborating Authors

 estimation error


Learning Personalized Ad Impact via Contextual Reinforcement Learning under Delayed Rewards

Neural Information Processing Systems

Online advertising platforms use automated auctions to connect advertisers with potential customers, requiring effective bidding strategies to maximize profits. Accurate ad impact estimation requires considering three key factors: delayed and long-term effects, cumulative ad impacts such as reinforcement or fatigue, and customer heterogeneity. However, these effects are often not jointly addressed in previous studies. To capture these factors, we model ad bidding as a Contextual Markov Decision Process (CMDP) with delayed Poisson rewards. For efficient estimation, we propose a two-stage maximum likelihood estimator combined with data-splitting strategies, ensuring controlled estimation error based on the first-stage estimator's (in)accuracy. Building on this, we design a reinforcement learning algorithm to derive efficient personalized bidding strategies. This approach achieves a near-optimal regret bound of O(dH2 T), where d is the contextual dimension, H is the number of rounds, and T is the number of customers. Our theoretical findings are validated by simulation experiments.


Bayes optimal learning of attention-indexed models

Neural Information Processing Systems

We introduce the attention-indexed model (AIM), a theoretical framework for analyzing learning in deep attention layers. Inspired by multi-index models, AIM captures how token-level outputs emerge from layered bilinear interactions over high-dimensional embeddings. Unlike prior tractable attention models, AIM allows full-width key and query matrices, aligning more closely with practical transformers. Using tools from statistical mechanics and random matrix theory, we derive closed-form predictions for Bayes-optimal generalization error and identify sharp phase transitions as a function of sample complexity, model width, and sequence length. We propose a matching approximate message passing algorithm and show that gradient descent can reach optimal performance. AIM offers a solvable playground for understanding learning in self-attention layers, that are key components of modern architectures.


AClosed-Form Solution for Fast and Reliable Adaptive Testing

Neural Information Processing Systems

Human ability estimation is essential for educational assessment, career advancement, and professional certification. Adaptive Testing systems can improve estimation efficiency by selecting fewer, targeted questions, and are widely used in exams, e.g., GRE, GMAT, and Duolingo English Test. However, selecting an optimal subset of questions remains a challenging nested optimization problem. Existing methods rely on costly approximations or data-intensive training, making them unsuitable for today's large-scale and complex testing environments. Thus, we propose a Closed-Form solution for question subset selection in Adaptive Testing. It directly minimizes ability estimation error by reducing ability parameter's gradient bias while maintaining Hessian stability, which enables a simple greedy algorithm for question selection. Moreover, it can quantify the impact of human behavioral perturbations on ability estimation. Extensive experiments on large-scale educational datasets demonstrate that it reduces the number of required questions by 10% compared to SOTA methods, while maintaining the same estimation accuracy.


ATheoretical Study on Bridging Internal Probability and Self-Consistency for LLMReasoning

Neural Information Processing Systems

Test-time scaling seeks to improve the reasoning performance of large language models (LLMs) by adding computational resources. A prevalent approach within the field is sampling-based test-time scaling methods, which enhance reasoning by generating multiple reasoning paths for a given input during inference. However, despite its practical success, the theoretical foundations remain underexplored. In this paper, we provide the first theoretical framework for analyzing sampling-based test-time scaling methods, grounded in the perspective of confidence estimation. Based on the framework, we analyze two dominant paradigms: self-consistency and perplexity, and reveal key limitations: self-consistency suffers from high estimation error while perplexity exhibits substantial modeling error and possible degradation of the estimation error convergence.


Co-Regularization Enhances Knowledge Transfer in High Dimensions

Neural Information Processing Systems

Most existing transfer learning algorithms for high-dimensional models employ a two-step regularization framework, whose success heavily hinges on the assumption that the pre-trained model closely resembles the target. To relax this assumption, we propose a co-regularization process to directly exploit beneficial knowledge from the source domain for high-dimensional generalized linear models. The proposed method learns the target parameter by constraining the source parameters to be close to the target one, thereby preventing fine-tuning failures caused by significantly deviated pre-trained parameters. Our theoretical analysis demonstrates that the proposed method accommodates a broader range of sources than existing two-step frameworks, thus being more robust to less similar sources. Its effectiveness is validated through extensive empirical studies.


Understanding and Improving Adversarial Robustness of Neural Probabilistic Circuits

Neural Information Processing Systems

Neural Probabilistic Circuits (NPCs), a new class of concept bottleneck models, comprise an attribute recognition model and a probabilistic circuit for reasoning. By integrating the outputs from these two modules, NPCs produce compositional and interpretable predictions. While offering enhanced interpretability and high performance on downstream tasks, the neural-network-based attribute recognition model remains a black box. This vulnerability allows adversarial attacks to manipulate attribute predictions by introducing carefully crafted, subtle perturbations to input images, potentially compromising the final predictions. In this paper, we theoretically analyze the adversarial robustness of NPC and demonstrate that it only depends on the robustness of the attribute recognition model and is independent of the robustness of the probabilistic circuit. Moreover, we propose RNPC, the first robust neural probabilistic circuit against adversarial attacks on the recognition module.



Stability and Oracle Inequalities for Optimal Transport Maps between General Distributions

Neural Information Processing Systems

Optimal transport (OT) provides a powerful framework for comparing and transforming probability distributions, with wide applications in generative modeling, AI4Science and statistical inference. However, existing estimation theory typically requires stringent smoothness conditions on the underlying Brenier potentials and assumes bounded distribution supports, limiting practical applicability. In this paper, we introduce a unified theoretical framework for semi-dual OT map estimation that relaxes both of these restrictions. Building on sieved convex conjugate, our framework has two key contributions: (i) a new map stability bounds that holds without any second-order regularity assumptions on the true Brenier potentials, and (ii) an oracle inequality that cleanly decomposes the estimation error into statistical error, sieved bias, and approximation error. Specifically, our approximation error is measured in the L1 norm rather than Sobolev norm in the existing results, aligning more naturally with classical approximation theory. Leveraging these tools, we provide statistical error of semi-dual estimators with mild and verifiable conditions on the true OT map. Moreover, we establish the first theoretical guarantee for deep neural network OT map estimator between general distributions, with Tanh network function class as an example.


Reward-oriented Causal Representation Learning

Neural Information Processing Systems

Causal representation learning (CRL) is the process of disentangling the latent low-dimensional causally-related generating factors underlying high-dimensional observable data. Extensive recent studies have characterized CRL identifiability and perfect recovery of the latent variables and their attendant causal graph. This paper introduces the notion of reward-oriented CRL, the purpose of which is to move away from perfectly learning the latent representation and instead learning it to the extent needed for optimizing a desired downstream task (reward). In reward-oriented CRL, perfectly learning the latent representation can be excessive; instead, it must be learned at the coarsest level sufficient for optimizing the desired task. Reward-oriented CRL is formalized as the optimization of a desired function of the observable data over the space of all possible interventions and focuses on linear causal and transformation models. To sequentially identify the optimal subset of interventions, an adaptive exploration algorithm is designed that learns the latent causal graph and the variables needed to identify the best intervention. It is shown that for an n-dimensional latent space and a d-dimensional observation space, over a horizon T the algorithm's regret scales as O(d


Transfer Faster, Price Smarter: Minimax Dynamic Pricing under Cross-Market Preference Shift

Neural Information Processing Systems

We study contextual dynamic pricing when a target market can leverage Kauxiliary markets--offline logs or concurrent streams--whose mean utilities differ by a structured preference shift. We propose Cross-Market Transfer Dynamic Pricing (CM-TDP), the first algorithm that provably handles such model-shift transfer and delivers minimax-optimal regret for both linear and nonparametric utility models. For linear utilities of dimension d, where the difference between source-and targettask coefficients is s0-sparse, CM-TDP attains regret eO (dK 1 + s0) log T .