Collaborating Authors

 agarwal







Efficient First-Order Contextual Bandits: Prediction, Allocation, and Triangular Discrimination

Neural Information Processing Systems

On the technical side, we show that the logarithmic loss and an information-theoretic quantity called the triangular discrimination play a fundamental role in obtaining first-order guarantees, and we combine this observation with new refinements to the regression oracle reduction framework of Foster and Rakhlin [29].
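For readers unfamiliar with the quantity named above, the following is a minimal sketch of the triangular discrimination between two discrete distributions, using the standard textbook definition (the sum over outcomes of (p_i - q_i)^2 / (p_i + q_i)). The function name and the list-based interface are illustrative assumptions, and the paper may use a different normalization or apply the quantity to scalar means rather than full distributions.

```python
from typing import Sequence


def triangular_discrimination(p: Sequence[float], q: Sequence[float]) -> float:
    """Standard triangular discrimination between two discrete distributions.

    Hypothetical helper for illustration only; not taken from the paper.
    Outcomes where both probabilities are zero contribute nothing.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of outcomes")
    total = 0.0
    for pi, qi in zip(p, q):
        denom = pi + qi
        if denom > 0:
            total += (pi - qi) ** 2 / denom
    return total


if __name__ == "__main__":
    # Identical distributions give 0; disjoint supports give the maximum, 2.
    print(triangular_discrimination([0.5, 0.5], [0.5, 0.5]))
    print(triangular_discrimination([1.0, 0.0], [0.0, 1.0]))
```

Under this definition the value is symmetric in its arguments and lies between 0 and 2 for probability vectors.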



11f9e78e4899a78dedd439fc583b6693-Paper.pdf

Neural Information Processing Systems

There, a reward function is drawn from one of multiple possible reward models at the beginning of every episode, but the identity of the chosen reward model is not revealed to the agent. Hence, the latent state space, for which the dynamics are Markovian, is not given to the agent. We study the problem of learning a near-optimal policy for two reward-mixing MDPs. Unlike existing approaches that rely on strong assumptions on the dynamics, we make no assumptions and study the problem in full generality.


Improved Regret Analysis for Variance-Adaptive Linear Bandits and Horizon-Free Linear Mixture MDPs

Neural Information Processing Systems

In online learning problems, exploiting low variance plays an important role in obtaining tight performance guarantees yet is challenging because variances are often not known a priori. Recently, considerable progress has been made by Zhang et al.


Synthetic Combinations: A Causal Inference Framework for Combinatorial Interventions

Neural Information Processing Systems

We consider a setting where there are $N$ heterogeneous units and $p$ interventions. Our goal is to learn unit-specific potential outcomes for any combination of these $p$ interventions, i.e., $N \times 2^p$ causal parameters. Choosing a combination of interventions is a problem that naturally arises in a variety of applications such as factorial design experiments and recommendation engines (e.g., showing a set of movies that maximizes engagement for a given user). Running $N \times 2^p$ experiments to estimate the various parameters is likely expensive and/or infeasible as $N$ and $p$ grow. Further, with observational data there is likely confounding, i.e., whether or not a unit is seen under a combination is correlated with its potential outcome under that combination. We study this problem under a novel model that imposes latent structure across both units and combinations of interventions.
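To make the scale of the $N \times 2^p$ parameter count concrete, here is a small sketch that counts the parameters and enumerates the binary intervention combinations. The function names are hypothetical and chosen only for illustration; the sketch shows the counting argument from the abstract, not the estimation method the paper proposes.

```python
from itertools import product
from typing import List, Tuple


def num_causal_parameters(n_units: int, n_interventions: int) -> int:
    """Number of unit-specific potential outcomes: N * 2^p."""
    return n_units * 2 ** n_interventions


def all_combinations(n_interventions: int) -> List[Tuple[int, ...]]:
    """Every on/off assignment of p interventions, as tuples of 0s and 1s."""
    return list(product((0, 1), repeat=n_interventions))


if __name__ == "__main__":
    # With 3 interventions there are 2^3 = 8 combinations per unit.
    for combo in all_combinations(3):
        print(combo)
    # The count grows exponentially in p, which is why running one
    # experiment per (unit, combination) pair quickly becomes infeasible.
    for p in (5, 10, 20):
        print(p, num_causal_parameters(100, p))
```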