Collaborating Authors

 langford






Efficient First-Order Contextual Bandits: Prediction, Allocation, and Triangular Discrimination

Neural Information Processing Systems

On the technical side, we show that the logarithmic loss and an information-theoretic quantity called the triangular discrimination play a fundamental role in obtaining first-order guarantees, and we combine this observation with new refinements to the regression oracle reduction framework of Foster and Rakhlin [29].
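For readers unfamiliar with the quantity named above, here is a minimal sketch of the triangular discrimination between two discrete distributions, using the standard textbook definition (the sum over outcomes of (p_i - q_i)^2 / (p_i + q_i), with 0/0 taken as 0). The function name `triangular_discrimination` and the list-based inputs are illustrative assumptions; the paper may use a different normalization or a scalar form, which the snippet above does not specify.

```python
def triangular_discrimination(p, q):
    """Standard triangular discrimination between two discrete distributions.

    Assumption: p and q are equal-length sequences of nonnegative
    probabilities. Terms where p_i + q_i == 0 contribute 0.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        s = pi + qi
        if s > 0:
            total += (pi - qi) ** 2 / s
    return total


if __name__ == "__main__":
    # Identical distributions are at distance 0; disjoint ones reach the maximum.
    print(triangular_discrimination([0.5, 0.5], [0.5, 0.5]))
    print(triangular_discrimination([1.0, 0.0], [0.0, 1.0]))
```

The quantity is symmetric in its arguments and, under this definition, bounded between 0 and 2.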





11f9e78e4899a78dedd439fc583b6693-Paper.pdf

Neural Information Processing Systems

There, a reward function is drawn from one of multiple possible reward models at the beginning of every episode, but the identity of the chosen reward model is not revealed to the agent. Hence, the latent state space, for which the dynamics are Markovian, is not given to the agent. We study the problem of learning a near-optimal policy for two reward-mixing MDPs. Unlike existing approaches that rely on strong assumptions on the dynamics, we make no assumptions and study the problem in full generality.


10eaa0aae94b34308e9b3fa7b677cbe1-Paper-Conference.pdf

Neural Information Processing Systems

Nevertheless, despite the proliferation of research on algorithmic fairness in recent years, very few methods exist that can handle multiclass classification tasks with non-binary sensitive attributes.