Optimization

Predict-then-Calibrate: A New Perspective of Robust Contextual LP

Neural Information Processing Systems

The idea is to first develop a prediction model without concern for the downstream risk profile or robustness guarantee, and then utilize calibration (or recalibration) methods to quantify the uncertainty of the prediction.


Multi-Objective Intrinsic Reward Learning for Conversational Recommender Systems

Neural Information Processing Systems

Conversational Recommender Systems (CRS) actively elicit user preferences to generate adaptive recommendations. Mainstream reinforcement learning-based CRS solutions heavily rely on handcrafted reward functions, which may not be aligned with user intent in CRS tasks.



Understanding the Effect of Stochasticity in Policy Optimization

Neural Information Processing Systems

Until recently it had generally been assumed that methods based on following the policy gradient (PG) [1] could not be guaranteed to converge to globally optimal solutions, given that the policy value function is not concave.