Online Inverse Linear Optimization: Efficient Logarithmic-Regret Algorithm, Robustness to Suboptimality, and Lower Bound

Shinsaku Sakaue[1], CyberAgent, Tokyo, Japan, shinsaku.sakaue@gmail.com, "3026 Taira Tsuchiya, The University of Tokyo and RIKEN AIP, Tokyo, Japan, tsuchiya@mist.i.u-tokyo.ac.jp, "3026 Han Bao[1], The Institute of Statistical Mathematics, Tokyo, Japan, bao.han@ism.ac.jp, "3026 Taihei Oki, Hokkaido University, Hokkaido, Japan, oki@icredd.hokudai.ac.jp

Neural Information Processing Systems 

In online inverse linear optimization, a learner observes time-varying sets of feasible actions and an agent's optimal actions, selected by solving linear optimization over the feasible actions. The learner sequentially makes predictions of the agent's true linear objective function, and their quality is measured by the regret, the cumulative gap between optimal objective values and those achieved by following the learner's predictions. A seminal work by Bärmann et al. (2017) obtained a regret bound of O( T), where T is the time horizon. Subsequently, the regret bound has been improved to O(n4 lnT) by Besbes et al. (2021, 2025) and to O(nlnT) by Gollapudi et al. (2021), where nis the dimension of the ambient space of objective vectors. However, these logarithmic-regret methods are highly inefficient when T is large, as they need to maintain regions specified by O(T) constraints, which represent possible locations of the true objective vector.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found