Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems

Neural Information Processing Systems 

In the stochastic contextual low-rank matrix bandit problem, the expected reward of an action is given by the inner product between the action's feature matrix and some fixed, but initially unknown, d_1 by d_2 matrix \Theta^* with rank r \ll \min\{d_1, d_2\}, and an agent sequentially takes actions based on past experience to maximize the cumulative reward. In this paper, we study the generalized low-rank matrix bandit problem, which was recently proposed in \cite{lu2021low} under the Generalized Linear Model (GLM) framework. To overcome the computational infeasibility and theoretical limitations of existing algorithms for this problem, we first propose the G-ESTT framework, which modifies the idea from \cite{jun2019bilinear} by using Stein's method for the subspace estimation and then leverages the estimated subspaces via a regularization idea. Furthermore, we substantially improve the efficiency of G-ESTT by instead using a novel exclusion idea on the estimated subspace, and propose the G-ESTS framework. We also show that our methods are the first algorithms to achieve, up to logarithmic factors and under some mild conditions, the optimal \tilde{O}((d_1+d_2)r\sqrt{T}) regret bound presented in \cite{lu2021low}, which improves upon the current regret bound of \tilde{O}((d_1+d_2)^{3/2}\sqrt{rT}) \citep{lu2021low}.
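The reward model described above can be illustrated with a minimal sketch (not the authors' code; the dimensions, link function, and noise scale below are illustrative assumptions), showing a rank-r parameter matrix and the inner-product reward a single round would produce:

```python
import numpy as np

# Illustrative dimensions: d1 x d2 parameter of rank r << min{d1, d2}.
rng = np.random.default_rng(0)
d1, d2, r = 6, 5, 2

# Hidden rank-r parameter Theta* = U V^T, with U in R^{d1 x r}, V in R^{d2 x r}.
U = rng.standard_normal((d1, r))
V = rng.standard_normal((d2, r))
Theta_star = U @ V.T

def expected_reward(X, link=lambda s: s):
    """Expected reward <X, Theta*> = trace(X^T Theta*); a GLM applies a link
    function (identity here, e.g. a sigmoid for logistic rewards)."""
    return link(np.sum(X * Theta_star))

# One round: the agent picks an action feature matrix X and observes a noisy reward.
X = rng.standard_normal((d1, d2))
mu = expected_reward(X)
reward = mu + 0.1 * rng.standard_normal()  # sub-Gaussian observation noise
```

The bandit problem then consists of choosing X from the action set over T rounds so that the cumulative reward is close to that of always playing the best action under the unknown Theta*.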