Reviews: Online Influence Maximization under Independent Cascade Model with Semi-Bandit Feedback

Neural Information Processing Systems 

The paper studies the following bandit model: actions are nodes in a known directed graph and at each time step t the following happens: (1) hidden from the player, each edge (i,j) is independently declared "open" or "closed" with fixed but unknown probability w(i,j); (2) the player chooses a subset S_t of nodes of cardinality at most K; (3) all directed paths originated from nodes in S_t that go through open edges are revealed to the player; (4) the player's reward is the number of nodes in these paths. The regret of the player is measured with respect to the expected reward obtained by consistently playing the K-sized subset S of nodes that maximizes the expected reward. Since this set is NP-hard to compute, but easy to approximate, it is assumed that the player has access to an oracle that, given G and estimates of w(i,j) for each edge (i,j), returns the K-sized subset S_t that maximizes the expected reward according to the estimates. The probabilities w(i,j) are assumed to follow a linear model, w(i,j) x(i,j)*theta, where x(i,j) is the known d-dimensional feature vector associated with edge (i,j) and theta is an unknown parameter vector. A special case is the "tabular model" where all the feature vectors are orthogonal.