An Option and Agent Selection Policy with Logarithmic Regret for Multi Agent Multi Armed Bandit Problems on Random Graphs
