Online Policy Learning and Inference by Matrix Completion