Reinforcement Learning for Heterogeneous Teams with PALO Bounds