Contextual bandits with entropy-based human feedback