Multi-head Reward Aggregation Guided by Entropy