Policy-Value Alignment and Robustness in Search-based Multi-Agent Learning