Measuring Free-Form Decision-Making Inconsistency of Language Models in Military Crisis Simulations
