Comparison requires valid measurement: Rethinking attack success rate comparisons in AI red teaming
