This benchmark used Reddit's AITA to test how much AI models suck up to us