Researchers measure reliability, confidence for next-gen AI