AI Models Are Getting Smarter. New Tests Are Racing to Catch Up
Despite their expertise, AI developers don't always know what their most advanced systems are capable of, at least not at first. To find out, they subject systems to a range of tests, often called evaluations or 'evals', designed to tease out their limits. But because the field is progressing so rapidly, today's systems regularly achieve top scores on many popular tests, including the SAT and the U.S. bar exam, which makes it harder to judge just how quickly they are improving. In response, companies, nonprofits, and governments have created a new set of much more challenging evals. Yet even on the most advanced of these, AI systems are making astonishing progress.
Dec-24-2024, 15:05:49 GMT