The Kentucky Department of Education announced Wednesday that 57 percent of 2018 graduates met the benchmark in English, compared with 56 percent in 2017. It said 44 percent met the benchmark in math, compared with 42 percent the year before. In reading, 53 percent met the benchmark, compared with 51 percent the previous year.
Google and Baidu collaborated with researchers at Harvard and Stanford to define a suite of benchmarks for machine learning. So far, AMD, Intel, two AI startups, and two other universities have expressed support for MLPerf, an initial version of which will be ready for use in August. Today's hardware falls far short of running neural-network jobs at the performance levels desired. A flood of new accelerators is coming to market, but the industry lacks ways to measure them. To fill the gap, the first release of MLPerf will focus on training jobs on a range of systems from workstations to large data centers, a big pain point for web giants such as Baidu and Google.
In the last year, new models and methods for pretraining and transfer learning have driven striking performance improvements across a range of language understanding tasks. The GLUE benchmark, introduced a little over one year ago, offers a single-number metric that summarizes progress on a diverse set of such tasks, but performance on the benchmark has recently surpassed the level of non-expert humans, suggesting limited headroom for further research. In this paper we present SuperGLUE, a new benchmark styled after GLUE with a new set of more difficult language understanding tasks, a software toolkit, and a public leaderboard. SuperGLUE is available at https://super.gluebenchmark.com. Papers published at the Neural Information Processing Systems Conference.
You can add another big name to the list of phone makers found cheating on benchmarks. UL Benchmarks has delisted Oppo's Find X and F7 phones from its 3DMark charts after its own testing and Tech2's revealed that both devices were artificially ramping up processor performance when they detected the test by name. Oppo acknowledged that it always stepped things up when it detected "games or 3D Benchmarks that required high performance," but claimed that any app would run full bore if you tapped on the screen every few seconds to signal your actions. UL, however, rejected the justifications. It was clear that Oppo was looking for the benchmark by name and not the extra processing load involved, according to the outfit.
Stanford University recently released the 2021 AI Index, highlighting major trends and advancements in artificial intelligence. The fourth edition of the report discussed technology's impact on society, education, and policy and outlined the progress made in other AI subdomains such as deep learning, object detection, and NLP. The highlights from the 2021 report included AI research citations, AI startup funding, and the growing conversation around AI ethics. One of the more significant observations made in the report was about the need for more and better benchmarks in AI and other related fields such as ethics, NLP, and computer vision. "We're running out of tests as fast as we can build them," said Jack Clark, head of an OECD group working on algorithm impact assessment and former policy director for OpenAI.