grackle
Automated Strategy Invention for Confluence of Term Rewrite Systems
Zhang, Liao, Mitterwallner, Fabian, Jakubuv, Jan, Kaliszyk, Cezary
Term rewriting plays a crucial role in software verification and compiler optimization. With dozens of highly parameterizable techniques developed to prove various system properties, automatic term rewriting tools work in an extensive parameter space. This complexity exceeds human capacity for parameter selection, motivating an investigation into automated strategy invention. In this paper, we focus on confluence, an important property of term rewrite systems, and apply machine learning to develop the first learning-guided automatic confluence prover. Moreover, we randomly generate a large dataset to analyze confluence for term rewrite systems. Our results focus on improving the state-of-the-art automatic confluence prover CSI: When equipped with our invented strategies, it surpasses its human-designed strategies both on the augmented dataset and on the original human-created benchmark dataset Cops, proving/disproving the confluence of several term rewrite systems for which no automated proofs were known before.
Solving Hard Mizar Problems with Instantiation and Strategy Invention
Jakubův, Jan, Janota, Mikoláš, Urban, Josef
In this work, we prove over 3000 previously ATP-unproved Mizar/MPTP problems by using several ATP and AI methods, raising the number of ATP-solved Mizar problems from 75\% to above 80\%. First, we start to experiment with the cvc5 SMT solver which uses several instantiation-based heuristics that differ from the superposition-based systems, that were previously applied to Mizar,and add many new solutions. Then we use automated strategy invention to develop cvc5 strategies that largely improve cvc5's performance on the hard problems. In particular, the best invented strategy solves over 14\% more problems than the best previously available cvc5 strategy. We also show that different clausification methods have a high impact on such instantiation-based methods, again producing many new solutions. In total, the methods solve 3021 (21.3\%) of the 14163 previously unsolved hard Mizar problems. This is a new milestone over the Mizar large-theory benchmark and a large strengthening of the hammer methods for Mizar.
Scale AI launches rapid data-labeling service
Amid the boom of AI in application building, companies face a significant data-labeling problem, especially when it comes to labeling images or other media content they want to train deep learning algorithms on. Today data-labeling and infrastructure provider Scale AI launched a service called Scale Rapid that aims to solve this problem by labeling a data sample within one to three hours. Users can review the work to make sure the labeling is being done correctly, iterate upon their labeling instructions if necessary, and then ramp up to have Scale AI label the rest of their dataset. This is the latest in a series of products Scale AI has launched in the last year as it seeks to maintain its leadership in the labeling sphere. In April, the company raised $325 million, bringing its total raised to over $602 million.
Two Barbados bird species enter the short, special list of string-pullers
Two bird species, the Barbados bullfinch and the Carib grackle, have passed the string-pulling test, joining an elite group of animals capable of completing one of the most challenging animal cognition tests. The goal of the research, which was led by Jean-Nicolas Audet, a Ph.D. candidate in the Department of Biology at McGill University in Canada, is to determine the origin of innovative behavior in birds. Throughout the study, which was published Wednesday in the journal PLOS One, two of the 31 grackles and 18 of the 42 bullfinches were able to complete the string-pulling task, but what was surprising about their finding was lack of correlation between this and other cognitive tasks. "Most people previously assumed that string-pulling was just a problem-solving task and that the performance on both should be correlated. Some people even think that performance on all behavioral tasks should be correlated to some extent," Mr. Audet tells The Christian Science Monitor in an email.