Targeted Test Selection Approach in Continuous Integration

Plyusnin, Pavel; Antonov, Aleksey; Ermakov, Vasilii; Khaybriev, Aleksandr; Kikot, Margarita; Alimova, Ilseyar; Moiseev, Stanislav

arXiv.org Artificial Intelligence 

Abstract--In modern software development, change-based testing plays a crucial role. However, as codebases expand and test suites grow, efficiently managing the testing process becomes increasingly challenging, especially given the high frequency of daily code commits. We propose Targeted Test Selection (T-TS), a machine learning approach for industrial test selection. Our key innovation is a data representation that represents commits as Bags-of-Words of changed files, incorporates cross-file and additional predictive features, and notably avoids the use of coverage maps. Deployed in production, T-TS was comprehensively evaluated against industry standards and recent methods using both internal and public datasets, measuring time efficiency and fault detection. On live industrial data, T-TS selects only 15% of tests, reduces execution time by a factor of 5.9, accelerates the pipeline by a factor of 5.6, and detects over 95% of test failures. The implementation is publicly available to support further research and practical adoption.

Continuous integration (CI) is a widely used software engineering practice. Each CI cycle involves software testing that aims to detect potential bugs in the changed code before it is deployed to production. One of the key tasks in CI testing is regression testing, where new code changes are tested within each CI cycle. In order to prevent bugs occurring after critical stage in CI cycle (e.g. As long as a project is being developed and maintained, the number of test cases grows along with it. Running this growing test suite in every cycle consumes substantial computational resources, and regression testing becomes time-consuming. Hence, feedback reaches the authors of changes long after test execution has started, and further development is stalled until the outcomes of all executed tests are known.
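To make the commit representation named in the abstract more concrete, the following is a minimal sketch of encoding a commit as a Bag-of-Words over its changed files. It is an illustration under stated assumptions, not the T-TS implementation: the tokenization rule (splitting paths on non-alphanumeric characters), the function names `tokenize_path`, `commit_bag_of_words`, and `vectorize`, and the example file paths are all hypothetical. The cross-file and additional predictive features mentioned in the abstract are not modeled here.

```python
import re
from collections import Counter
from typing import Dict, List


def tokenize_path(path: str) -> List[str]:
    """Split a file path into lowercase tokens.

    Assumption: tokens are maximal alphanumeric runs, so directory names,
    file names, and extensions each become separate tokens.
    """
    return [tok for tok in re.split(r"[^A-Za-z0-9]+", path.lower()) if tok]


def commit_bag_of_words(changed_files: List[str]) -> Counter:
    """Count token occurrences across all files changed in one commit."""
    bag: Counter = Counter()
    for path in changed_files:
        bag.update(tokenize_path(path))
    return bag


def vectorize(bag: Dict[str, int], vocabulary: List[str]) -> List[int]:
    """Project a bag onto a fixed vocabulary; unseen tokens count as 0."""
    return [bag.get(tok, 0) for tok in vocabulary]


# Hypothetical commit touching three files.
commit = ["src/payments/api.py", "src/payments/models.py", "docs/README.md"]
bag = commit_bag_of_words(commit)

# In practice the vocabulary would be built from a history of commits;
# here it is derived from this single commit for brevity.
vocabulary = sorted(bag)
vector = vectorize(bag, vocabulary)

print(vocabulary)
print(vector)
```

A fixed-length vector of this kind can be passed to any standard classifier that predicts, per test, whether the test is likely to fail for the commit. Because the encoding depends only on the list of changed paths, it needs no coverage map.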