Machine Learning in Static Code Analysis
Machine learning has become firmly entrenched in a variety of fields, from speech recognition to medical diagnosis. The approach is so popular that people try to use it wherever they can, and some attempts to replace classical approaches with neural networks turn out to be unsuccessful. This time we'll consider machine learning in terms of creating effective static code analyzers for finding bugs and potential vulnerabilities.
The PVS-Studio team is often asked whether we want to start using machine learning to find bugs in source code. The short answer is yes, but only to a limited extent. We believe that applying machine learning to code analysis tasks involves many pitfalls, and we will discuss them in the second part of the article. Let's start with a review of new solutions and ideas.
Nowadays there are many static analyzers that are based on or use machine learning, including deep learning and NLP, to detect errors. Not only enthusiasts but also large companies, such as Facebook, Amazon, and Mozilla, have doubled down on the potential of machine learning. Some of these projects aren't full-fledged static analyzers, as they only find certain kinds of errors in commits. Interestingly, almost all of them are positioned as game-changing products that will bring a breakthrough to the development process thanks to artificial intelligence.
Let's look at some of the well-known examples.
Deep Code is a vulnerability-searching tool for Java, JavaScript, TypeScript, and Python code that uses machine learning as one of its components. According to Boris Paskalev, more than 250,000 rules are already in place. The tool learns from changes made by developers in the source code of open-source projects (a million repositories). The company describes its project as a kind of Grammarly for developers. In essence, the analyzer compares your code with its project base and suggests what it considers the best solution, drawing on the experience of other developers.
In May 2018, the developers said that support for C was on its way, but so far the language is not supported. According to the site, however, support for a new language can be added in a matter of weeks, because only one stage of the analysis, parsing, depends on the language. A series of posts about the analyzer's basic methods is also available on the site.
Facebook is quite zealous in its attempts to introduce new comprehensive approaches into its products.
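Deep Code's statement that only the parsing stage depends on the input language describes a common analyzer design: language-specific front ends translate source code into a shared representation, and the checks operate only on that representation. The Python sketch below illustrates this idea under stated assumptions; it is not Deep Code's implementation. The `Assignment` representation, the regex-based `parse_toy_c` front end for a toy C-like syntax, the `FRONT_ENDS` registry, and the self-assignment rule are all hypothetical examples chosen for illustration. Only the Python front end relies on a real library, the standard-library `ast` module.

```python
import ast
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Assignment:
    """Hypothetical language-neutral fact: `target = source` on a given line."""
    target: str
    source: Optional[str]  # None when the right-hand side is not a bare name
    line: int


def parse_python(code: str) -> List[Assignment]:
    """Python front end built on the standard-library ast module."""
    facts = []
    for node in ast.walk(ast.parse(code)):
        # Only simple `name = value` statements are translated in this sketch.
        if (isinstance(node, ast.Assign) and len(node.targets) == 1
                and isinstance(node.targets[0], ast.Name)):
            source = node.value.id if isinstance(node.value, ast.Name) else None
            facts.append(Assignment(node.targets[0].id, source, node.lineno))
    return facts


# Hypothetical front end for a toy C-like syntax: one `name = expr;` per line,
# no declarations, no comments. A real front end would use a full parser.
_TOY_ASSIGN = re.compile(r"^\s*([A-Za-z_]\w*)\s*=\s*([^;]+);")
_IDENTIFIER = re.compile(r"[A-Za-z_]\w*")


def parse_toy_c(code: str) -> List[Assignment]:
    facts = []
    for lineno, text in enumerate(code.splitlines(), start=1):
        match = _TOY_ASSIGN.match(text)
        if match:
            rhs = match.group(2).strip()
            source = rhs if _IDENTIFIER.fullmatch(rhs) else None
            facts.append(Assignment(match.group(1), source, lineno))
    return facts


def self_assignment_rule(facts: List[Assignment]) -> List[str]:
    """Language-independent check: it never sees the original source text."""
    return [f"line {f.line}: '{f.target}' is assigned to itself"
            for f in facts if f.source == f.target]


# Supporting another language means registering another front end here;
# the rule above stays unchanged.
FRONT_ENDS: Dict[str, Callable[[str], List[Assignment]]] = {
    "python": parse_python,
    "toy-c": parse_toy_c,
}


def analyze(code: str, language: str) -> List[str]:
    # ast.walk is breadth-first, so sort facts by line for stable output.
    facts = sorted(FRONT_ENDS[language](code), key=lambda f: f.line)
    return self_assignment_rule(facts)


print(analyze("a = 1\nb = a\nb = b\n", "python"))   # ["line 3: 'b' is assigned to itself"]
print(analyze("x = 1;\ny = x;\ny = y;\n", "toy-c"))  # ["line 3: 'y' is assigned to itself"]
```

In this sketch the same `self_assignment_rule` reports the bug in both inputs, because the language-specific work ends once each front end has produced `Assignment` facts. Real analyzers use far richer representations (syntax trees, control-flow and data-flow graphs), but the division of labor is the same.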
Oct-19-2020, 18:20:45 GMT