A Survey on Machine Learning Techniques for Source Code Analysis
Sharma, Tushar; Kechagia, Maria; Georgiou, Stefanos; Tiwari, Rohit; Vats, Indira; Moazen, Hadi; Sarro, Federica
arXiv.org Artificial Intelligence
Advances in machine learning techniques have encouraged researchers to apply them to a myriad of software engineering tasks that use source code analysis, such as testing and vulnerability detection. The resulting large number of studies makes it hard for the community to understand the current research landscape. This paper aims to summarize the current knowledge in applied machine learning for source code analysis. We review studies belonging to twelve categories of software engineering tasks and the corresponding machine learning techniques, tools, and datasets that have been applied to solve them. To do so, we conducted an extensive literature search and identified 479 primary studies published between 2011 and 2021. We summarize our observations and findings with the help of the identified studies. Our findings suggest that the use of machine learning techniques for source code analysis tasks is consistently increasing. We synthesize commonly used steps and the overall workflow for each task and summarize the machine learning techniques employed. We identify a comprehensive list of available datasets and tools usable in this context. Finally, the paper discusses perceived challenges in this area, including the availability of standard datasets, reproducibility and replicability, and hardware resources.
Sep-13-2022
- Country:
- South America
- Chile > Santiago Metropolitan Region
- Santiago Province > Santiago (0.04)
- Brazil > Sergipe
- Aracaju (0.04)
- Argentina > Pampas
- Buenos Aires F.D. > Buenos Aires (0.04)
- North America
- United States
- Texas > Travis County
- Austin (0.04)
- New Jersey > Middlesex County
- Piscataway (0.04)
- Colorado > Denver County
- Denver (0.04)
- Arizona > Maricopa County
- Massachusetts > Suffolk County
- Boston (0.04)
- Pennsylvania > Philadelphia County
- Philadelphia (0.04)
- Florida > Pinellas County
- St. Petersburg (0.04)
- Alaska > Anchorage Municipality
- Anchorage (0.04)
- California
- San Francisco County > San Francisco (0.13)
- San Diego County > San Diego (0.04)
- Santa Clara County > San Jose (0.04)
- New York > New York County
- New York City (0.04)
- Canada
- Quebec > Montreal (0.04)
- Ontario
- Toronto (0.04)
- National Capital Region > Ottawa (0.04)
- Kingston (0.04)
- Nova Scotia > Halifax Regional Municipality
- Halifax (0.04)
- Alberta > Census Division No. 11
- Edmonton Metropolitan Region > Edmonton (0.04)
- Europe
- Netherlands > North Holland
- Amsterdam (0.04)
- Germany
- Saarland > Saarbrücken (0.04)
- Hamburg (0.04)
- United Kingdom > England
- Greater London > London (0.04)
- France
- Occitanie > Hérault
- Montpellier (0.04)
- Hauts-de-France > Nord
- Lille (0.04)
- Auvergne-Rhône-Alpes > Lyon
- Lyon (0.04)
- Italy
- Tuscany > Florence (0.04)
- Trentino-Alto Adige/Südtirol > Trentino Province
- Trento (0.04)
- Spain > Catalonia
- Barcelona Province > Barcelona (0.04)
- Norway > Central Norway
- Russia > Northwestern Federal District
- Leningrad Oblast > Saint Petersburg (0.04)
- Sweden
- Vaestra Goetaland > Gothenburg (0.04)
- Västmanland County > Västerås (0.04)
- Stockholm > Stockholm (0.04)
- Blekinge County > Karlskrona (0.04)
- Middle East > Republic of Türkiye
- Istanbul Province > Istanbul (0.04)
- Portugal > Braga
- Braga (0.04)
- Estonia > Harju County
- Tallinn (0.04)
- Asia
- Russia (0.04)
- Taiwan > Taiwan Province
- Taipei (0.04)
- Vietnam > Da Nang
- Da Nang (0.04)
- India
- Telangana > Hyderabad (0.04)
- Tamil Nadu > Chennai (0.04)
- Maharashtra > Pune (0.04)
- Karnataka > Bengaluru (0.04)
- South Korea > Seoul
- Seoul (0.04)
- China
- Middle East
- Saudi Arabia > Northern Borders Province
- Arar (0.04)
- Republic of Türkiye > Istanbul Province
- Istanbul (0.04)
- Iran > Tehran Province
- Tehran (0.04)
- Singapore > Central Region
- Singapore (0.04)
- Japan > Kyūshū & Okinawa
- Kyūshū > Fukuoka Prefecture > Fukuoka (0.04)
- Genre:
- Research Report > New Finding (1.00)
- Industry:
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (1.00)
- Education > Curriculum
- Subject-Specific Education (1.00)
- Technology:
- Information Technology > Artificial Intelligence
- Natural Language > Text Processing (1.00)
- Cognitive Science > Problem Solving (0.67)
- Representation & Reasoning
- Expert Systems (0.92)
- Search (0.67)
- Uncertainty
- Fuzzy Logic (0.67)
- Bayesian Inference (0.67)
- Machine Learning
- Statistical Learning (1.00)
- Neural Networks > Deep Learning (1.00)
- Evolutionary Systems (1.00)
- Decision Tree Learning (0.94)
- Learning Graphical Models > Directed Networks
- Bayesian Learning (1.00)