AITopics | activeclean

ActiveClean: Generating Line-Level Vulnerability Data via Active Learning

Joshy, Ashwin Kallingal, Alam, Mirza Sanjida, Sharmin, Shaila, Li, Qi, Le, Wei

arXiv.org Artificial Intelligence, Dec-3-2023

Deep learning vulnerability detection tools are increasing in popularity and have been shown to be effective. These tools rely on large volumes of high-quality training data, which are very hard to obtain. Most currently available datasets provide function-level labels, reporting whether a function is vulnerable or not. However, for vulnerability detection to be useful, we also need to know the lines that are relevant to the vulnerability. This paper makes efforts towards developing systematic tools and proposes ActiveClean to generate large volumes of line-level vulnerability data from commits. That is, in addition to function-level labels, it also reports which lines in the function are likely responsible for vulnerability detection. In the past, static analysis has been applied to clean commits to generate line-level data. Our approach, based on active learning, is easy to use and scalable, and provides a complementary approach to static analysis. We designed semantic and syntactic properties from commit lines and used them to train the model. We evaluated our approach on both Java and C datasets, processing more than 4.3K commits and 119K commit lines. ActiveClean achieved an F1 score between 70 and 74. Further, we also show that active learning is effective by using just 400 training data points to reach an F1 score of 70.23. Using ActiveClean, we generated the line-level labels for the entire FFMpeg project in the Devign dataset, including 5K functions, and also detected incorrect function-level labels. We demonstrated that using our cleaned data, LineVul, a SOTA line-level vulnerability detection tool, detected 70 more vulnerable lines and 18 more vulnerable functions, and improved Top 10 accuracy from 66% to 73%.
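The abstract does not say which model or query strategy the authors use, so the following is only a minimal sketch of the general idea of active learning over commit lines. It assumes a tiny logistic regression and uncertainty sampling (a common strategy, swapped in here for illustration), and the two features, the `oracle` function, and the toy `pool` are all hypothetical stand-ins, not the paper's features or data.

```python
import math

def predict(weights, x):
    """Probability that a line is vulnerability-relevant (last weight is the bias)."""
    z = sum(w * xj for w, xj in zip(weights, x)) + weights[-1]
    z = max(-30.0, min(30.0, z))  # clamp so math.exp stays in range
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, labels, epochs=200, lr=0.5):
    """Fit a small logistic regression with batch gradient descent."""
    n_features = len(samples[0])
    weights = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for x, y in zip(samples, labels):
            err = predict(weights, x) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj
            grad[-1] += err  # bias gradient
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grad)]
    return weights

def active_learning(pool, oracle, seed_ids, rounds, batch_size):
    """Label a few lines, train, then ask the oracle about the least certain lines."""
    labeled = {i: oracle(i) for i in seed_ids}
    for _ in range(rounds):
        ids = sorted(labeled)
        weights = train([pool[i] for i in ids], [labeled[i] for i in ids])
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        # Uncertainty sampling: predictions closest to 0.5 come first.
        unlabeled.sort(key=lambda i: abs(predict(weights, pool[i]) - 0.5))
        for i in unlabeled[:batch_size]:
            labeled[i] = oracle(i)
    ids = sorted(labeled)
    weights = train([pool[i] for i in ids], [labeled[i] for i in ids])
    return weights, labeled

# Hypothetical features per commit line: (touches_memory_api, is_comment_only).
pool = [(1, 0), (0, 1), (1, 1), (0, 0), (1, 0), (0, 1), (1, 1), (0, 0)]

def oracle(i):
    # Stand-in for a human annotator; the labeling rule is invented for this toy.
    return 1 if pool[i] == (1, 0) else 0

weights, labeled = active_learning(pool, oracle, seed_ids=[0, 1], rounds=2, batch_size=2)
print(len(labeled), "of", len(pool), "lines were sent to the annotator")
```

The point of the loop is that the annotator only sees the lines the current model is least sure about, which is one way a small labeled set (the abstract reports 400 training data points) could go a long way.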

activeclean, commit line, dataset, (11 more...)

2312.01588

Country:

North America > United States > Iowa > Story County > Ames (0.04)
North America > United States > California > Santa Clara County > San Jose (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(14 more...)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Using AI to Clean Up Big Data - DZone Big Data

#artificialintelligence, Dec-4-2016, 16:20:10 GMT

Big data is a hot topic right now, but the successful use of that data largely rests on the ability of organizations to provide clean, accurate and usable data to employees so they can draw real-time insights. Suffice it to say, much of the data held in organizational databases is anything but clean, and few organizations seem willing to undertake the laborious job of cleaning it up. AI may be about to come to the rescue, as a team of researchers from Columbia University and the University of California at Berkeley has developed automated software to do the job for you. The software, called ActiveClean, uses prediction models to test out datasets, and uses the results to identify the fields that require cleaning while simultaneously updating the models. As with so many laborious processes, human error can be a significant factor, so ActiveClean takes humans out of the equation in two of the most error-prone areas: finding the dirty data to begin with, and then updating models accordingly.
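The article gives no algorithmic detail, so the following is only a rough sketch of the loop it describes: a model is used to pick records that look suspicious, those records are cleaned, and the model is updated. It is not the ActiveClean algorithm itself. The residual-based selection, the one-parameter linear model, the `cleaner` callback, and the toy data are all assumptions made for illustration.

```python
def fit_slope(xs, ys, slope=0.0, steps=500, lr=0.01):
    """Least-squares fit of y = slope * x by gradient descent (no intercept)."""
    for _ in range(steps):
        grad = sum((slope * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        slope -= lr * grad
    return slope

def clean_iteratively(xs, dirty_ys, cleaner, batch_size, rounds):
    """Alternate between cleaning the worst-fitting records and refitting the model."""
    ys = list(dirty_ys)
    cleaned = set()
    slope = fit_slope(xs, ys)  # initial model trained on dirty data
    for _ in range(rounds):
        candidates = [i for i in range(len(xs)) if i not in cleaned]
        # Assumed heuristic: records the model fits worst are cleaned first.
        candidates.sort(key=lambda i: abs(slope * xs[i] - ys[i]), reverse=True)
        for i in candidates[:batch_size]:
            ys[i] = cleaner(i)
            cleaned.add(i)
        slope = fit_slope(xs, ys, slope=slope)  # warm-start from the current model
    return slope, cleaned

# Toy data: the true relation is y = 2x, with two corrupted records.
xs = [1, 2, 3, 4, 5, 6]
true_ys = [2 * x for x in xs]
dirty_ys = [2, 4, 30, 8, 0, 12]  # records at index 2 and 4 are dirty

def cleaner(i):
    # Hypothetical cleaning step; here it simply looks up the correct value.
    return true_ys[i]

slope, cleaned = clean_iteratively(xs, dirty_ys, cleaner, batch_size=1, rounds=2)
print("cleaned records:", sorted(cleaned), "fitted slope:", round(slope, 3))
```

In this toy, only the two corrupted records are sent for cleaning, and the model is refreshed after each one rather than retrained once at the end.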

artificial intelligence, big data, data mining, (6 more...)

Country: North America > United States > California (0.27)

Technology:

Information Technology > Artificial Intelligence (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.99)