AITopics | analysis tool

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Ensuring Functional Correctness of Large Code Models with Selective Generation

Jeong, Jaewoo, Kim, Taesoo, Park, Sangdon

arXiv.org Artificial Intelligence, Oct-27-2025

The hallucination of code generation models hinders their applicability to systems requiring higher safety standards. One critical bottleneck in addressing code hallucination is the difficulty of identifying the functional correctness of generated code, due to its unnatural form. We address this core bottleneck by automatically generating unit tests using dynamic code analysis tools, leveraging the executable nature of code. Accordingly, we propose a selective code generator that abstains from uncertain generations, based on the functional correctness evaluated by generated unit tests, to theoretically control the correctness among non-abstained answers, i.e., the false discovery rate. Finally, we propose to use generated unit tests in evaluation as well as in learning for precise code evaluation, calling this paradigm FuzzEval. We demonstrate the efficacy of our method along with the controllability of code hallucination and reasonable selection efficiency.
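The abstention idea can be illustrated with a toy sketch. This is not the authors' algorithm: it assumes each generation carries a hypothetical confidence score and a correctness label (as generated unit tests might provide), and it picks the smallest score threshold whose empirical false discovery rate on a calibration set stays under a target. The function names and the calibration data are invented for illustration, and the sketch offers none of the theoretical guarantees the abstract describes.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, bool]  # (hypothetical confidence score, passed unit tests?)


def empirical_fdr(samples: List[Sample], threshold: float) -> Optional[float]:
    """Fraction of incorrect answers among those with score >= threshold.

    Returns None when every sample would be abstained on.
    """
    kept = [correct for score, correct in samples if score >= threshold]
    if not kept:
        return None
    return sum(1 for correct in kept if not correct) / len(kept)


def select_threshold(samples: List[Sample], target_fdr: float) -> Optional[float]:
    """Smallest observed score whose empirical FDR is <= target_fdr."""
    for threshold in sorted({score for score, _ in samples}):
        fdr = empirical_fdr(samples, threshold)
        if fdr is not None and fdr <= target_fdr:
            return threshold
    return None  # no threshold meets the target, so always abstain


def selective_generate(code: str, score: float, threshold: Optional[float]) -> Optional[str]:
    """Return the generated code, or None to signal abstention."""
    if threshold is None or score < threshold:
        return None
    return code


# Made-up calibration data, for illustration only.
calibration = [
    (0.95, True), (0.90, True), (0.80, False),
    (0.70, True), (0.40, False), (0.30, False),
]
threshold = select_threshold(calibration, target_fdr=0.25)
```

With this calibration set the chosen threshold is 0.70: four answers are kept and one of them is wrong, giving an empirical false discovery rate of 0.25.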

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2505.13553

Country:

Asia (0.46)
North America > United States (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.96)

Learning to Triage Taint Flows Reported by Dynamic Program Analysis in Node.js Packages

Ni, Ronghao, Yang, Aidan Z. H., Hsu, Min-Chien, Sabino, Nuno, Jia, Limin, Martins, Ruben, Cassel, Darion, Cheang, Kevin

arXiv.org Artificial Intelligence, Oct-24-2025

Program analysis tools often produce large volumes of candidate vulnerability reports that require costly manual review, creating a practical challenge: how can security analysts prioritize the reports most likely to be true vulnerabilities? This paper investigates whether machine learning can be applied to prioritizing vulnerabilities reported by program analysis tools. We focus on Node.js packages and collect a benchmark of 1,883 Node.js packages, each containing one reported ACE or ACI vulnerability. We evaluate a variety of machine learning approaches, including classical models, graph neural networks (GNNs), large language models (LLMs), and hybrid models that combine GNNs and LLMs, trained on data based on a dynamic program analysis tool's output. The top LLM achieves F1 = 0.915, while the best GNN and classical ML models reach F1 = 0.904. At a less than 7% false-negative rate, the leading model eliminates 66.9% of benign packages from manual review, taking around 60 ms per package. If the best model is tuned to operate at a precision level of 0.8 (i.e., allowing 20% false positives amongst all warnings), our approach can detect 99.2% of exploitable taint flows while missing only 0.8%, demonstrating strong potential for real-world vulnerability triage.
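The relationship between the quoted operating point (precision 0.8, 99.2% of exploitable flows detected) and the underlying confusion counts can be made concrete with a short calculation. The counts below are hypothetical, chosen only so that precision and recall match the figures in the abstract; they are not taken from the paper's benchmark, and the resulting F1 is for this operating point, not the reported best F1 scores.

```python
def precision(tp: int, fp: int) -> float:
    """Share of raised warnings that are true vulnerabilities."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Share of true vulnerabilities that were flagged."""
    return tp / (tp + fn)


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Hypothetical counts: 1,000 exploitable flows, 992 flagged, 8 missed,
# plus 248 benign flows flagged by mistake.
TP, FP, FN = 992, 248, 8

p = precision(TP, FP)  # 992 / 1240: 20% of warnings are false positives
r = recall(TP, FN)     # 992 / 1000: 0.8% of exploitable flows are missed
print(f"precision={p:.3f} recall={r:.3f} F1={f1_score(p, r):.3f}")
```

Under these assumed counts the script prints precision 0.800, recall 0.992, and F1 0.886.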

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2510.20739

Country: North America (0.46)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Deep Fractional Fourier Transform

Hu Yu, Jie Huang, Lingzhi Li, Man Zhou

Neural Information Processing Systems, Oct-9-2025, 10:18:04 GMT

Our code is released publicly at https://github.com/yuhuUSTC/FRFT.

data quality, frft, machine learning, (18 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China (0.04)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
Information Technology > Data Science > Data Quality > Data Transformation (0.70)

Analysing Python Machine Learning Notebooks with Moose

Mignard, Marius, Costiou, Steven, Anquetil, Nicolas, Etien, Anne

arXiv.org Artificial Intelligence, Sep-16-2025

Machine Learning (ML) code, particularly within notebooks, often exhibits lower quality compared to traditional software. Bad practices arise at three distinct levels: general Python coding conventions, the organizational structure of the notebook itself, and ML-specific aspects such as reproducibility and correct API usage. However, existing analysis tools typically focus on only one of these levels and struggle to capture ML-specific semantics, limiting their ability to detect issues. This paper introduces Vespucci Linter, a static analysis tool with multi-level capabilities, built on Moose and designed to address this challenge. Leveraging a metamodeling approach that unifies the notebook's structural elements with Python code entities, our linter enables a more contextualized analysis to identify issues across all three levels. We implemented 22 linting rules derived from the literature and applied our tool to a corpus of 5,000 notebooks from the Kaggle platform. The results reveal violations at all levels, validating the relevance of our multi-level approach and demonstrating Vespucci Linter's potential to improve the quality and reliability of ML development in notebook environments.
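To make the idea of combining notebook structure with Python code entities more tangible, here is a minimal sketch of one hypothetical rule, written with Python's standard `ast` module. It is not Vespucci Linter and does not use Moose; the rule itself (flagging imports that appear outside the first code cell) and the function name `late_imports` are assumptions for illustration, and the sketch expects cells containing plain Python without IPython magics.

```python
import ast
from typing import List, Tuple


def late_imports(cells: List[str]) -> List[Tuple[int, int]]:
    """Return (cell_index, line_number) for imports outside the first cell.

    `cells` holds the source of each code cell in notebook order, so the
    rule uses both the notebook structure (cell position) and the Python
    syntax tree (import statements).
    """
    findings = []
    for cell_index, source in enumerate(cells):
        if cell_index == 0:
            continue  # imports in the first cell are accepted by this rule
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, (ast.Import, ast.ImportFrom)):
                findings.append((cell_index, node.lineno))
    return findings


# A made-up three-cell notebook, for illustration only.
notebook_cells = [
    "import numpy as np",
    "x = np.arange(3)\nimport pandas as pd",
    "print(x)",
]
violations = late_imports(notebook_cells)
```

Here `violations` holds a single finding: the `pandas` import on line 2 of the second cell (cell index 1).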

artificial intelligence, machine learning, notebook, (16 more...)

arXiv.org Artificial Intelligence

2509.11748

Country:

Europe (0.46)
North America > United States (0.16)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Validation of a CT-brain analysis tool for measuring global cortical atrophy in older patient cohorts

Bal, Sukhdeep, Colbourne, Emma, Gan, Jasmine, Griffanti, Ludovica, Hanayik, Taylor, Demeyere, Nele, Davies, Jim, Pendlebury, Sarah T, Jenkinson, Mark

arXiv.org Artificial Intelligence, Sep-11-2025

Quantification of brain atrophy currently requires visual rating scales, which are time consuming, so automated brain image analysis is warranted. We validated our automated deep learning (DL) tool measuring the Global Cerebral Atrophy (GCA) score against trained human raters, and associations with age and cognitive impairment, in representative older (>65 years) patients. CT-brain scans were obtained from patients in acute medicine (ORCHARD-EPR), acute stroke (OCS studies) and a legacy sample. Scans were divided in a 60/20/20 ratio for training, optimisation and testing. CT-images were assessed by two trained raters (rater-1=864 scans, rater-2=20 scans). Agreement between DL tool-predicted GCA scores (range 0-39) and the visual ratings was evaluated using mean absolute error (MAE) and Cohen's weighted kappa. Among 864 scans (ORCHARD-EPR=578, OCS=200, legacy scans=86), MAE between the DL tool and rater-1 GCA scores was 3.2 overall, 3.1 for ORCHARD-EPR, 3.3 for OCS and 2.6 for the legacy scans, and half had a DL-predicted GCA error between -2 and 2. Inter-rater agreement was Kappa=0.45 between the DL-tool and rater-1, and 0.41 between the tool and rater-2, whereas it was lower at 0.28 for rater-1 and rater-2. There was no difference in GCA scores from the DL-tool and the two raters (one-way ANOVA, p=0.35) or in mean GCA scores between the DL-tool and rater-1 (paired t-test, t=-0.43, p=0.66), the tool and rater-2 (t=1.35, p=0.18) or between rater-1 and rater-2 (t=0.99, p=0.32). DL-tool GCA scores correlated with age and cognitive scores (both p<0.001). Our DL CT-brain analysis tool measured GCA score accurately and without user input in real-world scans acquired from older patients. Our tool will enable extraction of standardised quantitative measures of atrophy at scale for use in health data research and will act as proof-of-concept towards a point-of-care clinically approved tool.
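The two agreement measures named in the abstract, mean absolute error and Cohen's weighted kappa, can be computed as in the sketch below. The abstract does not say which weighting scheme was used, so linear disagreement weights are an assumption here, and the scores are made-up values on the 0-39 GCA scale rather than study data.

```python
from collections import Counter
from typing import Sequence


def mean_absolute_error(a: Sequence[int], b: Sequence[int]) -> float:
    """Average absolute difference between paired scores."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def weighted_kappa(a: Sequence[int], b: Sequence[int],
                   min_score: int, max_score: int) -> float:
    """Cohen's kappa with linear disagreement weights |i - j| / (max - min).

    Computed as 1 - (observed weighted disagreement) / (disagreement
    expected by chance from the two raters' marginal distributions).
    """
    n = len(a)
    span = max_score - min_score
    observed = sum(abs(x - y) for x, y in zip(a, b)) / (n * span)
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(
        count_a[i] * count_b[j] * abs(i - j)
        for i in count_a for j in count_b
    ) / (n * n * span)
    if expected == 0:
        # Both raters gave the same single score to every scan;
        # treated here as perfect agreement.
        return 1.0
    return 1.0 - observed / expected


# Made-up scores, for illustration only.
tool_scores = [10, 12, 20, 25]
rater_scores = [12, 12, 18, 30]

mae = mean_absolute_error(tool_scores, rater_scores)
kappa = weighted_kappa(tool_scores, rater_scores, min_score=0, max_score=39)
```

For real analyses an established implementation such as scikit-learn's `cohen_kappa_score` with its `weights` argument would usually be preferable to hand-rolled code.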

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2509.08012

Country: Europe > United Kingdom > England > Oxfordshire > Oxford (0.29)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

QLPro: Automated Code Vulnerability Discovery via LLM and Static Code Analysis Integration

Hu, Junze, Jin, Xiangyu, Zeng, Yizhe, Liu, Yuling, Li, Yunpeng, Du, Dan, Xie, Kaiyu, Zhu, Hongsong

arXiv.org Artificial Intelligence, Jul-22-2025

Code auditing, a method where security researchers review source code to identify vulnerabilities, has become increasingly impractical for large-scale open-source projects. While Large Language Models (LLMs) demonstrate impressive code generation capabilities, they are constrained by limitations in context window size, memory capacity, and complex reasoning abilities, making direct vulnerability detection across entire projects infeasible. Static code analysis tools, though effective to a degree, are heavily reliant on their predefined scanning rules. To address these challenges, we present QLPro, a vulnerability detection framework that systematically integrates LLMs with static code analysis tools. QLPro introduces both a triple-voting mechanism and a three-role mechanism to enable fully automated vulnerability detection across entire open-source projects without human intervention. Specifically, QLPro first utilizes static analysis tools to extract all taint specifications from a project, then employs LLMs and the triple-voting mechanism to classify and match these taint specifications, thereby enhancing both the accuracy and appropriateness of taint specification classification.
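The abstract does not spell out how the triple-voting mechanism works. One plausible reading, sketched below purely as an assumption, is a majority vote over three independent classifications of the same taint specification. The `classify` callable stands in for an LLM call, and the labels and the example specification string are hypothetical; none of this is QLPro's actual interface.

```python
from collections import Counter
from typing import Callable, Iterable, Optional


def triple_vote(classify: Callable[[str], str], spec: str) -> Optional[str]:
    """Classify `spec` three times and keep a label only if at least two agree."""
    votes = Counter(classify(spec) for _ in range(3))
    label, count = votes.most_common(1)[0]
    return label if count >= 2 else None


def make_stub(answers: Iterable[str]) -> Callable[[str], str]:
    """Build a stand-in classifier that replays fixed answers (no real LLM)."""
    replay = iter(answers)
    return lambda spec: next(replay)


# Hypothetical specification string and labels, for illustration only.
agreed = triple_vote(make_stub(["source", "sink", "source"]), "readRequestBody()")
split = triple_vote(make_stub(["source", "sink", "neither"]), "readRequestBody()")
```

In this sketch `agreed` is `"source"` because two of three votes match, while `split` is `None` because no label reaches a majority.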

large language model, natural language, vulnerability, (17 more...)

arXiv.org Artificial Intelligence

2506.23644

Country: Asia > China (0.14)

Genre: Research Report > New Finding (0.69)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

AI Software Engineer: Programming with Trust

Roychoudhury, Abhik, Pasareanu, Corina, Pradel, Michael, Ray, Baishakhi

arXiv.org Artificial Intelligence, Feb-19-2025

Large Language Models (LLMs) have shown surprising proficiency in generating code snippets, promising to automate large parts of software engineering via artificial intelligence (AI). We argue that successfully deploying AI software engineers requires a level of trust equal to or even greater than the trust established by human-driven software engineering practices. The recent trend toward LLM agents offers a path toward integrating the power of LLMs to create new code with the power of analysis tools to increase trust in the code. This opinion piece comments on whether LLM agents could dominate software engineering workflows in the future and whether the focus of programming will shift from programming at scale to programming with trust. Software engineering is undergoing a significant phase of greater automation owing to the emergence of Large Language Models (LLMs) for code.

agent, ai software engineer, software engineer, (10 more...)

arXiv.org Artificial Intelligence

2502.13767

Country:

Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.05)
Asia > Singapore > Central Region > Singapore (0.05)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Multi-Programming Language Sandbox for LLMs

Dou, Shihan, Zhang, Jiazheng, Zang, Jianxiang, Tao, Yunbo, Zhou, Weikang, Jia, Haoxiang, Liu, Shichun, Yang, Yuming, Xi, Zhiheng, Wu, Shenxi, Zhang, Shaoqing, Wu, Muling, Lv, Changze, Xiong, Limao, Zhan, Wenyu, Zhang, Lin, Weng, Rongxiang, Wang, Jingang, Cai, Xunliang, Wu, Yueming, Wen, Ming, Zheng, Rui, Ji, Tao, Cao, Yixin, Gui, Tao, Qiu, Xipeng, Zhang, Qi, Huang, Xuanjing

arXiv.org Artificial Intelligence, Nov-5-2024

We introduce MPLSandbox, an out-of-the-box multi-programming language sandbox designed to provide unified and comprehensive feedback from compiler and analysis tools for Large Language Models (LLMs). It can automatically identify the programming language of the code, compiling and executing it within an isolated sub-sandbox to ensure safety and stability. In addition, MPLSandbox also integrates both traditional and LLM-based code analysis tools, providing a comprehensive analysis of generated code. MPLSandbox can be effortlessly integrated into the training and deployment of LLMs to improve the quality and correctness of their generated code. It also helps researchers streamline their workflows for various LLM-based code-related tasks, reducing the development cost. To validate the effectiveness of MPLSandbox, we integrate it into training and deployment approaches, and also employ it to optimize workflows for a wide range of real-world code-related tasks. Our goal is to enhance researcher productivity on LLM-based code-related tasks by simplifying and automating workflows through delegation to MPLSandbox.

arxiv preprint arxiv, mplsandbox, multi-programming language sandbox, (12 more...)

arXiv.org Artificial Intelligence

2410.23074

Country:

Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
North America > United States > Hawaii (0.04)
Asia > China > Heilongjiang Province > Harbin (0.04)

Genre:

Research Report (0.82)
Overview (0.67)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Software (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Unintentional Security Flaws in Code: Automated Defense via Root Cause Analysis

Islam, Nafis Tanveer, Bethany, Mazal, Manuel, Dylan, Jadliwala, Murtuza, Najafirad, Peyman

arXiv.org Artificial Intelligence, Aug-30-2024

Software security remains a critical concern, particularly as junior developers, often lacking comprehensive knowledge of security practices, contribute to codebases. While there are tools to help developers proactively write secure code, their actual effectiveness in helping developers fix their vulnerable code remains largely unmeasured. Moreover, these approaches typically focus on classifying and localizing vulnerabilities without highlighting the specific code segments that are the root cause of the issues, a crucial aspect for developers seeking to fix their vulnerable code. To address these challenges, we conducted a comprehensive study evaluating the efficacy of existing methods in helping junior developers secure their code. Our findings across five types of security vulnerabilities revealed that current tools enabled developers to secure only 36.2% of vulnerable code. Questionnaire results from these participants further indicated that not knowing the code that was the root cause of the vulnerability was one of their primary challenges in repairing the vulnerable code. Informed by these insights, we developed an automated vulnerability root cause (RC) toolkit called T5-RCGCN that combines T5 language model embeddings with a graph convolutional network (GCN) for vulnerability classification and localization. Additionally, we integrated DeepLiftSHAP to identify the code segments that were the root cause of the vulnerability. We tested T5-RCGCN with 56 junior developers across three datasets, showing a 28.9% improvement in code security compared to previous methods. Developers using the tool also gained a deeper understanding of vulnerability root causes, resulting in a 17.0% improvement in their ability to secure code independently. These results demonstrate the tool's potential for both immediate security enhancement and long-term developer skill growth.

developer, participant, vulnerability, (15 more...)

arXiv.org Artificial Intelligence

2409.00199

Country: