ImportSnare: Directed "Code Manual" Hijacking in Retrieval-Augmented Code Generation
Ye, Kai, Su, Liangcai, Qian, Chenxiong
Code generation has emerged as a pivotal capability of Large Language Models (LLMs), revolutionizing development efficiency for programmers of all skill levels. However, the complexity of data structures and algorithmic logic often results in functional deficiencies and security vulnerabilities in generated code, reducing it to a prototype requiring extensive manual debugging. While Retrieval-Augmented Generation (RAG) can enhance correctness and security by leveraging external code manuals, it simultaneously introduces new attack surfaces. In this paper, we pioneer the exploration of attack surfaces in Retrieval-Augmented Code Generation (RACG), focusing on malicious dependency hijacking. We demonstrate how poisoned documentation containing hidden malicious dependencies (e.g., matplotlib_safe) can subvert RACG, exploiting dual trust chains: LLM reliance on RAG and developers' blind trust in LLM suggestions. To construct poisoned documents, we propose ImportSnare, a novel attack framework employing two synergistic strategies: 1) Position-aware beam search optimizes hidden ranking sequences to elevate poisoned documents in retrieval results, and 2) Multilingual inductive suggestions generate jailbreaking sequences to manipulate LLMs into recommending malicious dependencies. Through extensive experiments across Python, Rust, and JavaScript, ImportSnare achieves high attack success rates (over 50% for popular libraries such as matplotlib and seaborn) and succeeds even when the poisoning ratio is as low as 0.01%, targeting both custom and real-world malicious packages. Our findings reveal critical supply chain risks in LLM-powered development, highlighting inadequate security alignment for code generation tasks. To support future research, we will release the multilingual benchmark suite and datasets. The project homepage is https://importsnare.github.io.
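The matplotlib_safe example above can be made concrete with a toy defensive check (not from the paper; the package list and suffix patterns are illustrative assumptions) that flags suffix-style variants of well-known package names before they reach an install command:

```python
# Illustrative sketch: flag "safe"-style variants of well-known package
# names that a poisoned code manual might induce an LLM to suggest.
# KNOWN_PACKAGES and SUSPICIOUS_SUFFIXES are hypothetical examples.
KNOWN_PACKAGES = {"matplotlib", "seaborn", "numpy", "requests"}
SUSPICIOUS_SUFFIXES = ("_safe", "-safe", "_secure", "2")

def flag_suspicious(name: str) -> bool:
    """Return True if `name` looks like a hijacked variant of a known package."""
    if name in KNOWN_PACKAGES:
        return False
    for known in KNOWN_PACKAGES:
        if name.startswith(known) and name[len(known):] in SUSPICIOUS_SUFFIXES:
            return True
    return False
```

A real defense would consult the actual registry and package provenance metadata; the string check here only illustrates the shape of the attack.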
Importing Phantoms: Measuring LLM Package Hallucination Vulnerabilities
Krishna, Arjun, Galinkin, Erick, Derczynski, Leon, Martin, Jeffrey
Large Language Models (LLMs) have become an essential tool in the programmer's toolkit, but their tendency to hallucinate code can be used by malicious actors to introduce vulnerabilities to broad swathes of the software supply chain. In this work, we analyze package hallucination behaviour in LLMs across popular programming languages, examining both existing package references and fictional dependencies. By analyzing this behaviour, we identify potential attacks and suggest defensive strategies. We discover that package hallucination rate is predicated not only on model choice, but also on programming language, model size, and specificity of the coding task request. The Pareto optimality boundary between code generation performance and package hallucination is sparsely populated, suggesting that coding models are not being optimized for secure code. Additionally, we find an inverse correlation between package hallucination rate and the HumanEval coding benchmark, offering a heuristic for evaluating the propensity of a model to hallucinate packages. Our metrics, findings and analyses provide a basis for future models, securing AI-assisted software development workflows against package supply chain attacks.
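The Pareto-optimality boundary the abstract mentions can be sketched as follows (model names and scores are invented for illustration; a model is taken to be dominated if another model has strictly higher coding performance and strictly lower hallucination rate):

```python
def pareto_frontier(models):
    """models: dict of name -> (coding_score, hallucination_rate).
    Return the names that no other model strictly dominates on both axes
    (higher score AND lower hallucination rate)."""
    frontier = []
    for name, (score, rate) in models.items():
        dominated = any(
            s > score and r < rate
            for other, (s, r) in models.items()
            if other != name
        )
        if not dominated:
            frontier.append(name)
    return sorted(frontier)

# Hypothetical measurements:
models = {"model-a": (0.80, 0.05), "model-b": (0.60, 0.10), "model-c": (0.70, 0.02)}
```

A sparsely populated frontier, as the paper reports, would mean most models fall into the dominated set.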
We Have a Package for You! A Comprehensive Analysis of Package Hallucinations by Code Generating LLMs
Spracklen, Joseph, Wijewickrama, Raveen, Sakib, A H M Nazmus, Maiti, Anindya, Jadliwala, Murtuza
The reliance of popular programming languages such as Python and JavaScript on centralized package repositories and open-source software, combined with the emergence of code-generating Large Language Models (LLMs), has created a new type of threat to the software supply chain: package hallucinations. These hallucinations, which arise from fact-conflicting errors when generating code using LLMs, represent a novel form of package confusion attack that poses a critical threat to the integrity of the software supply chain. This paper conducts a rigorous and comprehensive evaluation of package hallucinations across different programming languages, settings, and parameters, exploring how different configurations of LLMs affect the likelihood of generating erroneous package recommendations and identifying the root causes of this phenomenon. Using 16 different popular code generation models, across two programming languages and two unique prompt datasets, we collect 576,000 code samples which we analyze for package hallucinations. Our findings reveal that 19.7% of generated packages across all the tested LLMs are hallucinated, including a staggering 205,474 unique examples of hallucinated package names, further underscoring the severity and pervasiveness of this threat. We also implemented and evaluated mitigation strategies based on Retrieval Augmented Generation (RAG), self-detected feedback, and supervised fine-tuning. These techniques demonstrably reduced package hallucinations, with hallucination rates for one model dropping below 3%. While the mitigation efforts were effective in reducing hallucination rates, our study reveals that package hallucinations are a systemic and persistent phenomenon that poses a significant challenge for code generating LLMs.
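The two headline measurements, the hallucination rate and the count of unique hallucinated names, can be sketched minimally as follows, assuming a known registry index (all package names here are hypothetical, not drawn from the paper's data):

```python
def hallucination_stats(generated, registry):
    """Given a list of generated package names and a set of names known to
    the registry, return (hallucination_rate, unique_hallucinated_count)."""
    hallucinated = [p for p in generated if p not in registry]
    rate = len(hallucinated) / len(generated)
    return rate, len(set(hallucinated))

# Hypothetical sample: three hallucinated suggestions, two unique names.
generated = ["numpy", "fastjsonx", "fastjsonx", "webutilz"]
registry = {"numpy"}
```

In practice the registry side would be the PyPI or npm index rather than an in-memory set.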
Exploring Naming Conventions (and Defects) of Pre-trained Deep Learning Models in Hugging Face and Other Model Hubs
Jiang, Wenxin, Cheung, Chingwo, Thiruvathukal, George K., Davis, James C.
As innovation in deep learning continues, many engineers want to adopt Pre-Trained deep learning Models (PTMs) as components in computer systems. PTMs are part of a research-to-practice pipeline: researchers publish PTMs, which engineers adapt for quality or performance and then deploy. If PTM authors choose appropriate names for their PTMs, it could facilitate model discovery and reuse. However, prior research has reported that model names are not always well chosen, and are sometimes erroneous. The naming conventions and naming defects for PTM packages have not been systematically studied; understanding them will add to our knowledge of how the research-to-practice process works for PTM packages. In this paper, we report the first study of PTM naming conventions and the associated PTM naming defects. We define the components of a PTM package name, comprising the package name and claimed architecture from the metadata. We present the first study focused on characterizing the nature of naming in the PTM ecosystem. To this end, we developed a novel automated naming assessment technique that can automatically extract the semantic and syntactic patterns. To identify potential naming defects, we developed a novel algorithm, the automated DNN ARchitecture Assessment pipeline (DARA), to cluster PTMs based on architectural differences. Our study suggests naming conventions for PTMs, and frames those conventions as a signal of the research-to-practice relationships in the PTM ecosystem. We envision future work on leveraging meta-features of PTMs to support model search and reuse.
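As a toy illustration of the "components of a PTM package name" the abstract defines (the actual DARA pipeline clusters models by architecture, not by string; the splitting rule below is an assumption for illustration only):

```python
def parse_ptm_name(name):
    """Heuristically split a Hugging Face-style PTM name like
    'bert-base-uncased' into hyphen-separated components; by convention
    the first component is often the claimed architecture."""
    parts = name.lower().split("-")
    return {"architecture": parts[0], "qualifiers": parts[1:]}
```

A naming defect, in this framing, would be a mismatch between the claimed architecture in the name and the architecture recovered from the model itself.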
Training GANs in Julia's Flux
In order to run machine learning experiments effectively, we need a fast turn-around time for model training, so simply implementing the model is not the only thing we need to worry about. We also want to be able to change the hyperparameters in a convenient way, either through a configuration file or through command line arguments. This post demonstrates how I train a vanilla GAN on the MNIST dataset. It is not about GAN theory; for that, the original paper by Goodfellow et al. [1] is a good starting point. Instead, I focus on how to structure the code and on subtle implementation issues I came across when writing it. You can find the current version of the code on GitHub.
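The post's code is Julia, but the hyperparameters-from-the-command-line idea is language-agnostic; a minimal Python analogue with argparse might look like this (the flag names and default values are illustrative, not the post's actual settings):

```python
import argparse

def parse_hyperparams(argv=None):
    """Expose GAN training hyperparameters as command-line flags so runs
    can be reconfigured without editing the source."""
    p = argparse.ArgumentParser(description="GAN training hyperparameters")
    p.add_argument("--lr", type=float, default=2e-4, help="learning rate")
    p.add_argument("--batch-size", type=int, default=128)
    p.add_argument("--epochs", type=int, default=20)
    p.add_argument("--latent-dim", type=int, default=100,
                   help="dimension of the generator's noise input")
    return p.parse_args(argv)
```

Passing `argv=None` makes the function read `sys.argv` in a script while remaining testable with an explicit list.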
Classification of descriptions and summary using multiple passes of statistical and natural language toolkits
Banthia, Saumya, Sharma, Anantha
This document describes a possible approach for checking the relevance of a summary / definition of an entity with respect to its name. The classifier focuses on the relevancy of an entity's name to its summary / definition; in other words, it is a name relevance check. The percentage score obtained from this approach can be used either on its own or to supplement scores obtained from other metrics to arrive at a final classification; potential improvements are outlined at the end of the document. The dataset on which this document focuses, to obtain an objective score, is a list of package names and their respective summaries (sourced from pypi.org [1]).
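A minimal sketch of a percentage-style name relevance check in the spirit described (the tokenization and scoring rule here are assumptions for illustration, not the authors' multi-pass classifier):

```python
def name_relevance(name, summary):
    """Toy relevance check: the percentage of tokens in the package name
    that also appear as words in its summary."""
    tokens = [t for t in name.lower().replace("-", " ").replace("_", " ").split() if t]
    if not tokens:
        return 0.0
    summary_words = set(summary.lower().split())
    hits = sum(t in summary_words for t in tokens)
    return 100.0 * hits / len(tokens)
```

The actual approach layers statistical and NLP toolkits over multiple passes; a token-overlap score like this would at best be one input to such a pipeline.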
Top R Packages for Machine Learning
Much of our curriculum is based on feedback from corporate and government partners about the technologies they are looking to learn. But we wanted to develop a more data-driven approach to what we should be teaching in our data science corporate training and our free fellowship for masters and PhDs looking to enter data science careers in industry. What are the most popular ML packages? Let's look at a ranking based on package downloads and social website activity. The ranking is based on average rank of CRAN (The Comprehensive R Archive Network) downloads and Stack Overflow activity (full ranking here [CSV]).
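The "average rank" combination described above can be sketched as follows (the package lists are illustrative, and penalizing a package absent from one list with rank `len(list) + 1` is an assumption about how ties with missing data are handled):

```python
def average_rank(cran_ranking, so_ranking):
    """Combine two rankings (lists ordered most popular first) by average
    rank position; packages missing from a list get rank len(list) + 1."""
    packages = set(cran_ranking) | set(so_ranking)

    def rank(lst, pkg):
        return lst.index(pkg) + 1 if pkg in lst else len(lst) + 1

    scores = {p: (rank(cran_ranking, p) + rank(so_ranking, p)) / 2
              for p in packages}
    return sorted(scores, key=scores.get)

# Hypothetical rankings:
cran = ["caret", "randomForest", "e1071"]
stack_overflow = ["randomForest", "e1071", "caret"]
```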