AITopics | identifier name

Collaborating Authors

identifier name

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Monitor-Guided Decoding of Code LMs with Static Analysis of Repository Context

Neural Information Processing SystemsFeb-12-2026, 22:56:14 GMT

Integrated development environments (IDEs) assist developers in understanding repository context using static analysis.

large language model, machine learning, programming language, (24 more...)

Neural Information Processing Systems

Country:

Asia > India > Karnataka > Bengaluru (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(4 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.93)
(3 more...)

Add feedback

662b1774ba8845fc1fa3d1fc0177ceeb-Paper-Conference.pdf

Neural Information Processing SystemsOct-8-2025, 19:58:42 GMT

large language model, machine learning, programming language, (24 more...)

Neural Information Processing Systems

Country:

Asia > India > Karnataka > Bengaluru (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(4 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.93)
(3 more...)

Add feedback

DOBF: A Deobfuscation Pre-Training Objective for Programming Languages Marie-Anne Lachaux

Neural Information Processing SystemsAug-15-2025, 10:32:50 GMT

In the original approach proposed by Devlin et al. [2018], a fraction of selected masked words is

arxiv preprint arxiv, identifier name, objective, (13 more...)

Neural Information Processing Systems

Country: Europe > France (0.04)

Genre: Research Report (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

AgentDNS: A Root Domain Naming System for LLM Agents

Cui, Enfang, Cheng, Yujun, She, Rui, Liu, Dan, Liang, Zhiyuan, Guo, Minxin, Li, Tianzheng, Wei, Qian, Xing, Wenjuan, Zhong, Zhijie

arXiv.org Artificial IntelligenceMay-29-2025

The rapid evolution of Large Language Model (LLM) agents has highlighted critical challenges in cross-vendor service discovery, interoperability, and communication. Existing protocols like model context protocol and agent-to-agent protocol have made significant strides in standardizing interoperability between agents and tools, as well as communication among multi-agents. However, there remains a lack of standardized protocols and solutions for service discovery across different agent and tool vendors. In this paper, we propose AgentDNS, a root domain naming and service discovery system designed to enable LLM agents to autonomously discover, resolve, and securely invoke third-party agent and tool services across organizational and technological boundaries. Inspired by the principles of the traditional DNS, AgentDNS introduces a structured mechanism for service registration, semantic service discovery, secure invocation, and unified billing. We detail the architecture, core functionalities, and use cases of AgentDNS, demonstrating its potential to streamline multi-agent collaboration in real-world scenarios. The source code will be published on https://github.com/agentdns.

artificial intelligence, large language model, natural language, (14 more...)

arXiv.org Artificial Intelligence

2505.22368

Country: Asia > China > Guangdong Province (0.28)

Genre:

Research Report (0.64)
Workflow (0.48)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

SCALAR: A Part-of-speech Tagger for Identifiers

Newman, Christian D., Scholten, Brandon, Testa, Sophia, Behler, Joshua A. C., Banabilah, Syreen, Collard, Michael L., Decker, Michael J., Mkaouer, Mohamed Wiem, Zampieri, Marcos, AlOmar, Eman Abdullah, Alsuhaibani, Reem, Peruma, Anthony, Maletic, Jonathan I.

arXiv.org Artificial IntelligenceApr-25-2025

--The paper presents the Source Code Analysis and Lexical Annotation Runtime (SCALAR), a tool specialized for mapping (annotating) source code identifier names to their corresponding part-of-speech tag sequence (grammar pattern). SCALAR's internal model is trained using scikit-learn's GradientBoostingClassifier in conjunction with a manually-curated oracle of identifier names and their grammar patterns. This specializes the tagger to recognize the unique structure of the natural language used by developers to create all types of identifiers (e.g., function names, variable names etc.). SCALAR's output is compared with a previous version of the tagger, as well as a modern off-the-shelf part-of-speech tagger to show how it improves upon other taggers' output for annotating identifiers. The code is available on Github 1 Index T erms --Program comprehension, identifier naming, part-of-speech tagging, natural language processing, software maintenance, software evolution I. I NTRODUCTION The identifiers developers create represent a significant amount of the information other developers must use to understand related code. Given that identifiers represent, on average, 70% of the characters in a code base [1], and developers spend more time reading code than writing [2], [3], it is important for researchers to better understand of how identifiers convey information, and how they can be improved to increase developer reading efficiency.

artificial intelligence, identifier, natural language, (18 more...)

arXiv.org Artificial Intelligence

2504.17038

Country:

North America > United States > Michigan > Genesee County > Flint (0.14)
North America > United States > Ohio > Wood County > Bowling Green (0.04)
North America > United States > Ohio > Summit County > Green (0.04)
(8 more...)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.69)

Add feedback

Guiding Language Models of Code with Global Context using Monitors

Agrawal, Lakshya A, Kanade, Aditya, Goyal, Navin, Lahiri, Shuvendu K., Rajamani, Sriram K.

arXiv.org Artificial IntelligenceNov-3-2023

Language models of code (LMs) work well when the surrounding code provides sufficient context. This is not true when it becomes necessary to use types, functionality or APIs defined elsewhere in the repository or a linked library, especially those not seen during training. LMs suffer from limited awareness of such global context and end up hallucinating. Integrated development environments (IDEs) assist developers in understanding repository context using static analysis. We extend this assistance, enjoyed by developers, to LMs. We propose monitor-guided decoding (MGD) where a monitor uses static analysis to guide the decoding. We construct a repository-level dataset PragmaticCode for method-completion in Java and evaluate MGD on it. On models of varying parameter scale, by monitoring for type-consistent object dereferences, MGD consistently improves compilation rates and agreement with ground truth. Further, LMs with fewer parameters, when augmented with MGD, can outperform larger LMs. With MGD, SantaCoder-1.1B achieves better compilation rate and next-identifier match than the much larger text-davinci-003 model. We also conduct a generalizability study to evaluate the ability of MGD to generalize to multiple programming languages (Java, C# and Rust), coding scenarios (e.g., correct number of arguments to method calls), and to enforce richer semantic constraints (e.g., stateful API protocols). Our data and implementation are available at https://github.com/microsoft/monitors4codegen .

mgd, repository, static analysis, (16 more...)

arXiv.org Artificial Intelligence

2306.10763

Country:

North America > United States > New York > New York County > New York City (0.04)
Asia > India > Karnataka > Bengaluru (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
(5 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.93)
(2 more...)

Add feedback

AdaptivePaste: Code Adaptation through Learning Semantics-aware Variable Usage Representations

Liu, Xiaoyu, Jang, Jinu, Sundaresan, Neel, Allamanis, Miltiadis, Svyatkovskiy, Alexey

arXiv.org Artificial IntelligenceOct-6-2023

In software development, it is common for programmers to copy-paste or port code snippets and then adapt them to their use case. This scenario motivates the code adaptation task -- a variant of program repair which aims to adapt variable identifiers in a pasted snippet of code to the surrounding, preexisting source code. However, no existing approach has been shown to effectively address this task. In this paper, we introduce AdaptivePaste, a learning-based approach to source code adaptation, based on transformers and a dedicated dataflow-aware deobfuscation pre-training task to learn meaningful representations of variable usage patterns. We evaluate AdaptivePaste on a dataset of code snippets in Python. Results suggest that our model can learn to adapt source code with 79.8% accuracy. To evaluate how valuable is AdaptivePaste in practice, we perform a user study with 10 Python developers on a hundred real-world copy-paste instances. The results show that AdaptivePaste reduces the dwell time to nearly half the time it takes for manual code adaptation, and helps to avoid bugs. In addition, we utilize the participant feedback to identify potential avenues for improvement of AdaptivePaste.

adaptivepaste, code snippet, snippet, (15 more...)

arXiv.org Artificial Intelligence

2205.11023

Country:

North America > United States > District of Columbia > Washington (0.05)
North America > United States > Washington > King County > Redmond (0.04)
North America > United States > New York > New York County > New York City (0.04)
(3 more...)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology (0.93)

Technology:

Information Technology > Software Engineering (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(2 more...)

Add feedback

Predicting Program Properties from 'Big Code'

Communications of the ACMFeb-21-2019, 22:58:31 GMT

We present a new approach for predicting program properties from large codebases (aka "Big Code"). Our approach learns a probabilistic model from "Big Code" and uses this model to predict properties of new, unseen programs. The key idea of our work is to transform the program into a representation that allows us to formulate the problem of inferring program properties as structured prediction in machine learning. This enables us to leverage powerful probabilistic models such as Conditional Random Fields (CRFs) and perform joint prediction of program properties. As an example of our approach, we built a scalable prediction engine called JSNICE for solving two kinds of tasks in the context of JavaScript: predicting (syntactic) names of identifiers and predicting (semantic) type annotations of variables. Experimentally, JSNICE predicts correct names for 63% of name identifiers and its type annotation predictions are correct in 81% of cases. Since its public release at http://jsnice.org, JSNice has become a popular system with hundreds of thousands of uses. By formulating the problem of inferring program properties as structured prediction, our work opens up the possibility for a range of new "Big Code" applications such as de-obfuscators, decompilers, invariant generators, and others. Recent years have seen significant progress in the area of programming languages driven by advances in type systems, constraint solving, program analysis, and synthesis techniques. Fundamentally, these methods reason about each program in isolation and while powerful, the effectiveness of programming tools based on these techniques is approaching its inherent limits. Thus, a more disruptive change is needed if a significant improvement is to take place. At the same time, creating probabilistic models from large datasets (also called "Big Data") has transformed a number of areas such as natural language processing, computer vision, recommendation systems, and many others. However, despite the overwhelming success of "Big Data" in a variety of application domains, learning from large datasets of programs has previously not had tangible impact on programming tools. Yet, with the tremendous growth of publicly available source code in repositories such as GitHub4 and BitBucket2 (referred to as "Big Code" by a recent DARPA initiative11) comes the opportunity to create new kinds of programming tools based on probabilistic models of such data.

annotation, prediction, program element, (17 more...)

Communications of the ACM

Country:

Europe > Switzerland > Zürich > Zürich (0.15)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > Richmond County > New York City (0.04)
(6 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback