Out of style: Misadventures with LLMs and code style transfer
Karl Munson, Chih-Kai Ting, Serenity Wade, Anish Savla, Julian Dolby, Kiran Kate, Kavitha Srinivas
Like text, programs have styles, and certain programming styles are more desirable than others for program readability, maintainability, and performance. Code style transfer, however, is difficult to automate except for trivial style guidelines such as limits on line length. Inspired by the success of using language models for text style transfer, we investigate whether code language models can perform code style transfer. Code style transfer, unlike text style transfer, has rigorous requirements: the system needs to identify the lines of code to change, change them correctly, and leave the rest of the program untouched. We designed CSB (Code Style Benchmark), a benchmark suite of code style transfer tasks across five categories, including converting for-loops to list comprehensions, eliminating duplication in code, and adding decorators to methods. We then used these tests to see whether large pre-trained code language models or fine-tuned models perform style transfer correctly, based on rigorous metrics that check both that the transfer occurred and that the code still passes functional tests. Surprisingly, language models failed to perform all of the tasks, suggesting that they perform poorly on tasks that require code understanding. We will make available the large-scale corpora to help the community build better code models.
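To make one of the benchmark's task categories concrete, here is a minimal, hypothetical sketch of the for-loop-to-list-comprehension transfer the abstract mentions; the function names and test values are illustrative, not drawn from CSB itself. A correct transfer must rewrite only the targeted construct while leaving behavior (and the rest of the program) unchanged.

```python
# Hypothetical illustration of one CSB task category: rewriting an
# accumulator-style for-loop as an equivalent list comprehension.

def squares_loop(nums):
    # Before: explicit loop with an accumulator list.
    result = []
    for n in nums:
        if n % 2 == 0:
            result.append(n * n)
    return result

def squares_comprehension(nums):
    # After: the same logic expressed as a list comprehension.
    return [n * n for n in nums if n % 2 == 0]

# A functional test must still pass after the style transfer.
assert squares_loop([1, 2, 3, 4]) == squares_comprehension([1, 2, 3, 4]) == [4, 16]
```

The point of the paired functional test is exactly the abstract's requirement: the metric checks both that the style change occurred and that the program's observable behavior is preserved.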
Investigating the Efficacy of Large Language Models for Code Clone Detection
Mohamad Khajezade, Jie JW Wu, Fatemeh Hendijani Fard, Gema Rodríguez-Pérez, Mohamed Sami Shehata
Large Language Models (LLMs) have demonstrated remarkable success in various natural language processing and software engineering tasks, such as code generation. LLMs are mainly utilized in the prompt-based zero/few-shot paradigm to guide the model in accomplishing the task. GPT-based models are among the most widely studied for tasks such as code comment generation or test generation. These tasks are `generative' tasks. However, there is limited research on the usage of LLMs for `non-generative' tasks such as classification using the prompt-based paradigm. In this preliminary exploratory study, we investigated the applicability of LLMs for Code Clone Detection (CCD), a non-generative task. By building a mono-lingual and cross-lingual CCD dataset derived from CodeNet, we first investigated two different prompts using ChatGPT to detect Type-4 code clones in Java-Java and Java-Ruby pairs in a zero-shot setting. We then conducted an analysis to understand the strengths and weaknesses of ChatGPT in CCD. ChatGPT surpasses the baselines in cross-language CCD, attaining an F1-score of 0.877, and achieves performance comparable to fully fine-tuned models for mono-lingual CCD, with an F1-score of 0.878. Both the prompt and the difficulty level of the problems have an impact on the performance of ChatGPT. Finally, we provide insights and future directions based on our initial analysis.
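To illustrate what a Type-4 clone pair and a zero-shot CCD prompt might look like, here is a small hypothetical sketch in Python; the prompt template, function names, and wording are assumptions for illustration, not the paper's actual prompts or data (which use Java and Ruby pairs from CodeNet).

```python
import inspect

# A hypothetical Type-4 clone pair: syntactically different code with the
# same functionality (here, summing the even numbers in a list).

def sum_even_iterative(nums):
    total = 0
    for n in nums:
        if n % 2 == 0:
            total += n
    return total

def sum_even_functional(nums):
    return sum(filter(lambda n: n % 2 == 0, nums))

# An assumed zero-shot prompt template: the model is asked a yes/no
# classification question rather than a generative one.
ZERO_SHOT_PROMPT = (
    "Do the following two code snippets solve the same problem? "
    "Answer yes or no.\n\nSnippet A:\n{a}\n\nSnippet B:\n{b}"
)

prompt = ZERO_SHOT_PROMPT.format(
    a=inspect.getsource(sum_even_iterative),
    b=inspect.getsource(sum_even_functional),
)

# The pair really is behaviorally equivalent despite differing syntax.
assert sum_even_iterative([1, 2, 3, 4]) == sum_even_functional([1, 2, 3, 4]) == 6
```

Framing clone detection as a yes/no question is what makes it a `non-generative' use of a prompt-based LLM: the output space is a label, not free-form code or text.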
Understanding Programs by Exploiting (Fuzzing) Test Cases
Jianyu Zhao, Yuyang Rong, Yiwen Guo, Yifeng He, Hao Chen
Semantic understanding of programs has attracted great attention in the community. Inspired by recent successes of large language models (LLMs) in natural language understanding, tremendous progress has been made by treating programming language as another sort of natural language and training LLMs on corpora of program code. However, programs are essentially different from texts, in the sense that they are normally heavily structured and syntax-strict. In particular, programs and their basic units (i.e., functions and subroutines) are designed to demonstrate a variety of behaviors and/or provide possible outputs, given different inputs. The relationship between inputs and possible outputs/behaviors represents the functions/subroutines and profiles the program as a whole. Therefore, we propose to incorporate such a relationship into learning, for achieving a deeper semantic understanding of programs. To obtain inputs that are representative enough to trigger the execution of most parts of the code, we resort to fuzz testing and propose fuzz tuning to boost the performance of program understanding and code representation learning, given a pre-trained LLM. The effectiveness of the proposed method is verified on two program understanding tasks, code clone detection and code classification, and it outperforms the current state of the art by large margins. Code is available at https://github.com/rabbitjy/FuzzTuning.
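The core intuition, that a function is characterized by its input/output relationship, can be sketched in a few lines. The toy below profiles functions by their outputs on randomly generated inputs; it is only an illustration of the I/O-signature idea under assumed function names, not the paper's fuzz-tuning pipeline, which feeds fuzzer-generated test cases into LLM training.

```python
import random

# Toy sketch: profile a function by the outputs it produces on many
# randomly generated inputs. Two functions with identical I/O signatures
# are behaviorally similar, which is useful signal for clone detection.

def io_signature(fn, trials=100, seed=0):
    rng = random.Random(seed)  # fixed seed so signatures are comparable
    inputs = [[rng.randint(-50, 50) for _ in range(5)] for _ in range(trials)]
    return tuple(fn(list(xs)) for xs in inputs)

def max_builtin(xs):
    return max(xs)

def max_manual(xs):
    best = xs[0]
    for x in xs[1:]:
        if x > best:
            best = x
    return best

# Identical signatures suggest the two implementations are clones.
assert io_signature(max_builtin) == io_signature(max_manual)
```

Random inputs stand in for the fuzzer here; actual fuzz testing generates inputs guided by coverage so that most parts of the code are exercised, which is precisely why the paper resorts to it for representative inputs.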
Programming in 'natural' language is coming sooner than you think
Sometimes major shifts happen virtually unnoticed. CodeNet is a follow-up to ImageNet, a large-scale dataset of images and their descriptions; the images are free for non-commercial uses. ImageNet is now central to the progress of deep learning computer vision. CodeNet is an attempt to do for Artificial Intelligence (AI) coding what ImageNet did for computer vision: it is a dataset of over 14 million code samples, covering 50 programming languages, intended to solve 4,000 coding problems. The dataset also contains additional metadata, such as the amount of memory each sample requires to run and the log output of running code.
Google and Microsoft are creating a monopoly on coding in plain language
Sometimes major shifts happen virtually unnoticed. On May 5, IBM announced Project CodeNet to very little media or academic attention. CodeNet is a follow-up to ImageNet, a large-scale dataset of images and their descriptions; the images are free for non-commercial uses. ImageNet is now central to the progress of deep learning computer vision. CodeNet is an attempt to do for Artificial Intelligence (AI) coding what ImageNet did for computer vision: it is a dataset of over 14 million code samples, covering 50 programming languages, intended to solve 4,000 coding problems.
IBM CodeNet: Artificial Intelligence That Can Program Computers And Solve A $100 Billion Legacy Code Problem
Computer scientists have long toyed with the idea of creating computers that could write programs for other computers. Artificial intelligence is an obvious technology for the task. It has previously been used for programming on a small scale, but the results have been limited. Artificial intelligence is one of our most powerful and versatile technologies in use today. It can understand and generate speech, analyze documents, recognize images and characters, drive cars, pilot war planes, write papers, and perform thousands of other valuable operations.
IBM's Project CodeNet will test how far you can push AI to write software
IBM's AI research division has released a 14-million-sample dataset to develop machine learning models that can help in programming tasks. Called Project CodeNet, the dataset takes its name after ImageNet, the famous repository of labeled photos that triggered a revolution in computer vision and deep learning. While there's a scant chance that machine learning models built on the CodeNet dataset will make human programmers redundant, there's reason to be hopeful that they will make developers more productive. In the early 2010s, impressive advances in machine learning triggered excitement (and fear) about artificial intelligence soon automating many tasks, including programming. But AI's penetration in software development has been extremely limited.
IBM's CodeNet dataset can teach AI to translate computer languages
AI and machine learning systems have become increasingly competent in recent years, capable of not just understanding the written word but writing it as well. But while these artificial intelligences have nearly mastered the English language, they have yet to become fluent in the language of computers -- that is, until now. IBM announced during its Think 2021 conference on Monday that its researchers have crafted a Rosetta Stone for programming code. Over the past decade, advancements in AI have mainly been "driven by deep neural networks, and even that, it was driven by three major factors: data with the availability of large data sets for training, innovations in new algorithms, and the massive acceleration of faster and faster compute hardware driven by GPUs," Ruchir Puri, IBM Fellow and Chief Scientist at IBM Research, said during his Think 2021 presentation, likening the new data set to the venerated ImageNet, which has spawned the recent computer vision land rush. "Software is eating the world," Marc Andreessen wrote in 2011.