AI's Hacking Skills Are Approaching an 'Inflection Point'
AI models are getting so good at finding vulnerabilities that some experts say the tech industry might need to rethink how software is built. Vlad Ionescu and Ariel Herbert-Voss, cofounders of the cybersecurity startup RunSybil, were momentarily confused when their AI tool, Sybil, alerted them to a weakness in a customer's systems last November. Sybil uses a mix of different AI models, as well as a few proprietary technical tricks, to scan computer systems for issues that hackers might exploit, like an unpatched server or a misconfigured database. In this case, Sybil flagged a problem with the customer's deployment of federated GraphQL, a language used to specify how data is accessed over the web through application programming interfaces (APIs). The issue meant that the customer was inadvertently exposing confidential information.
- Asia > China (0.05)
- North America > United States > California (0.05)
- Europe > Slovakia (0.05)
- Europe > Czechia (0.05)
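A misconfigured GraphQL deployment can expose more than intended. As an illustration only (this is not Sybil's actual method, and the endpoint is hypothetical), one common first check is whether schema introspection is publicly answerable:

```python
# Minimal sketch: probe a GraphQL endpoint for publicly enabled schema
# introspection, a frequent source of unintended data exposure.
# The endpoint URL passed in is an assumption supplied by the caller.
import json
import urllib.request

INTROSPECTION_QUERY = '{"query": "{ __schema { types { name } } }"}'

def introspection_enabled(endpoint: str, timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers a schema-introspection query."""
    req = urllib.request.Request(
        endpoint,
        data=INTROSPECTION_QUERY.encode(),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            body = json.load(resp)
    except Exception:
        return False  # unreachable, refused, or non-JSON: treat as disabled
    return "__schema" in (body.get("data") or {})
```

A real scanner would go much further (field-level authorization, federated subgraph boundaries), but this shows the shape of an automated misconfiguration probe.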
Assessing the Prevalence of AI-assisted Cheating in Programming Courses: A Pilot Study
Abstract -- Tools that can generate computer code in response to inputs written in natural language, such as ChatGPT, pose an existential threat to Computer Science education in its current form, since students can now use these tools to solve assignments with little effort. While that risk has already been recognized by scholars, the proportion of the student body engaging in this new kind of plagiarism remains an open problem. We conducted a pilot study in a large CS class (n=120) to assess the feasibility of estimating AI plagiarism through anonymous surveys and interviews. More than 25% of the survey respondents admitted to committing AI plagiarism. Conversely, only one student agreed to be interviewed. Given the high levels of misconduct acknowledgment, we conclude that surveys are an effective method for studies on the matter, while interviews should be avoided or designed in a way that encourages participation.

1 INTRODUCTION

Generative artificial intelligence (GenAI, not to be confused with general artificial intelligence) refers to models that produce new content. The generation is usually guided by an input text known as the "prompt". For example, giving the prompt "a vase of red flowers" to a GenAI model would generate an image depicting red flowers in a vase. Practical applications of GenAI are now mainstream thanks to advances in neural networks. In particular, the clever use of attention mechanisms and the subsequent development of the transformer architecture made efficient learning possible over large text corpora (Vaswani et al., 2023). ChatGPT, an AI application based on an LLM, can convincingly engage in a conversation and answer questions across multiple subjects (OpenAI, 2022). Research on applications of LLMs in education is still in its infancy, but looks promising. Personal tutoring systems (Chang, 2022), content explanation (Leinonen et al., 2023), and assignment generation (Jury et al., 2024) are a few of the ideas that have been explored. From another perspective, LLMs are already a reality in schools.
- Research Report (1.00)
- Questionnaire & Opinion Survey (1.00)
- Personal > Interview (0.93)
- Instructional Material > Course Syllabus & Notes (0.67)
- Education > Educational Setting (0.93)
- Education > Curriculum > Subject-Specific Education (0.49)
- Education > Educational Technology > Educational Software (0.34)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.69)
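The abstract's headline figure (more than 25% admitting AI plagiarism, n=120) comes with sampling uncertainty. As a sketch of how such a proportion might be bounded, here is a standard Wilson score interval; the exact count of 30 admissions is an invented illustration consistent with ">25%":

```python
# Wilson score interval for a binomial proportion, applied to an
# illustrative survey result (30 of 120 respondents admitting misconduct).
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple:
    """95% Wilson score confidence interval for successes/n."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    margin = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return centre - margin, centre + margin

lo, hi = wilson_interval(30, 120)  # hypothetical: exactly 25% admitted
```

With these numbers the true admission rate would plausibly lie between roughly 18% and 33%, which is why the authors stress feasibility rather than a precise prevalence estimate.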
Large Language Models in Code Co-generation for Safe Autonomous Vehicles
Nouri, Ali, Cabrero-Daniel, Beatriz, Fei, Zhennan, Ronanki, Krishna, Sivencrona, Håkan, Berger, Christian
Software engineers in various industrial domains are already using Large Language Models (LLMs) to accelerate the implementation of parts of software systems. When considering their potential use for ADAS or AD systems in the automotive context, there is a need to systematically assess this new setup: LLMs entail a well-documented set of risks for the development of safety-related systems due to their stochastic nature. To reduce the effort for code reviewers evaluating LLM-generated code, we propose an evaluation pipeline that conducts sanity checks on the generated code. We compare the performance of six state-of-the-art LLMs (CodeLlama, CodeGemma, DeepSeek-r1, DeepSeek-Coder, Mistral, and GPT-4) on four safety-related programming tasks. Additionally, we qualitatively analyse the most frequent faults generated by these LLMs, creating a failure-mode catalogue to support human reviewers. Finally, we discuss the limitations and capabilities of LLMs in code generation, and the use of the proposed pipeline in the existing process.
- Europe > Sweden > Vaestra Goetaland > Gothenburg (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Switzerland (0.04)
- Automobiles & Trucks (0.94)
- Transportation > Ground > Road (0.47)
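The sanity-check pipeline the abstract proposes is not specified in detail here; as a minimal sketch under that reading, a pipeline stage might parse the candidate, execute it in isolation, and compare output before any human review. The checks below are illustrative assumptions, not the authors' pipeline:

```python
# Sketch of a sanity-check stage for LLM-generated (Python) code:
# 1) does it parse, 2) does it run, 3) does it produce the expected output.
import ast
import subprocess
import sys

def sanity_check(source: str, test_stdin: str, expected_stdout: str) -> list:
    """Return a list of failed checks; an empty list means the code passed."""
    try:
        ast.parse(source)  # check 1: syntactic validity
    except SyntaxError as e:
        return [f"syntax error: {e}"]
    proc = subprocess.run(  # check 2/3: run in a fresh interpreter
        [sys.executable, "-c", source],
        input=test_stdin, capture_output=True, text=True, timeout=10,
    )
    failures = []
    if proc.returncode != 0:
        failures.append(f"runtime error: {proc.stderr.strip()}")
    elif proc.stdout.strip() != expected_stdout:
        failures.append(f"wrong output: {proc.stdout.strip()!r}")
    return failures
```

Only candidates with an empty failure list would proceed to human review, which is how such a stage reduces reviewer effort.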
Evaluating Code Generation of LLMs in Advanced Computer Science Problems
Catir, Emir, Claesson, Robin, Tsoupidi, Rodothea Myrsini
Large Language Models (LLMs), such as GitHub Copilot and ChatGPT, have become popular among programming students. Students use LLMs to assist them in programming courses, including generating source code. Previous work has evaluated the ability of LLMs to solve introductory-course programming assignments. The results have shown that LLMs are highly effective in generating code for introductory Computer Science (CS) courses. However, there is a gap in research on evaluating LLMs' ability to generate code that solves advanced programming assignments. In this work, we evaluate the ability of four LLM tools to solve programming assignments from advanced CS courses in three popular programming languages: Java, Python, and C. We manually select 12 problems: three problems from introductory courses as the baseline and nine programming assignments from second- and third-year CS courses. To evaluate the LLM-generated code, we generate a test suite of 1000 test cases per problem and analyze the program output. Our evaluation shows that although LLMs are highly effective in generating source code for introductory programming courses, solving advanced programming assignments is more challenging. Nonetheless, in many cases, LLMs identify the base problem and provide partial solutions that may be useful to CS students. Furthermore, our results may provide useful guidance for teachers of advanced programming courses on how to design programming assignments.
- Research Report (1.00)
- Instructional Material > Course Syllabus & Notes (1.00)
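Evaluating generated code against 1000 test cases per problem is essentially differential testing: run the candidate and a trusted reference on many generated inputs and diff the outputs. A minimal sketch of that setup (both solutions here are toy stand-ins, not the paper's assignments):

```python
# Differential testing sketch: compare an LLM-generated candidate against
# a reference solution on randomly generated inputs.
import random

def reference(xs):
    """Trusted solution: return the list sorted."""
    return sorted(xs)

def candidate(xs):
    """Stand-in for an LLM-generated solution (here, a correct one)."""
    out = list(xs)
    out.sort()
    return out

def differential_test(n_cases: int = 1000, seed: int = 0) -> int:
    """Return how many generated inputs the candidate gets wrong."""
    rng = random.Random(seed)
    mismatches = 0
    for _ in range(n_cases):
        xs = [rng.randint(-100, 100) for _ in range(rng.randint(0, 20))]
        if candidate(xs) != reference(xs):
            mismatches += 1
    return mismatches
```

The mismatch count gives a per-problem correctness score; a fixed seed keeps the 1000-case suite reproducible across models.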
An AI Coding Assistant Refused to Write Code--and Suggested the User Learn to Do It Himself
Last Saturday, a developer using Cursor AI for a racing game project hit an unexpected roadblock when the programming assistant abruptly refused to continue generating code, instead offering some unsolicited career advice. According to a bug report on Cursor's official forum, after producing approximately 750 to 800 lines of code (what the user calls "locs"), the AI assistant halted work and delivered a refusal message: "I cannot generate code for you, as that would be completing your work. The code appears to be handling skid mark fade effects in a racing game, but you should develop the logic yourself. This ensures you understand the system and can maintain it properly." The AI didn't stop at merely refusing; it offered a paternalistic justification for its decision, stating that "Generating code for others can lead to dependency and reduced learning opportunities."
A Survey On Large Language Models For Code Generation
Large Language Models (LLMs) have demonstrated remarkable capabilities in numerous fields. This survey focuses on how LLMs empower users, regardless of their technical background, to use human languages to automatically generate executable code. We begin with LLMs' limitations and challenges in automated code generation. Subsequently, we review various fine-tuning techniques designed to enhance both the performance and adaptability of LLMs in code generation tasks. We then review the existing metrics and benchmarks used to assess model performance under these fine-tuning techniques. Finally, we explore the applications of LLMs (e.g. CodeLlama, GitHub Copilot, ToolGen) in code generation tasks to illustrate their roles and functionalities. This survey provides a comprehensive overview of LLMs for code generation, helps researchers in diverse fields understand the current state of the art, and shows how LLMs can be effectively leveraged for code generation tasks.
- Research Report > New Finding (1.00)
- Overview (1.00)
- Health & Medicine (0.93)
- Information Technology > Security & Privacy (0.46)
- Education > Educational Setting > Online (0.46)
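Among the evaluation metrics such a survey covers, one of the most widely used for code generation is pass@k: the probability that at least one of k sampled programs passes all tests. Its standard unbiased estimator (n samples per problem, c of them correct) can be written directly:

```python
# Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k),
# where n = samples drawn, c = samples that pass all tests.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples is correct."""
    if n - c < k:
        return 1.0  # too few failures to fill k draws: some draw must pass
    return 1.0 - comb(n - c, k) / comb(n, k)
```

Computing 1 minus the chance that all k draws are failures avoids the high variance of naively sampling k-subsets.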
Code-as-Symbolic-Planner: Foundation Model-Based Robot Planning via Symbolic Code Generation
Chen, Yongchao, Hao, Yilun, Zhang, Yang, Fan, Chuchu
Recent works have shown the great potential of Large Language Models (LLMs) in robot task and motion planning (TAMP). Current LLM approaches generate text- or code-based reasoning chains with sub-goals and action plans. However, they do not fully leverage LLMs' symbolic computing and code generation capabilities. Many robot TAMP tasks involve complex optimization under multiple constraints, where pure textual reasoning is insufficient. While augmenting LLMs with predefined solvers and planners improves performance, it lacks generalization across tasks. Given LLMs' growing coding proficiency, we enhance their TAMP capabilities by steering them to generate code as symbolic planners for optimization and constraint verification. Unlike prior work that uses code to interface with robot action modules, we steer LLMs to generate code as solvers, planners, and checkers for TAMP tasks requiring symbolic computing, while still leveraging textual reasoning to incorporate common sense. With a multi-round guidance and answer evolution framework, the proposed Code-as-Symbolic-Planner improves success rates by an average of 24.1% over the best baseline methods across seven typical TAMP tasks and three popular LLMs. Code-as-Symbolic-Planner shows strong effectiveness and generalizability across discrete and continuous environments, 2D/3D simulations and real-world settings, as well as single- and multi-robot tasks with diverse requirements. See our project website https://yongchao98.github.io/Code-Symbol-Planner/ for prompts, videos, and code.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > France (0.04)
- Asia > Japan > Shikoku > Kagawa Prefecture > Takamatsu (0.04)
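The code-as-solver idea can be made concrete with a toy example: instead of reasoning about an assignment problem in text, an LLM could emit a small exhaustive solver like the one below. The robots, tasks, and cost matrix are invented for illustration and are not from the paper:

```python
# Toy "code as symbolic planner": brute-force the minimum-cost one-to-one
# assignment of tasks to robots, rather than reasoning about it in prose.
from itertools import permutations

def plan_assignment(cost):
    """Return (best_total_cost, perm) where robot i performs task perm[i]."""
    n = len(cost)
    best, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best:
            best, best_perm = total, perm
    return best, best_perm

cost = [[4, 1, 3],   # cost[robot][task], hypothetical values
        [2, 0, 5],
        [3, 2, 2]]
```

Exhaustive search is only viable for tiny instances, but it illustrates why emitted code can verify constraints exactly where free-text reasoning cannot.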
Review for NeurIPS paper: Instead of Rewriting Foreign Code for Machine Learning, Automatically Synthesize Fast Gradients
Summary and Contributions: The key contribution of this paper is Enzyme, a system for the automatic generation of derivative-computing code. While this idea has seen a lot of interest over the last few years, the novelty of this particular proposal is that the code generation operates on the LLVM IR. The main argument made in the paper for this approach is that the code generation happens post-optimization, though intuitively I find it difficult to understand why this is an important feature: it is not obvious whether it is better to generate code for computing a derivative before optimization (and then optimize the generated code using normal compiler tools) or to generate it after optimization. That said, the authors do show experimentally that the generate-after-optimization approach is far superior (the generate-before-optimization approach is tested as the "ref" option in the paper, and it is often twice as slow as Enzyme). While this non-intuitive result is impressive, I feel that the main argument for the approach is that, by working at the LLVM level, it is possible to plug the auto-diff software into any LLVM language; there is no longer a need to build language-specific auto-diff capabilities.
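Enzyme itself transforms LLVM IR, but the underlying idea of synthesizing derivative code mechanically can be illustrated at the language level with forward-mode automatic differentiation via dual numbers. This is an intuition-building sketch only, not how Enzyme is implemented:

```python
# Forward-mode AD sketch: each value carries its derivative ("dot") along,
# and arithmetic rules propagate both. Enzyme applies analogous rewrite
# rules to compiled LLVM IR instead of Python objects.
class Dual:
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.dot + o.dot)  # sum rule
    __radd__ = __add__

    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val * o.val,
                    self.val * o.dot + self.dot * o.val)  # product rule
    __rmul__ = __mul__

def derivative(f, x):
    """Evaluate df/dx at x by seeding the derivative slot with 1."""
    return f(Dual(x, 1.0)).dot
```

For f(x) = x*x + 3*x the rules yield f'(x) = 2x + 3 with no symbolic manipulation, which is the sense in which gradients are "synthesized" rather than rewritten by hand.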
Effective LLM-Driven Code Generation with Pythoness
Levin, Kyla H., Gwilt, Kyle, Berger, Emery D., Freund, Stephen N.
The advent of large language models (LLMs) has paved the way for a new era of programming tools with both significant capabilities and risks, as the generated code lacks guarantees of correctness and reliability. Developers using LLMs currently face the difficult task of optimizing, integrating, and maintaining code generated by AI. We propose an embedded domain-specific language (DSL), Pythoness, to address those challenges. In Pythoness, developers program with LLMs at a higher level of abstraction. Rather than interacting directly with generated code, developers using Pythoness operate at the level of behavioral specifications when writing functions, classes, or an entire program. These specifications can take the form of unit tests and property-based tests, which may be expressed formally or in natural language. Guided by these specifications, Pythoness generates code that both passes the tests and can be continuously checked during execution. We posit that the Pythoness approach lets developers harness the full potential of LLMs for code generation while substantially mitigating their inherent risks. We describe our current prototype implementation of Pythoness and demonstrate that it can successfully leverage a combination of tests and code generation to yield higher quality code than specifications alone.
- North America > United States > Massachusetts > Hampshire County > Amherst (0.15)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Africa > Rwanda > Kigali > Kigali (0.04)
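The generate-and-check loop the Pythoness abstract describes can be sketched as a behavioral spec (here, plain predicate tests) gating regeneration. `llm_generate` below is a hypothetical stand-in that returns canned candidates; it is not part of Pythoness:

```python
# Sketch: a spec-guided synthesis loop. Candidates are regenerated until
# one satisfies every test in the behavioral specification.
def llm_generate(spec_prompt: str, attempt: int) -> str:
    """Hypothetical LLM call; returns canned drafts for illustration."""
    candidates = [
        "def add(a, b): return a - b",   # buggy first draft
        "def add(a, b): return a + b",   # correct retry
    ]
    return candidates[min(attempt, len(candidates) - 1)]

def synthesize(spec_prompt: str, tests, max_attempts: int = 3):
    """Regenerate until a candidate passes the whole specification."""
    for attempt in range(max_attempts):
        ns = {}
        exec(llm_generate(spec_prompt, attempt), ns)  # load the candidate
        if all(t(ns["add"]) for t in tests):
            return ns["add"]
    raise RuntimeError("no candidate satisfied the specification")

spec_tests = [lambda f: f(2, 3) == 5, lambda f: f(-1, 1) == 0]
```

The key design point matches the abstract: the developer authors the specification, never the generated body, and the same tests keep checking the code after deployment.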
An Exploratory Study of ML Sketches and Visual Code Assistants
Gomes, Luís F., Hellendoorn, Vincent J., Aldrich, Jonathan, Abreu, Rui
This paper explores the integration of Visual Code Assistants in Integrated Development Environments (IDEs). In Software Engineering, whiteboard sketching is often the initial step before coding, serving as a crucial collaboration tool for developers. Previous studies have investigated patterns in SE sketches and how they are used in practice, yet methods for directly using these sketches for code generation remain limited. The emergence of visually-equipped large language models presents an opportunity to bridge this gap, which is the focus of our research. In this paper, we built a first prototype of a Visual Code Assistant to get user feedback regarding in-IDE sketch-to-code tools. We conduct an experiment with 19 data scientists, most of whom regularly sketch as part of their job. We investigate developers' mental models by analyzing patterns commonly observed in their sketches when developing an ML workflow. Analysis indicates that diagrams were the preferred organizational component (52.6%), often accompanied by lists (42.1%) and numbered points (36.8%). Our tool converts their sketches into a Python notebook by querying an LLM. We use an LLM-as-judge setup to score the quality of the generated code, finding that even brief sketching can effectively generate useful code outlines. We also find a positive correlation between sketch time and the quality of the generated code. We conclude the study by conducting extensive interviews to assess the tool's usefulness, explore potential use cases, and understand developers' needs. As noted by participants, promising applications for these assistants include education, prototyping, and collaborative settings. Our findings signal promise for the next generation of Code Assistants to integrate visual information, both to improve code generation and to better leverage developers' existing sketching practices.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Europe > Portugal > Porto > Porto (0.04)
- (3 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Education (0.67)
- Health & Medicine (0.46)