MPCODER: Multi-user Personalized Code Generator with Explicit and Implicit Style Representation Learning
Dai, Zhenlong, Yao, Chang, Han, WenKang, Yuan, Ying, Gao, Zhipeng, Chen, Jingyuan
Nowadays, LLMs have been successfully used to support developers' daily development, such as code generation, test generation, etc. However, existing Code LLMs are usually general models trained with large programming corpus (Zheng et al., 2023; Chen et al., 2022), therefore the generated code is difficult to adapt to personalized and/or customized requests. Consider the following practical scenarios: Alice is a software developer. To improve programmers' daily efficiency, her company provided the base LLMs that can be used for code generation.
Recent researchers have explored code generation task by using LLMs; however, most studies (Li et al., 2023b, 2022a; Ahmad et al., 2021; Hu et al., 2021) focus on generating "correct" code. There is limited research investigating how to generate "personalized" code, especially for multi-user personalization, with no research conducted yet. Automatically generating code according to developers' preferences or projects' consistency is a challenging task: (i) Considering different programmers have their own coding styles, it is too expensive to fine-tune an LLM for each user (Guo et al.,
Coarse-Tuning Models of Code with Reinforcement Learning Feedback
Jain, Abhinav, Adiole, Chima, Chaudhuri, Swarat, Reps, Thomas, Jermaine, Chris
Large Language Models (LLMs) pre-trained on code have recently emerged as the dominant approach to program synthesis. However, these models are trained using next-token prediction, which ignores the syntax and semantics of code. We propose RLCF, which further trains a pre-trained LLM via reinforcement learning, using feedback from a grounding function that scores the quality of the code. The grounding function uses (i) compiler-derived feedback on whether the code it generates passes a set of correctness checks; and (ii) feedback from a different LLM that compares the generated code to a reference code. RLCF is model- and language-agnostic. We empirically evaluate it on the MBJP and MathQA tasks for Java. Our experiments show that RLCF raises the odds that an LLM-generated program compiles, is executable, and produces the right output on tests, often allowing LLMs to match the performance of 2x-8x larger LLMs.
Simple Two-wheel Self-Balancing Robot Implementation
Cyber-physical systems (CPS) are an emerging field of technology that combines the physical and digital worlds by allowing seamless interaction and communication between the two. A key characteristic of a CPS is its ability to take input from its environment and use that information to produce an output through actuators in the physical world. A balancing robot is a prime example of a CPS: it uses input from its sensors to continually monitor its orientation and acts to prevent falling over by generating thrust through its wheels or manipulating its inertia. In this project, a two-wheel self-balancing robot was developed using the concept of an inverted (reverse) pendulum. An inverted pendulum is inherently unstable and requires an external force to maintain its balance. Here, the balancing robot produces this external force with its wheels and motors. To achieve precise balancing, stepper motors were used in the design of the robot. Additionally, the robot can move in four basic directions, and its movement is controlled through an app connected to the robot via Bluetooth. This allows remote control and monitoring of the robot's movements and actions. Overall, this two-wheel self-balancing robot demonstrates the potential and capabilities of cyber-physical systems technology.
Neural Language Models are Effective Plagiarists
Biderman, Stella, Raff, Edward
As artificial intelligence (AI) technologies become increasingly powerful and prominent in society, their misuse is a growing concern. In educational settings, AI technologies could be used by students to cheat on assignments and exams. In this paper we explore whether transformers can be used to solve introductory level programming assignments while bypassing commonly used AI tools to detect plagiarism. We find that a student using GPT-J [Wang and Komatsuzaki, 2021] can complete introductory level programming assignments without triggering suspicion from MOSS [Aiken, 2000], a widely used plagiarism detection tool. This holds despite the fact that GPT-J was not trained on the problems in question and is not provided with any examples to work from. We further find that the code written by GPT-J is diverse in structure, lacking any particular tells that future plagiarism detection techniques may use to try to identify algorithmically generated code. We conclude with a discussion of the ethical and educational implications of large language models and directions for future research.
Scalable Machine Learning on Spark
Here, we're observing the mean and variance of the features we have. This helps determine whether we need to normalize the features, since it's useful to have all features on a similar scale. We are also taking note of the number of non-zero values, which can adversely impact model performance. Another important metric to analyze is the correlation between features in the input data: Matrix correlMatrix = Statistics.corr(inputData.rdd(),
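The summary statistics described above can be illustrated without a cluster. The following is a minimal sketch in plain Java, not Spark's MLlib API: the class name `FeatureStats`, its method names, and the toy data are hypothetical and chosen only for illustration. It computes per-feature means, sample variances (n - 1 denominator), non-zero counts, and the Pearson correlation between two feature columns.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; plain Java, not the Spark MLlib API. */
final class FeatureStats {

    /** Mean of each column of a row-major data matrix. */
    static double[] columnMeans(double[][] rows) {
        int d = rows[0].length;
        double[] mean = new double[d];
        for (double[] row : rows) {
            for (int j = 0; j < d; j++) mean[j] += row[j];
        }
        for (int j = 0; j < d; j++) mean[j] /= rows.length;
        return mean;
    }

    /** Sample variance of each column (divides by n - 1, so it needs n >= 2). */
    static double[] columnVariances(double[][] rows) {
        double[] mean = columnMeans(rows);
        double[] var = new double[mean.length];
        for (double[] row : rows) {
            for (int j = 0; j < mean.length; j++) {
                double diff = row[j] - mean[j];
                var[j] += diff * diff;
            }
        }
        for (int j = 0; j < var.length; j++) var[j] /= (rows.length - 1);
        return var;
    }

    /** Number of non-zero entries in each column. */
    static long[] nonZeroCounts(double[][] rows) {
        long[] counts = new long[rows[0].length];
        for (double[] row : rows) {
            for (int j = 0; j < counts.length; j++) {
                if (row[j] != 0.0) counts[j]++;
            }
        }
        return counts;
    }

    /** Pearson correlation between columns a and b (NaN if either column is constant). */
    static double pearson(double[][] rows, int a, int b) {
        double[] mean = columnMeans(rows);
        double cov = 0.0, va = 0.0, vb = 0.0;
        for (double[] row : rows) {
            double da = row[a] - mean[a];
            double db = row[b] - mean[b];
            cov += da * db;
            va += da * da;
            vb += db * db;
        }
        return cov / Math.sqrt(va * vb);
    }

    public static void main(String[] args) {
        // Toy data: three samples, three features on very different scales.
        double[][] data = {{1, 10, 0}, {2, 20, 0}, {3, 30, 5}};
        System.out.println("means:     " + Arrays.toString(columnMeans(data)));
        System.out.println("variances: " + Arrays.toString(columnVariances(data)));
        System.out.println("non-zeros: " + Arrays.toString(nonZeroCounts(data)));
        System.out.println("corr(0,1): " + pearson(data, 0, 1));
    }
}
```

In the toy data, the first two features are perfectly correlated but differ in variance by a factor of 100, which is the kind of scale mismatch that suggests normalizing before training.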
Generative Grading: Neural Approximate Parsing for Automated Student Feedback
Malik, Ali, Wu, Mike, Vasavada, Vrinda, Song, Jinpeng, Mitchell, John, Goodman, Noah, Piech, Chris
Open access to high-quality education is limited by the difficulty of providing student feedback. In this paper, we present Generative Grading with Neural Approximate Parsing (GG-NAP): a novel approach for providing feedback at scale that is capable of both accurately grading student work and providing verifiability, a property where the model is able to substantiate its claims with a provable certificate. Our approach uses generative descriptions of student cognition, written as probabilistic programs, to synthesise millions of labelled example solutions to a problem; it then trains inference networks to approximately parse real student solutions according to these generative models. We achieve feedback prediction accuracy comparable to professional human experts in a variety of settings: short-answer questions, programs with graphical output, block-based programming, and short Java programs. In a real classroom, we ran an experiment where humans used GG-NAP to grade, yielding doubled grading accuracy while halving grading time.
AMIDST: a Java Toolbox for Scalable Probabilistic Machine Learning
Masegosa, Andrés R., Martínez, Ana M., Ramos-López, Darío, Cabañas, Rafael, Salmerón, Antonio, Nielsen, Thomas D., Langseth, Helge, Madsen, Anders L.
The AMIDST Toolbox is a software for scalable probabilistic machine learning with a special focus on (massive) streaming data. The toolbox supports a flexible modeling language based on probabilistic graphical models with latent variables and temporal dependencies. The specified models can be learnt from large data sets using parallel or distributed implementations of Bayesian learning algorithms for either streaming or batch data. These algorithms are based on a flexible variational message passing scheme, which supports discrete and continuous variables from a wide range of probability distributions. AMIDST also leverages existing functionality and algorithms by interfacing to software tools such as Flink, Spark, MOA, Weka, R and HUGIN. AMIDST is an open source toolbox written in Java and available at http://www.amidsttoolbox.com under the Apache Software License version 2.0.
Probabilistic Graphical Models on Multi-Core CPUs using Java 8
Masegosa, Andres R., Martinez, Ana M., Borchani, Hanen
In this paper, we discuss software design issues related to the development of parallel computational intelligence algorithms on multi-core CPUs, using the new Java 8 functional programming features. In particular, we focus on probabilistic graphical models (PGMs) and present the parallelisation of a collection of algorithms that deal with inference and learning of PGMs from data, namely maximum likelihood estimation, importance sampling, and greedy search for solving combinatorial optimisation problems. Through these concrete examples, we tackle the problem of defining efficient data structures for PGMs and parallel processing of same-size batches of data sets using Java 8 features. We also provide straightforward techniques to code parallel algorithms that seamlessly exploit multi-core processors. The experimental analysis, carried out using our open source AMIDST (Analysis of MassIve Data STreams) Java toolbox, shows the merits of the proposed solutions.
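As a rough illustration of the batch-parallel idea in this abstract, the sketch below estimates the maximum likelihood parameters of a single categorical variable with Java 8 parallel streams. It is not code from the AMIDST toolbox: the class `ParallelMle`, its methods, and the batch size are hypothetical, and a real PGM learner would keep sufficient statistics for every variable in the model. Each same-size batch is counted independently, and the per-batch counts are merged with an associative, non-mutating sum so the reduction is safe to run in parallel.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

/** Hypothetical sketch; not taken from the AMIDST code base. */
final class ParallelMle {

    /** Sufficient statistics of one batch: counts of each state in data[from, to). */
    static long[] batchCounts(int[] data, int from, int to, int numStates) {
        long[] counts = new long[numStates];
        for (int i = from; i < to; i++) counts[data[i]]++;
        return counts;
    }

    /** Element-wise sum that allocates a new array, so inputs are never mutated. */
    static long[] sum(long[] a, long[] b) {
        long[] out = new long[a.length];
        for (int i = 0; i < a.length; i++) out[i] = a[i] + b[i];
        return out;
    }

    /** MLE of a categorical distribution: relative frequency of each state. */
    static double[] estimate(int[] data, int numStates, int batchSize) {
        if (data.length == 0 || batchSize <= 0) {
            throw new IllegalArgumentException("need non-empty data and a positive batch size");
        }
        int numBatches = (data.length + batchSize - 1) / batchSize;
        long[] total = IntStream.range(0, numBatches)
                .parallel() // batches are processed on the common fork-join pool
                .mapToObj(b -> batchCounts(data, b * batchSize,
                        Math.min(data.length, (b + 1) * batchSize), numStates))
                .reduce(new long[numStates], ParallelMle::sum);
        return Arrays.stream(total).mapToDouble(c -> (double) c / data.length).toArray();
    }

    public static void main(String[] args) {
        int[] data = {0, 1, 1, 2, 1, 0, 1, 1};
        // prints [0.25, 0.625, 0.125]
        System.out.println(Arrays.toString(estimate(data, 3, 3)));
    }
}
```

Because `sum` returns a fresh array, the zero-filled identity passed to `reduce` is never modified, which is what makes the reduction correct under parallel execution.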