kotlin
Challenge on Optimization of Context Collection for Code Completion
Ustalov, Dmitry, Bogomolov, Egor, Bezzubov, Alexander, Golubev, Yaroslav, Glukhov, Evgeniy, Levtsov, Georgii, Kovalenko, Vladimir
The rapid advancement of workflows and methods for software engineering using AI emphasizes the need for a systematic evaluation and analysis of their ability to leverage information from entire projects, particularly in large code bases. In this challenge on optimization of context collection for code completion, organized by JetBrains in collaboration with Mistral AI as part of the ASE 2025 conference, participants developed efficient mechanisms for collecting context from source code repositories to improve fill-in-the-middle code completions for Python and Kotlin. We constructed a large dataset of real-world code in these two programming languages using permissively licensed open-source projects. The submissions were evaluated based on their ability to maximize completion quality for multiple state-of-the-art neural models using the chrF metric. During the public phase of the competition, nineteen teams submitted solutions to the Python track and eight teams submitted solutions to the Kotlin track. In the private phase, six teams competed, of which five submitted papers to the workshop.
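The chrF metric mentioned above scores a completion by its character n-gram overlap with the reference text. A minimal Python sketch of the idea follows, assuming the commonly used defaults (n-gram orders 1 to 6, beta = 2) and whitespace stripped before counting. The function names are hypothetical, and this is a simplified illustration, not the challenge's official scorer.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams of order n, ignoring all whitespace."""
    text = "".join(text.split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified chrF: F-beta over averaged character n-gram precision/recall."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            # One of the texts is shorter than n characters: skip this order.
            continue
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # beta > 1 weights recall more heavily than precision.
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

An identical completion scores 1.0, a completion sharing no characters with the reference scores 0.0, and near misses fall in between.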
Kotlin ML Pack: Technical Report
Titov, Sergey, Evtikhiev, Mikhail, Shapkin, Anton, Smirnov, Oleg, Boytsov, Sergei, Boytsov, Sergei, Karaeva, Dariia, Sheptyakov, Maksim, Arkhipov, Mikhail, Bryksin, Timofey, Bogomolov, Egor
In this technical report, we present three novel datasets of Kotlin code: KStack, KStack-clean, and KExercises. We also describe the results of fine-tuning CodeLlama and DeepSeek models on this data. Additionally, we present a version of the HumanEval benchmark rewritten by human experts into Kotlin - both the solutions and the tests. Our results demonstrate that small, high-quality datasets (KStack-clean and KExercises) can significantly improve model performance on code generation tasks, achieving up to a 16-point increase in pass rate on the HumanEval benchmark. Lastly, we discuss potential future work in the field of improving language modeling for Kotlin, including the use of static analysis tools in the learning process and the introduction of more intricate and realistic benchmarks.
Software Vulnerability Prediction in Low-Resource Languages: An Empirical Study of CodeBERT and ChatGPT
Le, Triet H. M., Babar, M. Ali, Thai, Tung Hoang
Background: Software Vulnerability (SV) prediction in emerging languages is increasingly important to ensure software security in modern systems. However, these languages usually have limited SV data for developing high-performing prediction models. Aims: We conduct an empirical study to evaluate the impact of SV data scarcity in emerging languages on the state-of-the-art SV prediction model and investigate potential solutions to enhance the performance. Method: We train and test the state-of-the-art model based on CodeBERT with and without data sampling techniques for function-level and line-level SV prediction in three low-resource languages - Kotlin, Swift, and Rust. We also assess the effectiveness of ChatGPT for low-resource SV prediction given its recent success in other domains. Results: Compared to the original work in C/C++ with large data, CodeBERT's performance of function-level and line-level SV prediction significantly declines in low-resource languages, signifying the negative impact of data scarcity. Regarding remediation, data sampling techniques fail to improve CodeBERT, whereas ChatGPT showcases promising results, substantially enhancing predictive performance by up to 34.4% for the function level and up to 53.5% for the line level. Conclusion: We have highlighted the challenge and made the first promising step for low-resource SV prediction, paving the way for future research in this direction.
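The abstract does not say which data sampling techniques were applied. As an illustration only, random oversampling is one common way to rebalance a dataset in which vulnerable samples are rare. The sketch below assumes that technique; the function name and the toy data are hypothetical.

```python
import random


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until all classes are equal in size."""
    rng = random.Random(seed)
    by_label = {}
    for sample, label in zip(samples, labels):
        by_label.setdefault(label, []).append(sample)
    # Every class is grown to the size of the largest class.
    target = max(len(items) for items in by_label.values())
    balanced = []
    for label, items in by_label.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        balanced.extend((sample, label) for sample in items + extra)
    rng.shuffle(balanced)
    return balanced


# Toy example: three non-vulnerable functions (label 0), one vulnerable (label 1).
pairs = random_oversample(["f1", "f2", "f3", "f4"], [0, 0, 0, 1])
```

After the call, both classes contain three entries, with the single vulnerable function repeated.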
Development of a Legal Document AI-Chatbot
Devaraj, Pranav Nataraj, P, Rakesh Teja V, Gangrade, Aaryav, R, Manoj Kumar
With the exponential growth of digital data and the increasing complexity of legal documentation, there is a pressing need for efficient and intelligent tools to streamline the handling of legal documents. Recent developments in the AI field, especially in chatbots, offer a compelling solution to this problem. An insight into the process of creating a Legal Documentation AI Chatbot with as many relevant features as possible within the given time frame is presented. The development of each component of the chatbot is presented in detail, and each component's workings and functionality are discussed, from the build of the Android app and the Langchain query processing code to the integration of both through a Flask backend and REST API methods.
Multi-lingual Evaluation of Code Generation Models
Athiwaratkun, Ben, Gouda, Sanjay Krishna, Wang, Zijian, Li, Xiaopeng, Tian, Yuchen, Tan, Ming, Ahmad, Wasi Uddin, Wang, Shiqi, Sun, Qing, Shang, Mingyue, Gonugondla, Sujan Kumar, Ding, Hantian, Kumar, Varun, Fulton, Nathan, Farahani, Arash, Jain, Siddhartha, Giaquinto, Robert, Qian, Haifeng, Ramanathan, Murali Krishna, Nallapati, Ramesh, Ray, Baishakhi, Bhatia, Parminder, Sengupta, Sudipta, Roth, Dan, Xiang, Bing
We present new benchmarks for evaluating code generation models: MBXP, Multilingual HumanEval, and MathQA-X. These datasets cover over 10 programming languages and are generated using a scalable conversion framework that transpiles prompts and test cases from the original Python datasets into the corresponding data in the target language. Using these benchmarks, we are able to assess the performance of code generation models in a multi-lingual fashion, and we discover the generalization ability of language models on out-of-domain languages, the advantages of multi-lingual models over mono-lingual ones, the ability of few-shot prompting to teach the model new languages, and zero-shot translation abilities even in mono-lingual settings. Furthermore, we use our code generation model to perform large-scale bootstrapping to obtain synthetic canonical solutions in several languages, which can be used for other code-related evaluations such as code insertion, robustness, or summarization tasks. Overall, our benchmarks represent a significant step towards a deeper understanding of language models' code generation abilities. We publicly release our code and datasets at https://github.com/amazon-research/mxeval.
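To make the idea of transpiling test cases concrete, the Python sketch below renders a Python-style test (function name, arguments, expected value) as a Kotlin assertion string. It is a toy illustration under stated assumptions and not the authors' conversion framework: all helper names are hypothetical, only a few literal types are handled, and Kotlin's standard `check` function is assumed as the assertion form.

```python
def to_camel_case(name):
    """Convert a snake_case Python identifier to camelCase."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def kotlin_literal(value):
    """Render a small subset of Python values as Kotlin source literals."""
    if isinstance(value, bool):  # must precede int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        escaped = (value.replace("\\", "\\\\")
                        .replace('"', '\\"')
                        .replace("$", "\\$"))
        return '"' + escaped + '"'
    if isinstance(value, list):
        return "listOf(" + ", ".join(kotlin_literal(v) for v in value) + ")"
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def to_kotlin_check(func_name, args, expected):
    """Build a Kotlin `check(...)` line for one test case."""
    call_args = ", ".join(kotlin_literal(a) for a in args)
    call = f"{to_camel_case(func_name)}({call_args})"
    return f"check({call} == {kotlin_literal(expected)})"


print(to_kotlin_check("sum_list", [[1, 2, 3]], 6))
# prints: check(sumList(listOf(1, 2, 3)) == 6)
```

A real conversion pipeline also has to translate function signatures, docstrings, and type information, which this sketch leaves out.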
FinTech Futures Jobs: Three UK tech jobs with great benefits
Doing great, fulfilling work is important - after all, our jobs take up a large part of our time each week, so it matters that we're engaged in what we're doing. The research backs it up: 70% of employees say that their sense of purpose is defined by their work. We think a few great benefits help sweeten the deal even more, so we've found some roles below where the work is interesting and fulfilling - and you'll get some sweet wellness, flexible working, holiday and extra benefits on top too. For even more open roles, you can head on over to our Job Board. About the company: Experian is a global information services company that provides data and analytical tools used to manage credit risk to clients around the world.
Top 10 Programming Languages Recruiters are Looking For in 2022
Post pandemic, AI has become one of the top agendas for businesses as it offers enhanced customer experience, resilience, and reliability. With the advancements in machine learning, data analytics, and conversational AI, companies are finding it feasible and affordable to deploy AI tools that allow them to solve problems and increase efficiency. Here are the 10 most popular programming languages among job seekers. Python can be regarded as the future of programming languages. As per the latest statistics, Python is the main coding language for around 80% of developers.
7 Best programming languages for beginners to learn in 2021
The world is expanding digitally, and with every aspect of our lives becoming digital, the demand for computer experts is skyrocketing each day. Therefore, having knowledge of programming languages has become crucial for every IT professional. In fact, programming languages sit at the epicentre of this ever-growing field of Computer Science. If you are a beginner in programming, learning a new language or a new framework is essential. As a newcomer to programming, make sure that you remain steady in both learning and coding.
5 Software Development Trends To Embrace in 2021
In many ways, 2020 feels like a lost year. Remote work and no travel have taken a toll on the best of us. The pandemic has forced a lot of businesses to have an online presence in one way or another. Software development services have never been more important to businesses. This is why it is so important to be in the loop of the current trends taking place in the industry.
Learn how to code in 2021 with training on the 12 most popular programming languages
The more dependent we become on apps, the more demand there'll be for skilled programmers. It just so happens that learning how to code is easier than ever in 2021. In fact, we've rounded up 12 amazing deals on courses and training programs that will teach you the skills you need to start creating your own software, and they're on sale for a limited time! Go, or GoLang, is Google's open-source programming language that's designed to simplify many programming tasks. This course is perfect for beginners, as Go is one of the fastest-growing languages in the industry thanks to its ease of use and familiar syntax.