AITopics | Rovaniemi

Large Language Models (LLMs) are starting to be profiled as one of the most significant disruptions in the Software Testing field. Specifically, they have been successfully applied in software testing tasks such as generating test code, or summarizing documentation. This potential has attracted hundreds of researchers, resulting in dozens of new contributions every month, hardening researchers to stay at the forefront of the wave. Still, to the best of our knowledge, no prior work has provided a structured vision of the progress and most relevant research trends in LLM-based testing. In this article, we aim to provide a roadmap that illustrates its current state, grouping the contributions into different categories, and also sketching the most promising and active research directions for the field. To achieve this objective, we have conducted a semi-systematic literature review, collecting articles and mapping them into the most prominent categories, reviewing the current and ongoing status, and analyzing the open challenges of LLM-based software testing. Lastly, we have outlined several expected long-term impacts of LLMs over the whole software testing field.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2509.25043

Country:

Oceania > Australia > Victoria > Melbourne (0.28)
North America > United States > California > Sacramento County > Sacramento (0.14)
Europe > Austria > Vienna (0.14)
(51 more...)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.45)

Industry:

Information Technology > Security & Privacy (1.00)
Leisure & Entertainment (0.67)

Technology:

Information Technology > Software Engineering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Automated Privacy Information Annotation in Large Language Model Interactions

Zeng, Hang, Liu, Xiangyu, Hu, Yong, Niu, Chaoyue, Wu, Fan, Tang, Shaojie, Chen, Guihai

arXiv.org Artificial IntelligenceAug-11-2025

Users interacting with large language models (LLMs) under their real identifiers often unknowingly risk disclosing private information. Automatically notifying users whether their queries leak privacy and which phrases leak what private information has therefore become a practical need. Existing privacy detection methods, however, were designed for different objectives and application domains, typically tagging personally identifiable information (PII) in anonymous content, which is insufficient in real-name interaction scenarios with LLMs. In this work, to support the development and evaluation of privacy detection models for LLM interactions that are deployable on local user devices, we construct a large-scale multilingual dataset with 249K user queries and 154K annotated privacy phrases. In particular, we build an automated privacy annotation pipeline with strong LLMs to automatically extract privacy phrases from dialogue datasets and annotate leaked information. We also design evaluation metrics at the levels of privacy leakage, extracted privacy phrase, and privacy information. We further establish baseline methods using light-weight LLMs with both tuning-free and tuning-based methods, and report a comprehensive evaluation of their performance. Evaluation results reveal a gap between current performance and the requirements of real-world LLM applications, motivating future research into more effective local privacy detection methods grounded in our dataset.

information, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2505.2091

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
Europe > Austria > Vienna (0.14)
Asia > China > Shanghai > Shanghai (0.05)
(25 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Leisure & Entertainment (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

XOXO: Stealthy Cross-Origin Context Poisoning Attacks against AI Coding Assistants

Štorek, Adam, Gupta, Mukur, Bhatt, Noopur, Gupta, Aditya, Kim, Janie, Srivastava, Prashast, Jana, Suman

arXiv.org Artificial IntelligenceMar-18-2025

AI coding assistants are widely used for tasks like code generation, bug detection, and comprehension. These tools now require large and complex contexts, automatically sourced from various origins$\unicode{x2014}$across files, projects, and contributors$\unicode{x2014}$forming part of the prompt fed to underlying LLMs. This automatic context-gathering introduces new vulnerabilities, allowing attackers to subtly poison input to compromise the assistant's outputs, potentially generating vulnerable code, overlooking flaws, or introducing critical errors. We propose a novel attack, Cross-Origin Context Poisoning (XOXO), that is particularly challenging to detect as it relies on adversarial code modifications that are semantically equivalent. Traditional program analysis techniques struggle to identify these correlations since the semantics of the code remain correct, making it appear legitimate. This allows attackers to manipulate code assistants into producing incorrect outputs, including vulnerabilities or backdoors, while shifting the blame to the victim developer or tester. We introduce a novel, task-agnostic black-box attack algorithm GCGS that systematically searches the transformation space using a Cayley Graph, achieving an 83.09% attack success rate on average across five tasks and eleven models, including GPT-4o and Claude 3.5 Sonnet v2 used by many popular AI coding assistants. Furthermore, existing defenses, including adversarial fine-tuning, are ineffective against our attack, underscoring the need for new security measures in LLM-powered coding tools.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2503.14281

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > Canada > Ontario > Toronto (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Commenting Higher-level Code Unit: Full Code, Reduced Code, or Hierarchical Code Summarization

Sun, Weisong, Zhang, Yiran, Zhu, Jie, Wang, Zhihui, Fang, Chunrong, Zhang, Yonglong, Feng, Yebo, Huang, Jiangping, Wang, Xingya, Jin, Zhi, Liu, Yang

arXiv.org Artificial IntelligenceMar-13-2025

Commenting code is a crucial activity in software development, as it aids in facilitating future maintenance and updates. To enhance the efficiency of writing comments and reduce developers' workload, researchers has proposed various automated code summarization (ACS) techniques to automatically generate comments/summaries for given code units. However, these ACS techniques primarily focus on generating summaries for code units at the method level. There is a significant lack of research on summarizing higher-level code units, such as file-level and module-level code units, despite the fact that summaries of these higher-level code units are highly useful for quickly gaining a macro-level understanding of software components and architecture. To fill this gap, in this paper, we conduct a systematic study on how to use LLMs for commenting higher-level code units, including file level and module level. These higher-level units are significantly larger than method-level ones, which poses challenges in handling long code inputs within LLM constraints and maintaining efficiency. To address these issues, we explore various summarization strategies for ACS of higher-level code units, which can be divided into three types: full code summarization, reduced code summarization, and hierarchical code summarization. The experimental results suggest that for summarizing file-level code units, using the full code is the most effective approach, with reduced code serving as a cost-efficient alternative. However, for summarizing module-level code units, hierarchical code summarization becomes the most promising strategy. In addition, inspired by the research on method-level ACS, we also investigate using the LLM as an evaluator to evaluate the quality of summaries of higher-level code units. The experimental results demonstrate that the LLM's evaluation results strongly correlate with human evaluations.

code unit, module, summarization, (14 more...)

arXiv.org Artificial Intelligence

2503.10737

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > China > Jiangsu Province > Nanjing (0.04)
Asia > Singapore > Central Region > Singapore (0.04)
(17 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.34)
Health & Medicine > Therapeutic Area > Immunology (0.34)
Health & Medicine > Epidemiology (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.99)

Add feedback

How Should We Build A Benchmark? Revisiting 274 Code-Related Benchmarks For LLMs

Cao, Jialun, Chan, Yuk-Kit, Ling, Zixuan, Wang, Wenxuan, Li, Shuqing, Liu, Mingwei, Qiao, Ruixi, Han, Yuting, Wang, Chaozheng, Yu, Boxi, He, Pinjia, Wang, Shuai, Zheng, Zibin, Lyu, Michael R., Cheung, Shing-Chi

arXiv.org Artificial IntelligenceFeb-17-2025

Various benchmarks have been proposed to assess the performance of large language models (LLMs) in different coding scenarios. We refer to them as code-related benchmarks. However, there are no systematic guidelines by which such a benchmark should be developed to ensure its quality, reliability, and reproducibility. We propose How2Bench, which is comprised of a 55-criteria checklist as a set of guidelines to govern the development of code-related benchmarks comprehensively. Using HOW2BENCH, we profiled 274 benchmarks released within the past decade and found concerning issues. Nearly 70% of the benchmarks did not take measures for data quality assurance; over 10% did not even open source or only partially open source. Many highly cited benchmarks have loopholes, including duplicated samples, incorrect reference codes/tests/prompts, and unremoved sensitive/confidential information. Finally, we conducted a human study involving 49 participants, which revealed significant gaps in awareness of the importance of data quality, reproducibility, and transparency.

benchmark, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2501.10711

Country:

North America > United States > California > San Francisco County > San Francisco (0.28)
Europe > Austria > Vienna (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
(47 more...)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.45)

Industry:

Education (1.00)
Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Software (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

GS-GVINS: A Tightly-integrated GNSS-Visual-Inertial Navigation System Augmented by 3D Gaussian Splatting

Zhou, Zelin, Uprety, Saurav, Nie, Shichuang, Yang, Hongzhou

arXiv.org Artificial IntelligenceFeb-15-2025

Recently, the emergence of 3D Gaussian Splatting (3DGS) has drawn significant attention in the area of 3D map reconstruction and visual SLAM. While extensive research has explored 3DGS for indoor trajectory tracking using visual sensor alone or in combination with Light Detection and Ranging (LiDAR) and Inertial Measurement Unit (IMU), its integration with GNSS for large-scale outdoor navigation remains underexplored. To address these concerns, we proposed GS-GVINS: a tightly-integrated GNSS-Visual-Inertial Navigation System augmented by 3DGS. This system leverages 3D Gaussian as a continuous differentiable scene representation in largescale outdoor environments, enhancing navigation performance through the constructed 3D Gaussian map. Notably, GS-GVINS is the first GNSS-Visual-Inertial navigation application that directly utilizes the analytical jacobians of SE3 camera pose with respect to 3D Gaussians. To maintain the quality of 3DGS rendering in extreme dynamic states, we introduce a motionaware 3D Gaussian pruning mechanism, updating the map based on relative pose translation and the accumulated opacity along the camera ray. For validation, we test our system under different driving environments: open-sky, sub-urban, and urban. Both self-collected and public datasets are used for evaluation. The results demonstrate the effectiveness of GS-GVINS in enhancing navigation accuracy across diverse driving environments.

artificial intelligence, gaussian, gaussian map, (17 more...)

arXiv.org Artificial Intelligence

2502.10975

Country:

North America > Canada > Alberta > Census Division No. 6 > Calgary Metropolitan Region > Calgary (0.14)
Europe > Finland > Lapland > Rovaniemi (0.04)
Europe > Austria > Styria > Graz (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Transportation > Air (0.35)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)

Add feedback

Using Machine Learning to Distinguish Human-written from Machine-generated Creative Fiction

McGlinchey, Andrea Cristina, Barclay, Peter J

arXiv.org Artificial IntelligenceDec-15-2024

Following the universal availability of generative AI systems with the release of ChatGPT, automatic detection of deceptive text created by Large Language Models has focused on domains such as academic plagiarism and "fake news". However, generative AI also poses a threat to the livelihood of creative writers, and perhaps to literary culture in general, through reduction in quality of published material. Training a Large Language Model on writers' output to generate "sham books" in a particular style seems to constitute a new form of plagiarism. This problem has been little researched. In this study, we trained Machine Learning classifier models to distinguish short samples of human-written from machine-generated creative fiction, focusing on classic detective novels. Our results show that a Naive Bayes and a Multi-Layer Perceptron classifier achieved a high degree of success (accuracy > 95%), significantly outperforming human judges (accuracy < 55%). This approach worked well with short text samples (around 100 words), which previous research has shown to be difficult to classify. We have deployed an online proof-of-concept classifier tool, AI Detective, as a first step towards developing lightweight and reliable applications for use by editors and publishers, with the aim of protecting the economic and cultural contribution of human authors.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2412.15253

Country:

Europe > United Kingdom > Scotland (0.04)
Europe > Poland > Lesser Poland Province > Kraków (0.04)
Europe > Finland > Lapland > Rovaniemi (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Media > News (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.57)

Add feedback

CodeJudge-Eval: Can Large Language Models be Good Judges in Code Understanding?

Zhao, Yuwei, Luo, Ziyang, Tian, Yuchen, Lin, Hongzhan, Yan, Weixiang, Li, Annan, Ma, Jing

arXiv.org Artificial IntelligenceSep-13-2024

Recent advancements in large language models (LLMs) have showcased impressive code generation capabilities, primarily evaluated through language-to-code benchmarks. However, these benchmarks may not fully capture a model's code understanding abilities. We introduce CodeJudge-Eval (CJ-Eval), a novel benchmark designed to assess LLMs' code understanding abilities from the perspective of code judging rather than code generation. CJ-Eval challenges models to determine the correctness of provided code solutions, encompassing various error types and compilation issues. By leveraging a diverse set of problems and a fine-grained judging system, CJ-Eval addresses the limitations of traditional benchmarks, including the potential memorization of solutions. Evaluation of 12 well-known LLMs on CJ-Eval reveals that even state-of-the-art models struggle, highlighting the benchmark's ability to probe deeper into models' code understanding abilities. Our codes and benchmark are available at \url{https://github.com/CodeLLM-Research/CodeJudge-Eval}.

benchmark, solution code, test case, (16 more...)

arXiv.org Artificial Intelligence

2408.10718

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Asia > Thailand > Bangkok > Bangkok (0.04)
(10 more...)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback