Goto

Collaborating Authors

 software development


AI and Agile Software Development: From Frustration to Success -- XP2025 Workshop Summary

Herda, Tomas, Pichler, Victoria, Zhang, Zheying, Abrahamsson, Pekka, Hanssen, Geir K.

arXiv.org Artificial Intelligence

The full-day workshop on AI and Agile at XP 2025 convened a diverse group of researchers and industry practitioners to address the practical challenges and opportunities of integrating Artificial Intelligence into Agile software development. Through interactive sessions, participants identified shared frustrations related to integrating AI into Agile Software Development practices, including challenges with tooling, governance, data quality, and critical skill gaps. These challenges were systematically prioritized and analyzed to uncover root causes. The workshop culminated in the collaborative development of a research roadmap that pinpoints actionable directions for future work, including both immediate solutions and ambitious long-term goals. The key outcome is a structured agenda designed to foster joint industry-academic efforts to move from identified frustrations to successful implementation.



Bridging the Skills Gap: A Course Model for Modern Generative AI Education

Bardach, Anya, Murrah, Hamilton

arXiv.org Artificial Intelligence

Research on how the popularization of generative Artificial Intelligence (AI) tools impacts learning environments has led to hesitancy among educators to teach these tools in classrooms, creating two observed disconnects. Generative AI competency is increasingly valued in industry but not in higher education, and students are experimenting with generative AI without formal guidance. The authors argue students across fields must be taught to responsibly and expertly harness the potential of AI tools to ensure job market readiness and positive outcomes. Computer Science trajectories are particularly impacted, and while consistently top ranked U.S. Computer Science departments teach the mechanisms and frameworks underlying AI, few appear to offer courses on applications for existing generative AI tools. A course was developed at a private research university to teach undergraduate and graduate Computer Science students applications for generative AI tools in software development. Two mixed method surveys indicated students overwhelmingly found the course valuable and effective. Co-authored by the instructor and one of the graduate students, this paper explores the context, implementation, and impact of the course through data analysis and reflections from both perspectives. It additionally offers recommendations for replication in and beyond Computer Science departments. This is the extended version of this paper to include technical appendices.


Smarter Together: Creating Agentic Communities of Practice through Shared Experiential Learning

Tablan, Valentin, Taylor, Scott, Hurtado, Gabriel, Bernhem, Kristoffer, Uhrenholt, Anders, Farei, Gabriele, Moilanen, Karo

arXiv.org Artificial Intelligence

The transition from human-centric to agent-centric software development practices is disrupting existing knowledge sharing environments for software developers. Traditional peer-to-peer repositories and developer communities for shared technical knowledge and best practice have witnessed dramatic drops in participation in a short period of time. At the same time, agentic functional equivalents are yet to emerge leaving AI agents, which already generate a significant proportion of all new software code produced, without access to repositories of valuable shared learning. In this paper, we introduce Spark, a novel shared agentic memory architecture which is designed to emulate the collective intelligence and know-how of human developer communities. Spark enables AI coding agents to both contribute to and draw from a persistent and continuously evolving experiential memory. Agents operating in the same general problem space use the Spark shared memory as a repository of new knowledge to achieve collective continual learning. We evaluate Spark as a coach for AI coding agents performing software development tasks. We demonstrate that recommendations made by Spark improve the quality of code generated by generic code generation models at varying sizes and capability tiers. Boosted by Spark, a small open-weights model with 30 billion parameters was able to match the code quality afforded by a much larger state-of-the-art model. Separately, we measure the intrinsic quality of recommendations generated by Spark against a wide range of criteria inspired by software development best practice, and achieve helpfulness levels of up to 98.2% in the top two (out of five) qualitative helpfulness bands.


Walking the Tightrope of LLMs for Software Development: A Practitioners' Perspective

Ferino, Samuel, Hoda, Rashina, Grundy, John, Treude, Christoph

arXiv.org Artificial Intelligence

Background: Large Language Models emerged with the potential of provoking a revolution in software development (e.g., automating processes, workforce transformation). Although studies have started to investigate the perceived impact of LLMs for software development, there is a need for empirical studies to comprehend how to balance forward and backward effects of using LLMs. Objective: We investigated how LLMs impact software development and how to manage the impact from a software developer's perspective. Method: We conducted 22 interviews with software practitioners across 3 rounds of data collection and analysis, between October (2024) and September (2025). We employed socio-technical grounded theory (STGT) for data analysis to rigorously analyse interview participants' responses. Results: We identified the benefits (e.g., maintain software development flow, improve developers' mental model, and foster entrepreneurship) and disadvantages (e.g., negative impact on developers' personality and damage to developers' reputation) of using LLMs at individual, team, organisation, and society levels; as well as best practices on how to adopt LLMs. Conclusion: Critically, we present the trade-offs that software practitioners, teams, and organisations face in working with LLMs. Our findings are particularly useful for software team leaders and IT managers to assess the viability of LLMs within their specific context.


'Vibe coding' beats 'clanker' to be Collins dictionary's word of the year

The Guardian

Collins dictionary lexicographers chose'vibe coding' after spotting a sharp rise in its usage. Collins dictionary lexicographers chose'vibe coding' after spotting a sharp rise in its usage. 'Vibe coding' beats'clanker' to be Collins dictionary's word of the year AI-inspired word joins'biohacking', 'Henry' and'broligarchy' on tech-heavy 2025 list "Vibe coding", an emerging software development that turns natural language into computer code using artificial intelligence, has been named Collins dictionary's word of the year for 2025. Lexicographers at Collins monitor the 24bn-word Collins Corpus, which draws from a range of media sources, including social media, to create the annual list of new and notable words that reflect our ever-evolving language . They chose vibe coding as word of the year after observing a huge increase in usage since its first appearance in February.


From vibe coding to context engineering: 2025 in software development

MIT Technology Review

This year, we've seen a real-time experiment playing out across the technology industry, one in which AI's software engineering capabilities have been put to the test against human technologists. And although 2025 may have started with AI looking strong, the transition from vibe coding to what's being termed context engineering shows that while the work of human developers is evolving, they nevertheless remain absolutely critical. This is captured in the latest volume of the " Thoughtworks Technology Radar," a report on the technologies used by our teams on projects with clients. In it, we see the emergence of techniques and tooling designed to help teams better tackle the problem of managing context when working with LLMs and AI agents. Taken together, there's a clear signal of the direction of travel in software engineering and even AI more broadly. After years of the industry assuming progress in AI is all about scale and speed, we're starting to see that what matters is the ability to handle context effectively.


EvoDev: An Iterative Feature-Driven Framework for End-to-End Software Development with LLM-based Agents

Liu, Junwei, Xu, Chen, Wang, Chong, Bai, Tong, Chen, Weitong, Wong, Kaseng, Lou, Yiling, Peng, Xin

arXiv.org Artificial Intelligence

Recent advances in large language model agents offer the promise of automating end-to-end software development from natural language requirements. However, existing approaches largely adopt linear, waterfall-style pipelines, which oversimplify the iterative nature of real-world development and struggle with complex, large-scale projects. To address these limitations, we propose EvoDev, an iterative software development framework inspired by feature-driven development. EvoDev decomposes user requirements into a set of user-valued features and constructs a Feature Map, a directed acyclic graph that explicitly models dependencies between features. Each node in the feature map maintains multi-level information, including business logic, design, and code, which is propagated along dependencies to provide context for subsequent development iterations. We evaluate EvoDev on challenging Android development tasks and show that it outperforms the best-performing baseline, Claude Code, by a substantial margin of 56.8%, while improving single-agent performance by 16.0%-76.6% across different base LLMs, highlighting the importance of dependency modeling, context propagation, and workflow-aware agent design for complex software projects. Our work summarizes practical insights for designing iterative, LLM-driven development frameworks and informs future training of base LLMs to better support iterative software development.


E2Edev: Benchmarking Large Language Models in End-to-End Software Development Task

Liu, Jingyao, Huang, Chen, Guan, Zhizhao, Lei, Wenqiang, Deng, Yang

arXiv.org Artificial Intelligence

The rapid advancement in large language models (LLMs) has demonstrated significant potential in End-to-End Software Development (E2ESD). However, existing E2ESD benchmarks are limited by coarse-grained requirement specifications and unreliable evaluation protocols, hindering a true understanding of current framework capabilities. To address these limitations, we present E2EDev, a novel benchmark grounded in the principles of Behavior-Driven Development (BDD), which evaluates the capabilities of E2ESD frameworks by assessing whether the generated software meets user needs through mimicking real user interactions (Figure 1). E2EDev comprises (i) a fine-grained set of user requirements, (ii) multiple BDD test scenarios with corresponding Python step implementations for each requirement, and (iii) a fully automated testing pipeline built on the Behave framework. To ensure its quality while reducing the annotation effort, E2EDev leverages our proposed Human-in-the-Loop Multi-Agent Annotation Framework (HITL-MAA). By evaluating various E2ESD frameworks and LLM backbones with E2EDev, our analysis reveals a persistent struggle to effectively solve these tasks, underscoring the critical need for more effective and cost-efficient E2ESD solutions. Our codebase and benchmark are publicly available at https://github.com/SCUNLP/E2EDev.


Toward Agentic Software Engineering Beyond Code: Framing Vision, Values, and Vocabulary

Hoda, Rashina

arXiv.org Artificial Intelligence

Agentic AI is poised to usher in a seismic paradigm shift in Software Engineering (SE). As technologists rush head-along to make agentic AI a reality, SE researchers are driven to establish agentic SE as a research area. While early visions of agentic SE are primarily focused on code-related activities, early empirical evidence calls for a consideration of a range of socio-technical concerns to make it work in practice. This paper contributes to the emerging community vision by: (a) recommending an expansion of its scope beyond code, toward a 'whole of process' vision, grounding it in SE foundations and evolution and emerging agentic SE frameworks, (b) proposing a preliminary set of values and principles to guide efforts, and (c) sharing guidance on designing/using well-defined vocabulary for agentic SE. It is hoped that these ideas will encourage community collaborations and steer the SE community towards laying strong foundations of agentic SE so its not only inevitable but also deliberate and desirable in the long run.