TriagerX: Dual Transformers for Bug Triaging Tasks with Content and Interaction Based Rankings

Mamun, Md Afif Al, Uddin, Gias, Xia, Lan, Zhang, Longyu

arXiv.org Artificial Intelligence 

Pretrained Language Models (PLMs) are transformer-based architectures that can be used in bug triaging tasks. PLMs can better capture token semantics than traditional Machine Learning (ML) models that rely on statistical features (e.g., TF-IDF, bag of words). However, PLMs may still attend to less relevant tokens in a bug report, which can impact their effectiveness. In addition, a model's recommendations can be sub-optimal when the interaction history of developers around similar bugs is not taken into account. We designed TriagerX to address these limitations. First, to assess token semantics more reliably, we leverage a dual-transformer architecture. Unlike current state-of-the-art (SOTA) baselines that employ a single transformer architecture, TriagerX collects recommendations from two transformers, each offering recommendations via its last three layers. This setup generates a robust content-based ranking of candidate developers. TriagerX then refines this ranking by employing a novel interaction-based ranking methodology, which considers developers' historical interactions with similar fixed bugs. We worked with our large industry partner to successfully deploy TriagerX in their development environment. The partner required both developer and component recommendations, with components acting as proxies for team assignments, which is particularly useful in cases of developer turnover or team changes. We trained TriagerX on the partner's dataset for both tasks, and it outperformed SOTA baselines by up to 10% for component recommendations and 54% for developer recommendations.

Bug triaging involves assigning reported issues to the most suitable developer or software team for resolution. Over the past few decades, various information retrieval (IR), machine learning (ML), and deep learning (DL) approaches have been proposed to automate this process [1-7].
However, their real-world adoption remains limited due to inconsistent performance across different datasets and industrial settings [8]. To understand and address these challenges, in collaboration with our industrial partner (IBM), we examined the limitations of existing approaches and then designed a novel bug triaging technique called TriagerX. We have successfully deployed TriagerX within the partner's development environment.
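
To make the two-stage idea from the abstract concrete, the following is a minimal sketch of how a content-based ranking from two transformers could be refined with an interaction-based score. It is not the TriagerX implementation. The text above states only that each transformer contributes recommendations via its last three layers and that interactions with similar fixed bugs refine the ranking. Everything else here is an assumption made for illustration: averaging softmax probabilities across heads, summing similarity-weighted interaction counts, blending the two scores linearly with a `weight` parameter, and all function names.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def content_scores(layer_logits_per_model):
    """Average the per-developer probabilities over every (model, layer) head.

    layer_logits_per_model: one entry per transformer, each a list of
    per-layer logit vectors (hypothetically, the last three layers).
    Averaging is an assumed aggregation rule, not taken from the paper.
    """
    heads = [softmax(logits) for model in layer_logits_per_model for logits in model]
    n_devs = len(heads[0])
    return [sum(h[d] for h in heads) / len(heads) for d in range(n_devs)]


def interaction_scores(similar_bugs, n_devs):
    """Score developers by their interactions with similar fixed bugs.

    similar_bugs: list of (similarity, [developer indices who interacted]).
    Each interaction adds the bug's similarity to the developer's score
    (an assumed weighting); scores are normalized to sum to 1.
    """
    scores = [0.0] * n_devs
    for similarity, developers in similar_bugs:
        for d in developers:
            scores[d] += similarity
    total = sum(scores)
    return [s / total for s in scores] if total > 0 else scores


def rank_developers(layer_logits_per_model, similar_bugs, weight=0.5):
    """Blend both scores linearly and return developer indices, best first."""
    content = content_scores(layer_logits_per_model)
    interaction = interaction_scores(similar_bugs, len(content))
    combined = [(1 - weight) * c + weight * i for c, i in zip(content, interaction)]
    return sorted(range(len(combined)), key=lambda d: combined[d], reverse=True)


# Toy data: three candidate developers, two models, three layer heads each.
model_a = [[2.0, 1.0, 0.0], [1.5, 1.0, 0.2], [1.8, 0.9, 0.1]]
model_b = [[1.0, 1.2, 0.0], [1.4, 1.0, 0.3], [1.6, 1.1, 0.2]]
similar_bugs = [(0.9, [2]), (0.8, [2, 1])]

content_only = rank_developers([model_a, model_b], similar_bugs, weight=0.0)
interaction_only = rank_developers([model_a, model_b], similar_bugs, weight=1.0)
```

In this toy setup the content-only ranking favors developer 0, whereas the interaction-only ranking favors developer 2, who worked on both similar bugs. Intermediate values of `weight` trade off the two signals.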