Neural Network Based Classification of Self-admitted Technical Debt: From Performance to Explainability and Deployability
Technical debt is a metaphor for the trade-off software engineers make between short-term benefits and long-term stability. Self-admitted technical debt (SATD) [2], a variant of technical debt, has been proposed to identify debt that is intentionally introduced during software development, e.g., temporary fixes and workarounds. Previous studies have leveraged human-summarized patterns (n-gram phrases that can be used to identify SATD) or text-mining techniques to detect SATD in source code comments. However, several characteristics of SATD features in code comments, such as vocabulary diversity, project uniqueness, length, and semantic variations, pose a major challenge to the accuracy of pattern-based or traditional text-mining-based SATD detection, especially for cross-project deployment [1]. Furthermore, although traditional text-mining-based methods outperform pattern-based methods in prediction accuracy, the text features they use are less intuitive than human-summarized patterns, which makes the prediction results hard to explain. In this paper, we first identify five characteristics of SATD comments that affect the performance, generalizability, and adaptability of pattern-based SATD detection [2] and traditional text-mining-based SATD classification [1]. To improve the accuracy of SATD prediction, especially for cross-project prediction, we propose a Convolutional Neural Network (CNN)-based approach for classifying code comments as SATD or non-SATD. To improve the explainability of our model's prediction results, we exploit the computational structure of CNNs to identify key phrases and patterns in code comments that are most relevant to SATD.
The main contributions of this paper are: (1) We present a novel CNN-based approach to identify SATD in source code comments, which form an imbalanced dataset. Our approach achieves a substantial improvement over text-mining approaches in both within-project and cross-project settings; (2) We design a backtracking method to extract and highlight key phrases and SATD patterns in code comments, which can then be used to explain the CNN model's SATD classification results; (3) We conduct extensive experiments to evaluate not only the performance of our approach, but also its generalizability and adaptability, as well as the intuitiveness and explainability of the CNN-learned SATD features and patterns.
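The backtracking idea in contribution (2) can be pictured with a small sketch. This is not the authors' implementation: it only illustrates, under simplifying assumptions, how the window selected by max-over-time pooling in a text CNN can be mapped back to the n-gram that produced it. The toy embeddings, the single hand-written filter, and the function names `window_activations` and `backtrack_key_phrase` are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]

# Hypothetical 2-d toy embeddings; a real model would learn these.
EMBEDDINGS: Dict[str, Vector] = {
    "todo": (1.0, 0.0),
    "fix": (0.8, 0.2),
    "hack": (0.9, 0.1),
    "this": (0.0, 0.1),
    "later": (0.3, 0.0),
}
UNKNOWN: Vector = (0.0, 0.0)  # assumption: unseen words map to a zero vector


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def window_activations(tokens: List[str], conv_filter: List[Vector]) -> List[float]:
    """Slide one filter of width len(conv_filter) over the comment (with ReLU)."""
    width = len(conv_filter)
    activations = []
    for start in range(len(tokens) - width + 1):
        score = sum(
            dot(conv_filter[k], EMBEDDINGS.get(tokens[start + k], UNKNOWN))
            for k in range(width)
        )
        activations.append(max(0.0, score))
    return activations


def backtrack_key_phrase(
    tokens: List[str], conv_filter: List[Vector]
) -> Tuple[List[str], float]:
    """Return the n-gram whose window wins max-over-time pooling, plus its score."""
    activations = window_activations(tokens, conv_filter)
    if not activations:  # comment shorter than the filter width
        return [], 0.0
    best = max(range(len(activations)), key=activations.__getitem__)
    return tokens[best:best + len(conv_filter)], activations[best]


comment = "we should fix this later".split()
bigram_filter = [(1.0, 0.0), (0.0, 1.0)]  # hypothetical width-2 filter
phrase, score = backtrack_key_phrase(comment, bigram_filter)
print(phrase)
```

In a trained model there would be many filters of several widths, and the phrases traced back from the filters that contribute most to the SATD prediction would be the candidates for highlighting.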
Wed 8 Jul. Displayed time zone: (UTC) Coordinated Universal Time
01:05 - 02:05 | P11-Natural Language Artifacts | Journal First / Technical Papers / Paper Presentations at Goguryeo | Chair(s): Jane Cleland-Huang (University of Notre Dame)
01:05 | 8m | Talk | Neural Network Based Classification of Self-admitted Technical Debt: From Performance to Explainability and Deployability | Journal First | Xiaoxue Ren (Zhejiang University), Zhenchang Xing (Australia National University), Xin Xia (Monash University), David Lo (Singapore Management University), Xinyu Wang (Zhejiang University), John Grundy (Monash University)
01:13 | 8m | Talk | Domain-specific Machine Translation with Recurrent Neural Network for Software Localization | Journal First | Xu Wang (College of Engineering & Computer Science, Australian National University, Canberra, Australia), Chunyang Chen (Monash University), Zhenchang Xing (Australia National University)
01:21 | 12m | Talk | Mitigating Turnover with Code Review Recommendation: Balancing Expertise, Workload, and Knowledge Distribution | Technical Papers