ICSE 2020
Wed 24 June - Thu 16 July 2020
Wed 8 Jul 2020 01:05 - 01:13 at Goguryeo - P11-Natural Language Artifacts Chair(s): Jane Cleland-Huang

Technical debt is a metaphor for the tradeoff software engineers make between short-term benefits and long-term stability. Self-admitted technical debt (SATD) [2], a variant of technical debt, has been proposed to identify debt that is intentionally introduced during software development, e.g., temporary fixes and workarounds. Previous studies have leveraged human-summarized patterns (n-gram phrases that can be used to identify SATD) or text-mining techniques to detect SATD in source code comments. However, several characteristics of SATD features in code comments, such as vocabulary diversity, project uniqueness, length, and semantic variations, pose a major challenge to the accuracy of pattern-based or traditional text-mining-based SATD detection, especially for cross-project deployment [1]. Furthermore, although traditional text-mining-based methods outperform pattern-based methods in prediction accuracy, the text features they use are less intuitive than human-summarized patterns, which makes the prediction results hard to explain. In this paper, we first identify five characteristics of SATD comments that affect the performance, generalizability, and adaptability of pattern-based SATD detection [2] and traditional text-mining-based SATD classification [1]. To improve the accuracy of SATD prediction, especially for cross-project prediction, we propose a Convolutional Neural Network (CNN)-based approach for classifying code comments as SATD or non-SATD. To improve the explainability of our model's prediction results, we exploit the computational structure of CNNs to identify the key phrases and patterns in code comments that are most relevant to SATD.

The main contributions of this paper are: (1) We present a novel CNN-based approach to identify SATD in source code comments, which form an imbalanced dataset. Our approach achieves a substantial improvement over text-mining approaches in both within-project and cross-project settings; (2) We have designed a backtracking method to extract and highlight key phrases and SATD patterns in code comments, which can then be used to explain the CNN model's SATD classification results; (3) We have conducted extensive experiments to evaluate not only the performance of our approach, but also its generalizability and adaptability, as well as the intuitiveness and explainability of the CNN-learned SATD features and patterns.
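The idea behind contribution (2) can be illustrated with a toy example. The sketch below is not the authors' model: the embeddings, the single bigram filter, the bias, and the function names (`tokenize`, `conv_max`, `explain`) are all hypothetical and hand-set for illustration. It only shows the general mechanism of sliding a convolution filter over a comment, taking the maximum activation, and backtracking from that maximum to the n-gram that produced it.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical 2-d word embeddings, hand-set for illustration only
# (a trained model would learn these from labeled comments).
EMBED: Dict[str, List[float]] = {
    "todo": [1.0, 0.0],
    "hack": [0.9, 0.0],
    "temporary": [0.7, 0.0],
    "workaround": [0.9, 0.0],
}
UNKNOWN = [0.0, 0.0]

# One hypothetical convolution filter spanning 2 tokens (a bigram detector).
FILTER: List[List[float]] = [[1.0, 0.0], [1.0, 0.0]]
BIAS = -1.0


def tokenize(comment: str) -> List[str]:
    """Lowercase the comment and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", comment.lower())


def conv_max(tokens: List[str]) -> Tuple[float, int]:
    """Return (max ReLU activation, start index of the window that fired).

    The start index is -1 when no window has a positive activation.
    """
    width = len(FILTER)
    best_score, best_start = 0.0, -1
    for start in range(len(tokens) - width + 1):
        act = BIAS
        for k in range(width):
            vec = EMBED.get(tokens[start + k], UNKNOWN)
            act += sum(w * x for w, x in zip(FILTER[k], vec))
        act = max(act, 0.0)  # ReLU
        if act > best_score:
            best_score, best_start = act, start
    return best_score, best_start


def explain(comment: str) -> Tuple[bool, List[str]]:
    """Classify a comment and backtrack to the n-gram behind the decision."""
    tokens = tokenize(comment)
    _, start = conv_max(tokens)
    if start < 0:
        return False, []  # no window fired: treated as non-SATD
    return True, tokens[start:start + len(FILTER)]


print(explain("// This is a temporary workaround for the parser"))
print(explain("Returns the parsed value"))
```

A realistic classifier would use many learned filters of several widths followed by a trained output layer; the backtracking step would then be applied to whichever filters contribute most to the SATD prediction.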

Wed 8 Jul

Displayed time zone: (UTC) Coordinated Universal Time

01:05 - 02:05
P11-Natural Language Artifacts (Journal First / Technical Papers) at Goguryeo
Chair(s): Jane Cleland-Huang University of Notre Dame
01:05
8m
Talk
Neural Network Based Classification of Self-admitted Technical Debt: From Performance to Explainability and Deployability (J1)
Journal First
Xiaoxue Ren Zhejiang University, Zhenchang Xing Australian National University, Xin Xia Monash University, David Lo Singapore Management University, Xinyu Wang Zhejiang University, John Grundy Monash University
01:13
8m
Talk
Domain-specific Machine Translation with Recurrent Neural Network for Software Localization (J1)
Journal First
Xu Wang College of Engineering & Computer Science, Australian National University, Canberra, Australia, Chunyang Chen Monash University, Zhenchang Xing Australian National University
01:21
12m
Talk
Mitigating Turnover with Code Review Recommendation: Balancing Expertise, Workload, and Knowledge Distribution (Technical, Artifact Available)
Technical Papers
Ehsan Mirsaeedi Concordia University, Peter Rigby Concordia University, Montreal, Canada