ICSE 2020
Wed 24 June - Thu 16 July 2020
Wed 8 Jul 2020 01:05 - 01:13 at Goguryeo - P11-Natural Language Artifacts Chair(s): Jane Cleland-Huang

Technical debt is a metaphor for the tradeoff software engineers make between short-term benefits and long-term stability. Self-admitted technical debt (SATD) [2], a variant of technical debt, has been proposed to identify debt that is intentionally introduced during software development, e.g., temporary fixes and workarounds. Previous studies have leveraged human-summarized patterns (n-gram phrases that can be used to identify SATD) or text-mining techniques to detect SATD in source code comments. However, several characteristics of SATD features in code comments, such as vocabulary diversity, project uniqueness, length, and semantic variations, pose a big challenge to the accuracy of pattern-based or traditional text-mining-based SATD detection, especially for cross-project deployment [1]. Furthermore, although the traditional text-mining-based method outperforms the pattern-based method in prediction accuracy, the text features it uses are less intuitive than human-summarized patterns, which makes the prediction results hard to explain. In this paper, we first identify five characteristics of SATD comments that affect the performance, generalizability, and adaptability of pattern-based SATD detection [2] and traditional text-mining-based SATD classification [1]. To improve the accuracy of SATD prediction, especially for cross-project prediction, we propose a Convolutional Neural Network (CNN)-based approach for classifying code comments as SATD or non-SATD. To improve the explainability of our model's prediction results, we exploit the computational structure of CNNs to identify key phrases and patterns in code comments that are most relevant to SATD.
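To make the pattern-based baseline concrete, the following is a minimal sketch of n-gram pattern matching over code comments. It is not the detector from [2]: the pattern list, `normalize`, `matched_patterns`, and `is_satd` are all hypothetical names and values chosen for illustration.

```python
import re

# Hypothetical pattern list for illustration only; NOT the list from [2].
ILLUSTRATIVE_PATTERNS = ["hack", "fixme", "workaround", "temporary fix"]


def normalize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def matched_patterns(comment, patterns=ILLUSTRATIVE_PATTERNS):
    """Return the patterns whose tokens occur contiguously in the comment."""
    tokens = normalize(comment)
    hits = []
    for pattern in patterns:
        p = normalize(pattern)
        n = len(p)
        # Compare the pattern against every n-gram window of the comment.
        if any(tokens[i:i + n] == p for i in range(len(tokens) - n + 1)):
            hits.append(pattern)
    return hits


def is_satd(comment, patterns=ILLUSTRATIVE_PATTERNS):
    """Flag a comment as SATD if at least one pattern matches."""
    return bool(matched_patterns(comment, patterns))


print(matched_patterns("// HACK: temporary fix until the parser is rewritten"))
print(is_satd("this is a quick kludge, revisit later"))
```

The second comment describes a workaround in words the list does not contain, so this sketch misses it. That is the kind of vocabulary-diversity problem the abstract attributes to pattern-based detection.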

The main contributions of this paper are: (1) We present a novel CNN-based approach to identify SATD in source code comments, which form an imbalanced dataset. Our approach achieves a substantial improvement over text-mining approaches in both within-project and cross-project settings; (2) We have designed a backtracking method to extract and highlight key phrases and SATD patterns in the code comments, which can then be used to explain the CNN model's SATD classification results; (3) We have conducted extensive experiments to evaluate not only the performance of our approach, but also its generalizability and adaptability, as well as the intuitiveness and explainability of the CNN-learned SATD features and patterns.
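The general idea behind contribution (2) can be sketched as follows: in a text CNN with max-over-time pooling, each filter's pooled value comes from one n-gram window, so the position of the maximum can be traced back to the tokens that produced it. This is a generic illustration under assumed toy weights, not the authors' backtracking method. All function names, embeddings, filters, and output weights below are hypothetical.

```python
def conv_activations(embedded, filt, bias=0.0):
    """Slide one filter over the token embeddings; return a ReLU activation per window."""
    h = len(filt)  # filter width = n-gram size
    acts = []
    for start in range(len(embedded) - h + 1):
        total = bias
        for offset in range(h):
            total += sum(w * x for w, x in zip(filt[offset], embedded[start + offset]))
        acts.append(max(0.0, total))
    return acts


def max_pool_with_position(acts):
    """Max-over-time pooling that also reports where the maximum occurred."""
    best = max(range(len(acts)), key=lambda i: acts[i])
    return acts[best], best


def backtrack_key_phrases(tokens, embeddings, filters, output_weights, top_k=1):
    """Trace each filter's pooled activation back to its n-gram window and
    rank the windows by their contribution (activation * output weight)."""
    dim = len(next(iter(embeddings.values())))
    embedded = [embeddings.get(t, [0.0] * dim) for t in tokens]
    scored = []
    for filt, weight in zip(filters, output_weights):
        acts = conv_activations(embedded, filt)
        if not acts:  # comment shorter than the filter width
            continue
        value, start = max_pool_with_position(acts)
        phrase = " ".join(tokens[start:start + len(filt)])
        scored.append((value * weight, phrase))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


# Toy, hand-set values (assumptions for illustration, not trained weights).
toy_embeddings = {"temporary": [1.0, 0.0], "fix": [0.0, 1.0]}
toy_filters = [
    [[1.0, 0.0], [0.0, 1.0]],  # width-2 filter tuned to "temporary fix"
    [[0.0, 1.0]],              # width-1 filter tuned to "fix"
]
toy_output_weights = [1.5, 0.5]  # assumed weights toward the SATD class

tokens = ["this", "is", "a", "temporary", "fix"]
result = backtrack_key_phrases(tokens, toy_embeddings, toy_filters,
                               toy_output_weights, top_k=2)
print(result)
```

With these toy values, the highest-ranked phrase is "temporary fix", since the width-2 filter fires most strongly on that window and has the larger output weight. A trained model would learn its embeddings and filters from labeled comments instead of having them set by hand.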

Wed 8 Jul
Times are displayed in time zone: (UTC) Coordinated Universal Time

01:05 - 02:05: Paper Presentations - P11-Natural Language Artifacts at Goguryeo
Chair(s): Jane Cleland-Huang (University of Notre Dame)
Journal First, 01:05 - 01:13
Talk
Xiaoxue Ren (Zhejiang University), Zhenchang Xing (Australian National University), Xin Xia (Monash University), David Lo (Singapore Management University), Xinyu Wang (Zhejiang University), John Grundy (Monash University)
Journal First, 01:13 - 01:21
Talk
Xu Wang (College of Engineering & Computer Science, Australian National University, Canberra, Australia), Chunyang Chen (Monash University), Zhenchang Xing (Australian National University)
Papers, 01:21 - 01:33
Talk
Ehsan Mirsaeedi (Concordia University), Peter Rigby (Concordia University, Montreal, Canada)