Neural Network Based Classification of Self-admitted Technical Debt: From Performance to Explainability and Deployability (ICSE 2020 - Journal First)

Wed 24 June - Thu 16 July 2020

Who

Xiaoxue Ren, Zhenchang Xing, Xin Xia, David Lo, Xinyu Wang, John Grundy

Track

ICSE 2020 Journal First

Time Zone

The program is currently displayed in (UTC) Coordinated Universal Time.

When

Wed 8 Jul 2020 01:05 - 01:13 at Goguryeo - P11-Natural Language Artifacts Chair(s): Jane Cleland-Huang

Abstract

Technical debt is a metaphor for the tradeoff software engineers make between short-term benefits and long-term stability. Self-admitted technical debt (SATD) [2], a variant of technical debt, has been proposed to identify debt that is intentionally introduced during software development, e.g., temporary fixes and workarounds. Previous studies have used human-summarized patterns (n-gram phrases that can be used to identify SATD) or text-mining techniques to detect SATD in source code comments. However, several characteristics of SATD features in code comments, such as vocabulary diversity, project uniqueness, and length and semantic variations, pose a major challenge to the accuracy of pattern-based and traditional text-mining based SATD detection, especially for cross-project deployment [1]. Furthermore, although the traditional text-mining based method outperforms the pattern-based method in prediction accuracy, the text features it uses are less intuitive than human-summarized patterns, which makes the prediction results hard to explain. In this paper, we first identify five characteristics of SATD comments that affect the performance, generalizability and adaptability of pattern-based SATD detection [2] and traditional text-mining based SATD classification [1]. To improve the accuracy of SATD prediction, especially for cross-project prediction, we propose a Convolutional Neural Network (CNN)-based approach for classifying code comments as SATD or non-SATD. To improve the explainability of our model's prediction results, we exploit the computational structure of CNNs to identify key phrases and patterns in code comments that are most relevant to SATD.
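To make the pattern-based baseline mentioned in the abstract concrete, here is a minimal Python sketch of n-gram pattern matching over a code comment. It is not taken from the paper: the pattern list, the tokenizer, and the function names (`ILLUSTRATIVE_PATTERNS`, `matched_patterns`, `is_satd`) are assumptions chosen for illustration, and the patterns are not the published human-summarized list.

```python
import re

# Illustrative patterns only (an assumption for this sketch);
# NOT the human-summarized pattern list used in prior work.
ILLUSTRATIVE_PATTERNS = ("hack", "fixme", "workaround", "temporary fix")


def tokenize(comment):
    """Lowercase a comment and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", comment.lower())


def ngrams(tokens, n):
    """Return all contiguous n-word phrases in the token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def matched_patterns(comment, patterns=ILLUSTRATIVE_PATTERNS):
    """Return the patterns that occur as whole n-grams in the comment."""
    tokens = tokenize(comment)
    longest = max(len(p.split()) for p in patterns)
    grams = set()
    for n in range(1, longest + 1):
        grams.update(ngrams(tokens, n))
    return [p for p in patterns if p in grams]


def is_satd(comment, patterns=ILLUSTRATIVE_PATTERNS):
    """Flag a comment as SATD if any pattern matches."""
    return bool(matched_patterns(comment, patterns))


if __name__ == "__main__":
    print(matched_patterns("// HACK: temporary fix until the parser is rewritten"))
    # A comment that admits debt in different words is missed,
    # which is the vocabulary-diversity problem the abstract describes.
    print(is_satd("// this is ugly, revisit after the release"))
```

A fixed pattern list like this is easy to explain but brittle: a comment that uses vocabulary outside the list is not detected, which is one reason the abstract gives for moving to a learned model.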

The main contributions of this paper are: (1) We present a novel CNN-based approach to identify SATD in source code comments, which form an imbalanced dataset. Our approach achieves a substantial improvement over text-mining approaches in both within-project and cross-project settings; (2) We have designed a backtracking method to extract and highlight key phrases and SATD patterns in code comments, which can then be used to explain the CNN model's SATD classification results; (3) We have conducted extensive experiments to evaluate not only the performance of our approach, but also its generalizability and adaptability, as well as the intuitiveness and explainability of the CNN-learned SATD features and patterns.
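The backtracking idea in contribution (2) can be illustrated with a small sketch. The abstract does not give the authors' architecture or algorithm, so this assumes the simplest setting: one convolution filter spanning a fixed number of words, followed by max-pooling, where the position of the maximum activation is traced back to the n-gram that produced it. The toy embeddings, the filter weights, and the function names are all hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def conv_filter_activations(embedded, filt):
    """Slide a filter spanning len(filt) words over the embedded comment.

    Returns one activation per window position.
    """
    width = len(filt)
    activations = []
    for start in range(len(embedded) - width + 1):
        window = embedded[start:start + width]
        activations.append(sum(dot(w, f) for w, f in zip(window, filt)))
    return activations


def backtrack_key_phrase(tokens, embeddings, filt):
    """Find the window that max-pooling would select and return its words.

    Max-pooling keeps only the largest activation of a filter; recording
    where that maximum occurred recovers the phrase responsible for it.
    """
    if len(tokens) < len(filt):
        raise ValueError("comment is shorter than the filter width")
    embedded = [embeddings[t] for t in tokens]
    activations = conv_filter_activations(embedded, filt)
    best = max(range(len(activations)), key=activations.__getitem__)
    phrase = " ".join(tokens[best:best + len(filt)])
    return phrase, activations[best]


if __name__ == "__main__":
    # Hypothetical 2-dimensional embeddings and a hand-set width-2 filter.
    toy_embeddings = {
        "this": [0.0, 0.1],
        "is": [0.0, 0.0],
        "a": [0.1, 0.0],
        "temporary": [1.0, 0.0],
        "fix": [0.0, 1.0],
    }
    toy_filter = [[1.0, 0.0], [0.0, 1.0]]
    tokens = ["this", "is", "a", "temporary", "fix"]
    print(backtrack_key_phrase(tokens, toy_embeddings, toy_filter))
```

In a trained model the filter weights would be learned rather than set by hand, and many filters of several widths would each contribute a candidate phrase; the sketch shows only the tracing step for a single filter.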

Xiaoxue Ren

Zhejiang University

Zhenchang Xing

Australian National University

Australia

Xin Xia

Monash University

Australia

David Lo

Singapore Management University

Singapore

Xinyu Wang

Zhejiang University

China

John Grundy

Monash University

Australia

Session Program

Wed 8 Jul
Displayed time zone: (UTC) Coordinated Universal Time

01:05 - 02:05	P11-Natural Language Artifacts (Journal First / Technical Papers) at Goguryeo. Chair(s): Jane Cleland-Huang (University of Notre Dame)

01:05 8m Talk		Neural Network Based Classification of Self-admitted Technical Debt: From Performance to Explainability and Deployability (Journal First). Xiaoxue Ren (Zhejiang University), Zhenchang Xing (Australian National University), Xin Xia (Monash University), David Lo (Singapore Management University), Xinyu Wang (Zhejiang University), John Grundy (Monash University)
01:13 8m Talk		Domain-specific Machine Translation with Recurrent Neural Network for Software Localization (Journal First). Xu Wang (College of Engineering & Computer Science, Australian National University, Canberra, Australia), Chunyang Chen (Monash University), Zhenchang Xing (Australian National University)
01:21 12m Talk		Mitigating Turnover with Code Review Recommendation: Balancing Expertise, Workload, and Knowledge Distribution (Technical Papers). Ehsan Mirsaeedi (Concordia University), Peter Rigby (Concordia University, Montreal, Canada)