Predicting Software Defect Type using Concept-based Classification (ICSE 2020 - Journal First)

Wed 24 June - Thu 16 July 2020

Who

Sangameshwar Patil, Balaraman Ravindran

Track

ICSE 2020 Journal First

When

Thu 9 Jul 2020 07:36 - 07:44 at Baekje - I13-Testing and Debugging 1 Chair(s): Shin Hwei Tan

Abstract

This is an extended abstract and presentation proposal for the manuscript ID EMSE-D-18-00360R1 accepted by the Empirical Software Engineering journal. The journal paper is not yet online. The accepted manuscript is uploaded along with this proposal.

Automatically predicting the defect type of a software defect from its description can significantly speed up and improve the software defect management process. The standard supervised-learning approach for this task (Thung et al., WCRE 2012) requires 90% of the data to be labeled for training the classifier. Creating such data is an expensive and effort-intensive task requiring domain-specific expertise.

In this paper, we propose to circumvent this problem by carrying out concept-based classification (CBC) of software defect reports with the help of the Explicit Semantic Analysis (ESA) framework. We first create concept-based representations of a software defect report and of the defect types in the software defect classification scheme by projecting their textual descriptions into a concept space spanned by Wikipedia articles. Then, we compute the semantic similarity between these concept-based representations and assign the software defect type that has the highest similarity with the defect report. The proposed CBC approach achieves accuracy (F1 score = 63.16%) similar to the state-of-the-art semi-supervised and active learning approach (Thung et al., ICPC 2015) for this task, without requiring labeled training data. The state-of-the-art approach requires labels for 15% of the input defects and achieves an accuracy (F1 score) of 62.3%.
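The two steps described above (projection into a concept space, then assignment by highest similarity) can be illustrated with a minimal sketch. It is not the authors' implementation: the three toy "articles" stand in for Wikipedia, the defect-type keywords are invented for illustration, and the smoothed TF-IDF weighting, the cosine similarity measure, and all function names are assumptions.

```python
import math
import re
from collections import Counter

# Toy stand-in for Wikipedia: concept title -> article text (hypothetical data).
CONCEPTS = {
    "Control flow": "control flow loop branch condition statement if else iteration",
    "Memory management": "memory allocation pointer buffer heap free leak null pointer",
    "User interface": "user interface button window display screen layout click event loop",
}

# Defect types described only by keywords (hypothetical labels and keywords).
DEFECT_TYPES = {
    "control-flow": "wrong condition in loop or branch",
    "memory": "null pointer or buffer leak",
    "ui": "button layout on screen",
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_index(concepts):
    """Map each word to {concept: TF-IDF weight} over the concept articles."""
    n = len(concepts)
    tf = {c: Counter(tokenize(t)) for c, t in concepts.items()}
    df = Counter(w for counts in tf.values() for w in counts)
    index = {}
    for concept, counts in tf.items():
        for word, freq in counts.items():
            idf = math.log(n / df[word]) + 1.0  # smoothed IDF (an assumption)
            index.setdefault(word, {})[concept] = freq * idf
    return index

def to_concept_vector(text, index):
    """Project a text into concept space by summing its words' concept weights."""
    vec = Counter()
    for word in tokenize(text):
        for concept, weight in index.get(word, {}).items():
            vec[concept] += weight
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def classify(report, defect_types, index):
    """Return the defect type most similar to the report, plus all scores."""
    report_vec = to_concept_vector(report, index)
    scores = {
        label: cosine(report_vec, to_concept_vector(keywords, index))
        for label, keywords in defect_types.items()
    }
    return max(scores, key=scores.get), scores

INDEX = build_index(CONCEPTS)
label, scores = classify(
    "Application crashes with a null pointer when the buffer is released",
    DEFECT_TYPES,
    INDEX,
)
print(label, scores)
```

No labeled defect reports appear anywhere in the sketch: the only inputs are the report text, the keyword descriptions of the defect types, and the concept articles, which is what makes the setting zero-shot.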

Unlike the state-of-the-art approach, our method does not need access to the source code used to fix the defect. We use just the textual description of the defect reports and the keywords describing the defect types in the defect classification scheme. Note that learning a classifier without labeled training data is known as zero-shot learning, and it is a significantly harder task than learning a classifier using labeled data. The proposed concept-based classification of software defect types is the first instance of the zero-shot learning philosophy in the software defect analytics domain.

Sangameshwar Patil

Dept. of CSE, IIT Madras and TRDDC, TCS

Balaraman Ravindran

IIT Madras