The Impact of Feature Reduction Techniques on Defect Prediction Models (ICSE 2020 - Journal First)


Wed 24 June - Thu 16 July 2020

Who

Masanari Kondo, Cor-Paul Bezemer, Yasutaka Kamei, Ahmed E. Hassan, Osamu Mizuno

Track

ICSE 2020 Journal First


When

Thu 9 Jul 2020 08:17 - 08:25 at Baekje - I16-Testing and Debugging 2 Chair(s): Rui Abreu

Abstract

Defect prediction is an important task for preserving software quality. Most prior work on defect prediction uses software features, such as the number of lines of code, to predict whether a file or commit will be defective in the future. There are several reasons to keep the number of features used in a defect prediction model small. For example, using a small number of features avoids the problem of multicollinearity and the so-called 'curse of dimensionality'. Feature selection and feature reduction techniques can both reduce the number of features in a model. Feature selection techniques do so by selecting the most important original features, while feature reduction techniques do so by creating new, combined features from the original features. Several recent studies have investigated the impact of feature selection techniques on defect prediction. However, no large-scale study has investigated the impact of multiple feature reduction techniques on defect prediction.
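To make the distinction between selection and reduction concrete, here is a minimal sketch in plain Python. It is not one of the techniques studied in the paper: the selection step is a simple correlation ranking, the reduction step is a projection onto the first principal component (computed by power iteration), and the metric values and the helper names select_features and reduce_features are made up for illustration.

```python
import math
from statistics import mean, stdev


def column(rows, j):
    """Return column j of a list-of-rows table."""
    return [r[j] for r in rows]


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def select_features(rows, labels, k):
    """Feature selection: keep the k ORIGINAL columns most correlated with the label."""
    d = len(rows[0])
    ranked = sorted(range(d),
                    key=lambda j: abs(pearson(column(rows, j), labels)),
                    reverse=True)
    kept = sorted(ranked[:k])
    return kept, [[r[j] for j in kept] for r in rows]


def standardize(rows):
    """Scale every column to zero mean and unit (sample) standard deviation."""
    cols = []
    for j in range(len(rows[0])):
        col = column(rows, j)
        m, sd = mean(col), stdev(col)
        cols.append([(v - m) / sd for v in col])
    return [list(r) for r in zip(*cols)]


def reduce_features(rows, iterations=200):
    """Feature reduction: build ONE NEW feature as a weighted mix of all columns."""
    z = standardize(rows)
    n, d = len(z), len(z[0])
    cov = [[sum(r[a] * r[b] for r in z) / (n - 1) for b in range(d)]
           for a in range(d)]
    w = [1.0] * d
    for _ in range(iterations):  # power iteration towards the top eigenvector
        w = [sum(cov[a][b] * w[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    return w, [[sum(r[j] * w[j] for j in range(d))] for r in z]


# Made-up file metrics: [lines of code, number of changes, comment lines]
rows = [[120, 4, 10], [950, 21, 35], [300, 7, 80],
        [1500, 30, 12], [80, 2, 55], [700, 15, 40]]
labels = [0, 1, 0, 1, 0, 1]  # 1 = defective (toy labels)

kept, selected = select_features(rows, labels, k=2)
weights, reduced = reduce_features(rows)
print("kept original columns:", kept)
print("weights of the new combined feature:", weights)
```

The selected table still contains original metrics, so it stays directly interpretable; the reduced table contains a single synthetic feature whose meaning is defined only by its weights.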

In this paper [1], we study the impact of eight feature reduction techniques on the performance and the variance in performance of five supervised learning and five unsupervised defect prediction models. In addition, we compare the impact of the studied feature reduction techniques with the impact of the two best-performing feature selection techniques (according to prior work).

The following findings are the highlights of our study: (1) The studied correlation-based and consistency-based feature selection techniques result in the best-performing supervised defect prediction models, while neural network-based feature reduction techniques (restricted Boltzmann machine and autoencoder) result in the best-performing unsupervised defect prediction models. In both cases, the defect prediction models that use the selected/generated features perform better than those that use the original features (in terms of AUC and performance variance). (2) Neural network-based feature reduction techniques generate features that have a small variance across both supervised and unsupervised defect prediction models. Hence, we recommend that practitioners who do not wish to choose a best-performing defect prediction model for their data use a neural network-based feature reduction technique.
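The two evaluation criteria named above, AUC and the variance in performance across models, can be sketched as follows. This is only an illustration under stated assumptions: the labels, the scores, and the model names model_a to model_c are invented, not data or results from the paper, and the AUC is computed with the pairwise (Mann-Whitney) definition rather than with whatever tooling the authors used.

```python
from statistics import mean, pvariance


def auc(labels, scores):
    """AUC as the share of (defective, clean) pairs ranked correctly; ties count 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy ground truth (1 = defective) and made-up predicted scores per model.
labels = [1, 0, 1, 0, 1, 0]
scores_by_model = {
    "model_a": [0.9, 0.2, 0.7, 0.4, 0.6, 0.1],
    "model_b": [0.8, 0.6, 0.5, 0.3, 0.7, 0.2],
    "model_c": [0.6, 0.7, 0.9, 0.2, 0.4, 0.5],
}

aucs = {name: auc(labels, s) for name, s in scores_by_model.items()}
auc_mean = mean(aucs.values())
auc_variance = pvariance(aucs.values())  # spread of AUC across the models

print(aucs)
print("mean AUC:", auc_mean, "variance:", auc_variance)
```

A feature set with a low variance in this sense gives similar AUC values no matter which prediction model is trained on it, which is the property the recommendation in finding (2) relies on.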

Masanari Kondo

Kyoto Institute of Technology

Japan

Cor-Paul Bezemer

University of Alberta, Canada

Canada

Yasutaka Kamei

Kyushu University

Japan

Ahmed E. Hassan

Queen's University

Canada

Osamu Mizuno

Kyoto Institute of Technology

Japan


Session Program

Thu 9 Jul
Displayed time zone: (UTC) Coordinated Universal Time

08:05 - 09:05	I16-Testing and Debugging 2 (Technical Papers / Journal First) at Baekje. Chair(s): Rui Abreu (Instituto Superior Técnico, U. Lisboa & INESC-ID)

08:05 12m Talk		Low-Overhead Deadlock Prediction (Technical Papers). Yan Cai (Institute of Software, Chinese Academy of Sciences), Ruijie Meng (University of Chinese Academy of Sciences), Jens Palsberg (University of California, Los Angeles)
08:17 8m Talk		The Impact of Feature Reduction Techniques on Defect Prediction Models (Journal First). Masanari Kondo (Kyoto Institute of Technology), Cor-Paul Bezemer (University of Alberta, Canada), Yasutaka Kamei (Kyushu University), Ahmed E. Hassan (Queen's University), Osamu Mizuno (Kyoto Institute of Technology)
08:25 8m Talk		The Impact of Correlated Metrics on the Interpretation of Defect Models (Journal First). Jirayus Jiarpakdee (Monash University, Australia), Kla Tantithamthavorn (Monash University, Australia), Ahmed E. Hassan (Queen's University)
08:33 8m Talk		The Impact of Mislabeled Changes by SZZ on Just-in-Time Defect Prediction (Journal First). Yuanrui Fan (Zhejiang University), Xin Xia (Monash University), Daniel Alencar Da Costa (University of Otago), David Lo (Singapore Management University), Ahmed E. Hassan (Queen's University), Shanping Li (Zhejiang University)
08:41 8m Talk		Which Variables Should I Log? (Journal First). Zhongxin Liu (Zhejiang University), Xin Xia (Monash University), David Lo (Singapore Management University), Zhenchang Xing (Australia National University), Ahmed E. Hassan (Queen's University), Shanping Li (Zhejiang University)
08:49 12m Talk		Understanding the Automated Parameter Optimization on Transfer Learning for Cross-Project Defect Prediction: An Empirical Study (Technical Papers). Ke Li (University of Exeter), Zilin Xiang (University of Electronic Science and Technology of China), Tao Chen (Loughborough University), Shuo Wang, Kay Chen Tan (City University of Hong Kong)