ICSE 2020
Wed 24 June - Thu 16 July 2020
Thu 9 Jul 2020 08:17 - 08:25 at Baekje - I16-Testing and Debugging 2 Chair(s): Rui Abreu

Defect prediction is an important task for preserving software quality. Most prior work on defect prediction uses software features, such as the number of lines of code, to predict whether a file or commit will be defective in the future. There are several reasons to keep the number of features that are used in a defect prediction model small. For example, using a small number of features avoids the problem of multicollinearity and the so-called 'curse of dimensionality'. Feature selection and reduction techniques can help to reduce the number of features in a model. Feature selection techniques reduce the number of features in a model by selecting the most important ones, while feature reduction techniques reduce the number of features by creating new, combined features from the original features. Several recent studies have investigated the impact of feature *selection* techniques on defect prediction. However, there do not exist large-scale studies in which the impact of multiple feature *reduction* techniques on defect prediction is investigated.
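To make the selection-versus-reduction contrast concrete, here is a minimal sketch (not the paper's actual pipeline) on hypothetical random data standing in for software metrics: selection keeps a subset of the original columns (here, naive variance-based ranking), while reduction builds new combined features (here, PCA via SVD).

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 10))  # 100 samples, 10 hypothetical software metrics

# Feature *selection*: keep the k original columns with the highest variance.
# The surviving features are still original metrics, just fewer of them.
k = 3
keep = np.argsort(X.var(axis=0))[-k:]
X_selected = X[:, keep]

# Feature *reduction*: create k new, combined features via PCA (SVD on
# centered data). Each new feature is a linear combination of all 10 metrics.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
X_reduced = Xc @ Vt[:k].T

print(X_selected.shape, X_reduced.shape)  # (100, 3) (100, 3)
```

Both paths shrink the model's input from 10 features to 3, but only the reduced features lose their one-to-one mapping to interpretable metrics.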

In this paper [1], we study the impact of eight feature reduction techniques on the performance and the variance in performance of five supervised learning and five unsupervised defect prediction models. In addition, we compare the impact of the studied feature reduction techniques with the impact of the two best-performing feature selection techniques (according to prior work).

The following findings are the highlights of our study: (1) The studied correlation and consistency-based feature selection techniques result in the best-performing supervised defect prediction models, while feature reduction techniques using neural network-based techniques (restricted Boltzmann machine and autoencoder) result in the best-performing unsupervised defect prediction models. In both cases, the defect prediction models that use the selected/generated features perform better than those that use the original features (in terms of AUC and performance variance). (2) Neural network-based feature reduction techniques generate features that have a small variance across both supervised and unsupervised defect prediction models. Hence, we recommend that practitioners who do not wish to choose a best-performing defect prediction model for their data use a neural network-based feature reduction technique.
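As a rough illustration of the restricted Boltzmann machine style of reduction the study recommends, the sketch below uses scikit-learn's `BernoulliRBM` on hypothetical [0, 1]-scaled metrics; the component count, hyperparameters, and data are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM

rng = np.random.default_rng(0)
# Hypothetical commit-level metrics, scaled to [0, 1] (RBMs expect such inputs)
X = rng.random((200, 20))

# Reduce 20 original features to 5 hidden-unit activations;
# these activations become the new features for a defect prediction model.
rbm = BernoulliRBM(n_components=5, learning_rate=0.05, n_iter=20, random_state=0)
X_reduced = rbm.fit_transform(X)

print(X_reduced.shape)  # (200, 5)
```

The transformed values are sigmoid activations of the hidden units, so any downstream supervised or unsupervised defect prediction model receives features in [0, 1] regardless of the original metric scales.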

Thu 9 Jul
Times are displayed in time zone: (UTC) Coordinated Universal Time

08:05 - 09:05: Paper Presentations - I16-Testing and Debugging 2 at Baekje
Chair(s): Rui Abreu (Instituto Superior Técnico, U. Lisboa & INESC-ID)

08:05 - 08:17 (icse-2020-papers)
Yan Cai (Institute of Software, Chinese Academy of Sciences), Ruijie Meng (University of Chinese Academy of Sciences), Jens Palsberg (University of California, Los Angeles)

08:17 - 08:25 (icse-2020-Journal-First)
Masanari Kondo (Kyoto Institute of Technology), Cor-Paul Bezemer (University of Alberta, Canada), Yasutaka Kamei (Kyushu University), Ahmed E. Hassan (Queen's University), Osamu Mizuno (Kyoto Institute of Technology)

08:25 - 08:33 (icse-2020-Journal-First)
Jirayus Jiarpakdee (Monash University, Australia), Chakkrit (Kla) Tantithamthavorn (Monash University, Australia), Ahmed E. Hassan (Queen's University)

08:33 - 08:41 (icse-2020-Journal-First)
Yuanrui Fan (Zhejiang University), Xin Xia (Monash University), Daniel Alencar Da Costa (University of Otago), David Lo (Singapore Management University), Ahmed E. Hassan (Queen's University), Shanping Li (Zhejiang University)

08:41 - 08:49 (icse-2020-Journal-First)
Zhongxin Liu (Zhejiang University), Xin Xia (Monash University), David Lo (Singapore Management University), Zhenchang Xing (Australia National University), Ahmed E. Hassan (Queen's University), Shanping Li (Zhejiang University)

08:49 - 09:01 (icse-2020-papers)
Ke Li (University of Exeter), Zilin Xiang (University of Electronic Science and Technology of China), Tao Chen (Loughborough University), Shuo Wang, Kay Chen Tan (City University of Hong Kong)