Understanding the Automated Parameter Optimization on Transfer Learning for Cross-Project Defect Prediction: An Empirical Study
Data-driven defect prediction has become increasingly important in the software engineering process. Since data from a single software project are often insufficient for training a reliable defect prediction model, transfer learning, which borrows data or knowledge from other projects to facilitate model building for the current project, is a natural remedy; this setting is known as Cross-Project Defect Prediction (CPDP). Most CPDP techniques involve two major steps, i.e., transfer learning and classification, each of which has at least one parameter that must be tuned to achieve optimal performance. This makes CPDP a natural fit for automated parameter optimization. However, the impact of automated parameter optimization on the various CPDP techniques is not yet well understood. In this paper, we present the first empirical study that examines this impact on 62 CPDP techniques: 13 are drawn from the existing CPDP literature, while the remaining 49 have not been explored before. We build defect prediction models on 20 real-world software projects of different scales and characteristics. Our findings demonstrate that: (1) Automated parameter optimization substantially improves the defect prediction performance of 77% of the CPDP techniques at a manageable computational cost. Thus, future CPDP studies should devote more effort to this aspect. (2) Transfer learning is of paramount importance in CPDP. Given a tight computational budget, it is more cost-effective to focus on optimizing the parameter configuration of the transfer learning algorithms. (3) Research on CPDP is far from mature: it is ‘not difficult’ to find a better alternative simply by combining existing transfer learning and classification techniques. This finding provides important insights into the future design of CPDP techniques.
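To make the two-step structure concrete, here is a minimal Python sketch of automated parameter optimization over a toy CPDP pipeline. It is not the study's method: the synthetic projects (make_project), the nearest-neighbour relevancy filter used as the transfer-learning step (similar in spirit to the Burak filter), the k-NN classifier, the parameter grids, the hand-picked default, and the small labelled validation sample taken from the target project are all assumptions made for illustration. What it shows is that each step exposes its own parameter (k_filter for transfer learning, k_nn for classification) and that the optimizer searches their joint configuration instead of tuning either step in isolation.

```python
import math
import random
from itertools import product


def make_project(rng, n, shift):
    """Hypothetical synthetic project with two metrics per module.

    Defective modules (label 1) have larger metric values; `shift` offsets all
    metrics to mimic a distribution difference between projects.
    """
    data = []
    for _ in range(n):
        label = 1 if rng.random() < 0.4 else 0
        centre = shift + (2.0 if label else 0.0)
        data.append(([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)], label))
    return data


def nn_filter(source, target_x, k_filter):
    """Transfer-learning step (parameter k_filter): keep the union of the
    k_filter nearest source modules of every target module."""
    keep = set()
    for t in target_x:
        ranked = sorted(range(len(source)), key=lambda i: math.dist(source[i][0], t))
        keep.update(ranked[:k_filter])
    return [source[i] for i in sorted(keep)]


def knn_predict(train, x, k_nn):
    """Classification step (parameter k_nn): majority vote of the k_nn nearest
    training modules; ties are predicted as clean (0)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k_nn]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def score(config, source, target_x, labelled):
    """Accuracy of one (k_filter, k_nn) configuration on labelled target modules."""
    k_filter, k_nn = config
    train = nn_filter(source, target_x, k_filter)
    hits = sum(knn_predict(train, x, k_nn) == y for x, y in labelled)
    return hits / len(labelled)


def grid_search(source, target_x, validation, k_filters, k_nns):
    """Automated parameter optimization: try every joint configuration of both
    steps and keep the one with the best validation accuracy."""
    best_config, best_score = None, -1.0
    for config in product(k_filters, k_nns):
        s = score(config, source, target_x, validation)
        if s > best_score:
            best_config, best_score = config, s
    return best_config, best_score


rng = random.Random(0)
source = make_project(rng, 200, shift=0.0)   # data borrowed from another project
target = make_project(rng, 100, shift=1.0)   # the project we want to predict
validation, test = target[:30], target[30:]  # assumed: a few labelled target modules
target_x = [x for x, _ in target]            # unlabelled target metrics used by the filter

default = (10, 1)  # a fixed, hand-picked configuration for comparison
best_config, best_val = grid_search(source, target_x, validation, [1, 5, 10], [1, 3, 7])
print("default test accuracy:", score(default, source, target_x, test))
print("tuned config:", best_config, "test accuracy:", score(best_config, source, target_x, test))
```

Grid search is used here only because it is the simplest optimizer to read; random search or Bayesian optimization could replace it without changing the pipeline. On other data, the tuned configuration is not guaranteed to beat the default on the test split, since it is selected on the validation sample only.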
Thu 9 Jul (times in UTC)
08:05 - 09:05 | I16-Testing and Debugging 2 | Technical Papers / Journal First at Baekje | Chair(s): Rui Abreu (Instituto Superior Técnico, U. Lisboa & INESC-ID)
08:05 12m Talk | Low-Overhead Deadlock Prediction | Technical Papers | Yan Cai (Institute of Software, Chinese Academy of Sciences), Ruijie Meng (University of Chinese Academy of Sciences), Jens Palsberg (University of California, Los Angeles)
08:17 8m Talk | The Impact of Feature Reduction Techniques on Defect Prediction Models | Journal First | Masanari Kondo (Kyoto Institute of Technology), Cor-Paul Bezemer (University of Alberta, Canada), Yasutaka Kamei (Kyushu University), Ahmed E. Hassan (Queen's University), Osamu Mizuno (Kyoto Institute of Technology)
08:25 8m Talk | The Impact of Correlated Metrics on the Interpretation of Defect Models | Journal First | Jirayus Jiarpakdee (Monash University, Australia), Kla Tantithamthavorn (Monash University, Australia), Ahmed E. Hassan (Queen's University)
08:33 8m Talk | The Impact of Mislabeled Changes by SZZ on Just-in-Time Defect Prediction | Journal First | Yuanrui Fan (Zhejiang University), Xin Xia (Monash University), Daniel Alencar Da Costa (University of Otago), David Lo (Singapore Management University), Ahmed E. Hassan (Queen's University), Shanping Li (Zhejiang University)
08:41 8m Talk | Which Variables Should I Log? | Journal First | Zhongxin Liu (Zhejiang University), Xin Xia (Monash University), David Lo (Singapore Management University), Zhenchang Xing (Australia National University), Ahmed E. Hassan (Queen's University), Shanping Li (Zhejiang University)
08:49 12m Talk | Understanding the Automated Parameter Optimization on Transfer Learning for Cross-Project Defect Prediction: An Empirical Study | Technical Papers | Ke Li (University of Exeter), Zilin Xiang (University of Electronic Science and Technology of China), Tao Chen (Loughborough University), Shuo Wang, Kay Chen Tan (City University of Hong Kong)