Impact of Discretization Noise of the Dependent Variable on Machine Learning Classifiers in Software Engineering
Researchers usually discretize a continuous dependent variable into two target classes by introducing an artificial discretization threshold (e.g., the median). However, such discretization may introduce noise (i.e., discretization noise) due to the ambiguous class loyalty of data points that lie close to the artificial threshold. Previous studies do not provide a clear directive on the impact of discretization noise on classifiers or on how to handle such noise. In this paper, we propose a framework to help researchers and practitioners systematically estimate the impact of discretization noise on classifiers, in terms of both various performance measures and the interpretation of the classifiers. Through a case study of seven software engineering datasets, we find that: 1) discretization noise affects the different performance measures of a classifier differently across datasets; and 2) although the interpretation of the classifiers is impacted by the discretization noise overall, the top three most important features are not affected by it. Therefore, we suggest that practitioners and researchers use our framework to understand the impact of discretization noise on the performance of their classifiers and to estimate the exact amount of discretization noise to discard from the dataset so as to avoid the negative impact of such noise.
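The median split and the removal of points near the threshold described in the abstract can be sketched as follows. This is a minimal illustration on synthetic data, not the paper's implementation; the exponential distribution and the 10% buffer width are assumptions chosen for the example:

```python
# Sketch: median-based discretization of a continuous dependent variable,
# then discarding points near the threshold (potential discretization noise).
# Synthetic data and the 10% buffer width are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(42)
y_continuous = rng.exponential(scale=5.0, size=200)  # synthetic dependent variable

# Artificial discretization threshold: the median.
threshold = np.median(y_continuous)
y_class = (y_continuous > threshold).astype(int)  # 1 = "high", 0 = "low"

# Points within +/-10% of the threshold have ambiguous class loyalty;
# drop them to estimate the impact of discretization noise.
buffer = 0.10 * threshold
keep = np.abs(y_continuous - threshold) > buffer
y_filtered = y_class[keep]

print(f"kept {keep.sum()} of {len(y_continuous)} data points")
```

A classifier would then be trained and evaluated on both the full and the filtered labels to compare performance measures, which is the kind of comparison the proposed framework systematizes.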
Wed 8 Jul. Times are displayed in Coordinated Universal Time (UTC).
15:00 - 16:00: A8 - Machine Learning and Models (Paper Presentations / Journal First / Technical Papers) at Goguryeo. Chair(s): Liliana Pasquale (University College Dublin & Lero)
15:00 - 15:08 Talk: Improving Vulnerability Inspection Efficiency Using Active Learning (Journal First). Zhe Yu (North Carolina State University), Chris Theisen (Microsoft), Laurie Williams (North Carolina State University), Tim Menzies (North Carolina State University)
15:08 - 15:16 Talk: How Bugs Are Born: A Model to Identify How Bugs Are Introduced in Software Components (Journal First). Gema Rodríguez-Pérez (University of Waterloo, Canada), Gregorio Robles (Universidad Rey Juan Carlos), Alexander Serebrenik (Eindhoven University of Technology), Andy Zaidman (TU Delft), Daniel M. German (University of Victoria), Jesus M. Gonzalez-Barahona (Universidad Rey Juan Carlos). DOI | Pre-print
15:16 - 15:24 Talk: How to “DODGE” Complex Software Analytics (Journal First). Amritanshu Agrawal (Wayfair), Wei Fu (Landing AI), Di Chen (North Carolina State University, USA), Xipeng Shen (North Carolina State University), Tim Menzies (North Carolina State University)
15:24 - 15:36 Talk: Importance-Driven Deep Learning System Testing (Technical Papers). Simos Gerasimou (University of York, UK), Hasan Ferit Eniser (MPI-SWS), Alper Sen (Bogazici University, Turkey), Alper Çakan (Bogazici University, Turkey)
15:36 - 15:48 Talk: Quickly Generating Diverse Valid Test Inputs with Reinforcement Learning (Technical Papers). Sameer Reddy (University of California, Berkeley), Caroline Lemieux (University of California, Berkeley), Rohan Padhye (Carnegie Mellon University), Koushik Sen (University of California, Berkeley)
15:48 - 15:56 Talk: Impact of Discretization Noise of the Dependent Variable on Machine Learning Classifiers in Software Engineering (Journal First). Gopi Krishnan Rajbahadur (Queen's University), Shaowei Wang (Mississippi State University), Yasutaka Kamei (Kyushu University), Ahmed E. Hassan (Queen's University)