ICSE 2020
Wed 24 June - Thu 16 July 2020
Wed 8 Jul 2020 15:48 - 15:56 at Goguryeo - A8-Machine Learning and Models Chair(s): Liliana Pasquale

Researchers usually discretize a continuous dependent variable into two target classes by introducing an artificial discretization threshold (e.g., the median). However, such discretization may introduce noise (i.e., discretization noise) because data points close to the artificial threshold have ambiguous class loyalty. Previous studies do not provide a clear directive on how discretization noise affects classifiers or on how to handle such noise. In this paper, we propose a framework to help researchers and practitioners systematically estimate the impact of discretization noise on classifiers, both on various performance measures and on the interpretation of the classifiers. Through a case study of seven software engineering datasets, we find that: 1) discretization noise affects the different performance measures of a classifier differently for different datasets; 2) although the interpretation of the classifiers is affected by discretization noise on the whole, the top 3 most important features are not affected by it. Therefore, we suggest that practitioners and researchers use our framework to understand the impact of discretization noise on the performance of their classifiers and to estimate the amount of discretization noise to discard from the dataset in order to avoid the negative impact of such noise.
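To make the idea of discretization noise concrete, the following is a minimal sketch of median-based discretization in Python. It is not the framework proposed in the paper: the function names, the sample values, and the fixed `band` width around the threshold are all assumptions chosen for illustration. The sketch only shows which data points fall close enough to the threshold that their class label could be considered ambiguous, and what the dataset looks like once those points are set aside.

```python
from statistics import median


def discretize_at_median(values):
    """Split a continuous variable into two classes at its median.

    Returns the threshold and a list of labels (1 if above the median, else 0).
    """
    threshold = median(values)
    labels = [1 if v > threshold else 0 for v in values]
    return threshold, labels


def flag_near_threshold(values, threshold, band):
    """Mark points within `band` of the threshold as potentially noisy.

    `band` is a hypothetical, hand-picked width; the paper's framework
    estimates the noisy region rather than assuming it.
    """
    return [abs(v - threshold) <= band for v in values]


def drop_flagged(values, labels, flagged):
    """Keep only the points that were not flagged as potentially noisy."""
    kept = [(v, y) for v, y, f in zip(values, labels, flagged) if not f]
    return [v for v, _ in kept], [y for _, y in kept]


if __name__ == "__main__":
    # Illustrative values only, e.g. a continuous measure per module.
    sample = [1, 2, 4, 5, 6, 9, 12]
    threshold, labels = discretize_at_median(sample)
    flagged = flag_near_threshold(sample, threshold, band=1)
    kept_values, kept_labels = drop_flagged(sample, labels, flagged)
    print("threshold:", threshold)
    print("labels:", labels)
    print("flagged as near threshold:", flagged)
    print("kept:", list(zip(kept_values, kept_labels)))
```

A classifier trained on the kept points and on the full set could then be compared to gauge how much the ambiguous points matter, which is the kind of comparison the abstract describes at a high level.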

Wed 8 Jul

Displayed time zone: (UTC) Coordinated Universal Time

15:00 - 16:00
A8-Machine Learning and Models (Journal First / Technical Papers) at Goguryeo
Chair(s): Liliana Pasquale University College Dublin & Lero
15:00
8m
Talk
Improving Vulnerability Inspection Efficiency Using Active Learning
Journal First
Zhe Yu North Carolina State University, Chris Theisen Microsoft, Laurie Williams North Carolina State University, Tim Menzies North Carolina State University
15:08
8m
Talk
How Bugs Are Born: A Model to Identify How Bugs Are Introduced in Software Components
Journal First
Gema Rodríguez-Pérez University of Waterloo, Canada, Gregorio Robles Universidad Rey Juan Carlos, Alexander Serebrenik Eindhoven University of Technology, Andy Zaidman TU Delft, Daniel M. German University of Victoria, Jesus M. Gonzalez-Barahona Universidad Rey Juan Carlos
15:16
8m
Talk
How to “DODGE” Complex Software Analytics
Journal First
Amritanshu Agrawal Wayfair, Wei Fu Landing AI, Di Chen North Carolina State University, USA, Xipeng Shen North Carolina State University, Tim Menzies North Carolina State University
15:24
12m
Talk
Importance-Driven Deep Learning System Testing
Technical Papers
Simos Gerasimou University of York, UK, Hasan Ferit Eniser MPI-SWS, Alper Sen Bogazici University, Turkey, Alper Çakan Bogazici University, Turkey
15:36
12m
Talk
Quickly Generating Diverse Valid Test Inputs with Reinforcement Learning (Artifact Reusable, Artifact Available)
Technical Papers
Sameer Reddy University of California, Berkeley, Caroline Lemieux University of California, Berkeley, Rohan Padhye Carnegie Mellon University, Koushik Sen University of California, Berkeley
15:48
8m
Talk
Impact of Discretization Noise of the Dependent Variable on Machine Learning Classifiers in Software Engineering
Journal First
Gopi Krishnan Rajbahadur Queen's University, Shaowei Wang Mississippi State University, Yasutaka Kamei Kyushu University, Ahmed E. Hassan Queen's University