ICSE 2020
Wed 24 June - Thu 16 July 2020
Wed 8 Jul 2020 02:10 - 03:00 at Poster Special Room - P306-Posters

Data scientists frequently analyze data by writing scripts. We conducted a contextual inquiry with interdisciplinary researchers, which revealed that parameter tuning is a highly iterative process and that debugging is time-consuming. As analysis scripts evolve and become more complex, analysts have difficulty conceptualizing their workflow. In particular, after editing a script, it becomes difficult to determine precisely which code blocks depend on the edit. Consequently, scientists frequently re-run entire scripts instead of re-running only the necessary parts. We present ProvBuild, a tool that leverages language-level provenance to streamline the debugging process by reducing programmer cognitive load and decreasing subsequent runtimes, leading to an overall reduction in elapsed debugging time. ProvBuild uses provenance to track dependencies in a script. When an analyst debugs a script, ProvBuild generates a simplified script that contains only the information necessary to debug a particular problem. We demonstrate that debugging the simplified script lowers a programmer’s cognitive load and permits faster re-execution when testing changes. The combination of reduced cognitive load and shorter runtime reduces the time necessary to debug a script. We quantitatively and qualitatively show that even though ProvBuild introduces overhead during a script’s first execution, it is a more efficient way for users to debug and tune complex workflows. ProvBuild demonstrates a novel use of language-level provenance, in which it is used to proactively improve programmer productivity rather than merely providing a way to retroactively gain insight into a body of code. ProvBuild is a data analysis environment that uses change impact analysis to improve the iterative debugging process in script-based workflow pipelines. It is the first debugging tool to leverage language-level provenance to reduce cognitive load and execution time.
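
The idea of re-running only the blocks that depend on an edit can be illustrated with a small sketch. This is not ProvBuild's implementation: the block names, the hand-written dependency map, and the `affected_blocks` helper are all hypothetical, and a real tool would record the dependencies from provenance rather than take them as input. The sketch only shows the change impact step, assuming the dependencies are already known.

```python
from collections import deque


def affected_blocks(deps, edited):
    """Return the blocks to re-run after `edited` changes, in script order.

    `deps` maps each block name to the blocks it reads from. It is a
    hypothetical, hand-written stand-in for recorded provenance.
    """
    # Invert the "reads from" edges into "is read by" edges.
    dependents = {name: [] for name in deps}
    for block, sources in deps.items():
        for src in sources:
            dependents[src].append(block)

    # Breadth-first walk from the edited block over its downstream dependents.
    seen = {edited}
    queue = deque([edited])
    while queue:
        current = queue.popleft()
        for nxt in dependents[current]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)

    # Keep the original script order (dicts preserve insertion order).
    return [name for name in deps if name in seen]


# Hypothetical analysis script split into five named blocks.
script_deps = {
    "load_data": [],
    "clean": ["load_data"],
    "summary_stats": ["clean"],
    "fit_model": ["clean"],
    "plot": ["fit_model"],
}

# Editing the model-fitting block only requires re-running it and the plot.
print(affected_blocks(script_deps, "fit_model"))
```

In this toy example, an edit to `fit_model` leaves `load_data`, `clean`, and `summary_stats` untouched, which is the kind of saving the abstract describes for subsequent runs.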

Wed 8 Jul

Displayed time zone: (UTC) Coordinated Universal Time

02:10 - 03:00
02:10
50m
Poster
A Practical, Collaborative Approach for Modeling Big Data Analytics Application Requirements
ICSE 2020 Posters
Hourieh Khalajzadeh Monash University, Australia, Andrew Simmons Deakin University, Mohamed Abdelrazek Deakin University, John Grundy Monash University, John Hosking University of Auckland, Qiang He, Prasanna Ratnakanthan, Adil Zia, Meng Law
02:10
50m
Poster
ProvBuild: Improving Data Scientist Efficiency with Provenance (An Extended Abstract)
ICSE 2020 Posters
Jingmei Hu Harvard University, Jiwon Joung Harvard University, Maia Jacobs Harvard University, Krzysztof Gajos Harvard University, Margo Seltzer University of British Columbia
02:10
50m
Poster
Elite Developers' Activities at Open Source Ecosystem Level
ICSE 2020 Posters
James Jones University of California, Irvine, David Redmiles University of California, Irvine
02:10
50m
Poster
Semantic Analysis of Issues on Google Play and Twitter
ICSE 2020 Posters
Aman Yadav, Fatemeh Hendijani Fard University of British Columbia
02:10
50m
Poster
An Intelligent Tool for Combatting Contract Cheating Behaviour by Facilitating Scalable Student-Tutor Discussions
ICSE 2020 Posters
Jake Renzella Deakin University, Andrew Cain Deakin University, Jean-Guy Schneider Deakin University
02:10
50m
Poster
Poster: How Has Forking Changed in the Last 20 Years? A Study of Hard Forks on GitHub
ICSE 2020 Posters
Shurui Zhou Carnegie Mellon University, USA / University of Toronto, CA, Bogdan Vasilescu Carnegie Mellon University, Christian Kästner Carnegie Mellon University
02:10
50m
Poster
An Oracle Language for Autonomous Vehicles
ICSE 2020 Posters
Ana Nora Evans University of Virginia, USA, Mary Lou Soffa University of Virginia, Sebastian Elbaum University of Virginia, USA