ICSE 2020
Wed 24 June - Thu 16 July 2020
Sat 11 Jul 2020 00:36 - 00:48 at Silla - P27-Applications Chair(s): Ganesha Upadhyaya

Data scientists frequently analyze data by writing scripts. We conducted a contextual inquiry of interdisciplinary researchers, which revealed that parameter tuning is a highly iterative process and that debugging is time-consuming. As analysis scripts evolve and become more complex, analysts have difficulty conceptualizing their workflow. In particular, after editing a script, it becomes difficult to determine precisely which code blocks depend on the edit. Consequently, scientists frequently re-run entire scripts instead of re-running only the necessary parts. We present ProvBuild, a tool that leverages language-level provenance to streamline the debugging process by reducing programmer cognitive load and decreasing subsequent runtimes, leading to an overall reduction in elapsed debugging time. ProvBuild uses provenance to track dependencies in a script. When an analyst debugs a script, ProvBuild generates a simplified script that contains only the information necessary to debug a particular problem. We demonstrate that debugging the simplified script lowers a programmer’s cognitive load and permits faster re-execution when testing changes. The combination of reduced cognitive load and shorter runtime reduces the time necessary to debug a script. We quantitatively and qualitatively show that even though ProvBuild introduces overhead during a script’s first execution, it is a more efficient way for users to debug and tune complex workflows. ProvBuild demonstrates a novel use of language-level provenance, in which provenance is used to proactively improve programmer productivity rather than merely to provide retroactive insight into a body of code.
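
As a rough illustration of the problem the abstract describes (working out which code blocks depend on an edit), the sketch below walks a dependency graph to find the blocks downstream of a change. It is not ProvBuild's implementation: the function `affected_blocks`, the block names, and the hand-written `deps` map are all hypothetical, and a real tool would derive the dependencies from recorded provenance rather than from a dictionary.

```python
from collections import deque


def affected_blocks(deps, edited):
    """Return the set of blocks to re-run after `edited` changes.

    `deps` maps each block name to the set of blocks it reads from.
    (Hypothetical representation, chosen only for this sketch.)
    """
    # Invert the "reads from" map into an "is read by" map.
    dependents = {name: set() for name in deps}
    for block, sources in deps.items():
        for src in sources:
            dependents.setdefault(src, set()).add(block)

    # Breadth-first walk over everything downstream of the edit.
    seen = {edited}
    queue = deque([edited])
    while queue:
        current = queue.popleft()
        for nxt in dependents.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical analysis script split into named blocks.
deps = {
    "load": set(),
    "clean": {"load"},
    "fit": {"clean"},
    "plot_raw": {"load"},
    "report": {"fit"},
}

# Editing "clean" affects "clean", "fit", and "report", but not "plot_raw".
rerun = affected_blocks(deps, "clean")
```

In this toy example, an analyst who edits the cleaning step would not need to re-run `load` or `plot_raw`, provided the earlier result of `load` is still available; that is the kind of saving the abstract attributes to re-running only the necessary parts.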

Sat 11 Jul

Displayed time zone: (UTC) Coordinated Universal Time

00:00 - 01:00
00:00
12m
Talk
Big Code != Big Vocabulary: Open-Vocabulary Models for Source code (ACM SIGSOFT Distinguished Paper Awards, Artifact Reusable, Technical, Artifact Available)
Technical Papers
Rafael-Michael Karampatsis The University of Edinburgh, Hlib Babii Free University of Bozen-Bolzano, Romain Robbes Free University of Bozen-Bolzano, Charles Sutton Google Research, Andrea Janes Free University of Bozen-Bolzano
00:12
12m
Talk
Engineering for a Science-Centric Experimentation Platform (SEIP)
Software Engineering in Practice
Nikos Diamantopoulos Netflix, Inc., Jeffrey Wong Netflix, Inc., David Issa Mattos Chalmers University of Technology, Ilias Gerostathopoulos Vrije Universiteit Amsterdam, Matthew Wardrop Netflix, Inc., Tobias Mao Netflix, Inc., Colin McFarland Netflix, Inc.
00:24
12m
Talk
Managing data constraints in database-backed web applications (Artifact Reusable, Technical, Artifact Available)
Technical Papers
Junwen Yang University of Chicago, Utsav Sethi University of Chicago, Cong Yan University of Washington, Alvin Cheung University of California, Berkeley, Shan Lu University of Chicago
00:36
12m
Talk
Improving Data Scientist Efficiency with Provenance (Artifact Reusable, Technical, Artifact Available)
Technical Papers
Jingmei Hu Harvard University, Jiwon Joung Harvard University, Maia Jacobs Harvard University, Margo Seltzer University of British Columbia, Krzysztof Gajos Harvard University