An Empirical Study on Program Failures of Deep Learning Jobs (ICSE 2020 - Technical Papers)

Wed 24 June - Thu 16 July 2020

Who

Ru Zhang, Wencong Xiao, Hongyu Zhang, Yu Liu, Haoxiang Lin, Mao Yang

Track

ICSE 2020 Technical Papers

Time Zone

The program is currently displayed in (UTC) Coordinated Universal Time.

When

Tue 7 Jul 2020 08:47 - 08:59 at Goguryeo - I5-Deep Learning Testing and Debugging Chair(s): Pooyan Jamshidi

Abstract

Deep learning has made significant achievements in many application areas. To train and test models more efficiently, enterprise developers submit and run their deep learning programs on a shared, multi-tenant platform. However, some of the programs fail after a long execution time due to code/script defects, which reduces the development productivity and wastes expensive resources such as GPU, storage, and network I/O.

This paper presents the first comprehensive empirical study on program failures of deep learning jobs. 4960 real failures are collected from a deep learning platform in Microsoft. We manually examine their failure messages and classify them into 20 categories. In addition, we identify the common root causes and bug-fix solutions on a sample of 400 failures. To better understand the current testing and debugging practices for deep learning, we also conduct developer interviews. Our major findings include: (1) 48.0% of the failures occur in the interaction with the platform rather than in the execution of code logic, mostly due to the discrepancies between local and platform execution environments; (2) Deep learning specific failures (13.5%) are mainly caused by inappropriate model parameters/structures and framework API misunderstanding; (3) Current debugging practices are not efficient for fault localization in many cases, and developers need more deep learning specific tools. Based on our findings, we further suggest possible research topics and tooling support that could facilitate future deep learning development.

Link to Preprint

https://wencongxiao.github.io/res/icse20/icse20-main-199.pdf

DOI

https://doi.org/10.1145/3377811.3380362

Ru Zhang

Microsoft Research

Wencong Xiao

Alibaba

Hongyu Zhang

University of Newcastle, Australia

Yu Liu

Microsoft Research

Haoxiang Lin

Microsoft Research

Mao Yang

Microsoft Research

Session Program

Tue 7 Jul
Displayed time zone: (UTC) Coordinated Universal Time

08:05 - 09:05	I5-Deep Learning Testing and Debugging (Technical Papers / Demonstrations) at Goguryeo. Chair(s): Pooyan Jamshidi (University of South Carolina)

08:05 12m Talk		DISSECTOR: Input Validation for Deep Learning Applications by Crossing-layer Dissection (Technical Papers). Huiyan Wang (State Key Lab. for Novel Software Tech. and Dept. of Comp. Sci. and Tech., Nanjing University, Nanjing, China); Jingwei Xu (Nanjing University); Chang Xu (Nanjing University); Xiaoxing Ma (Nanjing University); Jian Lv (Nanjing University)
08:17 12m Talk		White-box Fairness Testing through Adversarial Sampling (Technical Papers). Peixin Zhang (Zhejiang University); Jingyi Wang (National University of Singapore, Singapore); Jun Sun (Singapore Management University); Guoliang Dong (Computer College of Zhejiang University); Xinyu Wang (Zhejiang University); Xingen Wang (Zhejiang University); Jin Song Dong (National University of Singapore); Dai Ting (Huawei Corporation)
08:29 3m Talk		FeatureNET: Diversity-driven Generation of Deep Learning Models (Demonstrations). Salah Ghamizi (SnT, University of Luxembourg); Maxime Cordy (SnT, University of Luxembourg); Mike Papadakis (University of Luxembourg); Yves Le Traon (University of Luxembourg)
08:32 3m Talk		EvalDNN: A Toolbox for Evaluating Deep Neural Network Models (Demonstrations). Yongqiang Tian (The Hong Kong University of Science and Technology); Zhihua Zeng (Zhejiang University); Ming Wen (Huazhong University of Science and Technology, China); Yepang Liu (Southern University of Science and Technology); Tzu-yang Kuo (The Hong Kong University of Science and Technology); Shing-Chi Cheung (Department of Computer Science and Engineering, The Hong Kong University of Science and Technology)
08:35 12m Talk		Taxonomy of Real Faults in Deep Learning Systems (Technical Papers). Nargiz Humbatova (Università della Svizzera italiana); Gunel Jahangirova (Università della Svizzera italiana); Gabriele Bavota (Università della Svizzera italiana); Vincenzo Riccio (Università della Svizzera italiana); Andrea Stocco (Università della Svizzera italiana); Paolo Tonella (Università della Svizzera italiana)
08:47 12m Talk		An Empirical Study on Program Failures of Deep Learning Jobs (Technical Papers). Ru Zhang (Microsoft Research); Wencong Xiao (Alibaba); Hongyu Zhang (University of Newcastle, Australia); Yu Liu (Microsoft Research); Haoxiang Lin (Microsoft Research); Mao Yang (Microsoft Research)