Modeling and Ranking Flaky Tests at Apple (ICSE 2020 - Software Engineering in Practice)

Wed 24 June - Thu 16 July 2020

Who

Emily Kowalczyk, Karan Nair, Zebao Gao, Leopold Silberstein, Teng Long, Atif Memon

Track

ICSE 2020 Software Engineering in Practice

When

Thu 9 Jul 2020, 00:24 - 00:36 (UTC), at Goguryeo, session P14-Testing. Chair(s): Shin Yoo

Abstract

Test flakiness, the inability to reliably repeat a test’s Pass/Fail outcome, continues to be a significant problem in industry, adversely impacting continuous integration and test pipelines. Completely eliminating flaky tests is not a realistic option, as a significant fraction of system tests (typically non-hermetic) for services-based implementations exhibit some level of flakiness. In this paper, we view the flakiness of a test as a rankable value that we quantify, track, and assign a confidence to. We develop two ways to model flakiness: capturing the randomness of test results via entropy and the temporal variation via flipRate, then aggregating these over time. We have implemented our flakiness scoring service and discuss how its adoption has impacted the test suites of two large services at Apple. We show how flakiness is distributed across the tests in these services, including typical score ranges and outliers. The flakiness scores are used to monitor and detect changes in flakiness trends. Evaluation results demonstrate near-perfect accuracy in ranking, identification, and alignment with human interpretation. The scores were used to identify 2 causes of flakiness in the evaluated dataset; both have been confirmed, and fixes have been implemented or are underway. Our models reduced flakiness by 44% with less than 1% loss in fault detection.
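
The abstract names the two signals, entropy and flipRate, but does not give their formulas. As a rough illustration only, the sketch below assumes entropy means the Shannon entropy of a test's pass/fail proportions and flipRate means the fraction of consecutive runs whose outcome differs. Both definitions, and the function names `entropy` and `flip_rate`, are assumptions made for this example and are not taken from the paper.

```python
import math
from typing import Sequence


def entropy(results: Sequence[bool]) -> float:
    """Shannon entropy (in bits) of a pass/fail history.

    Assumed definition: 0.0 when every run has the same outcome,
    1.0 when passes and failures are equally frequent.
    """
    if not results:
        return 0.0
    p = sum(results) / len(results)  # fraction of passing runs
    if p in (0.0, 1.0):
        return 0.0  # no randomness: always pass or always fail
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def flip_rate(results: Sequence[bool]) -> float:
    """Fraction of adjacent run pairs whose outcome changed.

    Assumed definition: flips / (number of runs - 1).
    """
    if len(results) < 2:
        return 0.0
    flips = sum(a != b for a, b in zip(results, results[1:]))
    return flips / (len(results) - 1)


# Two hypothetical histories with the same pass ratio (True = pass).
blocky = [True] * 4 + [False] * 4   # one change of state, e.g. a real regression
alternating = [True, False] * 4     # outcome changes on every run

for name, history in (("blocky", blocky), ("alternating", alternating)):
    print(name, round(entropy(history), 3), round(flip_rate(history), 3))
```

Under these assumed definitions both histories have the same entropy, because the pass ratio is identical, while the flip rate separates a single change of state from constant flipping. That difference is one reason a temporal signal is useful alongside a purely distributional one.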

Emily Kowalczyk

Apple Inc.

United States

Karan Nair

Apple

Zebao Gao

Apple

Leopold Silberstein

Apple Inc.

Teng Long

Apple

Atif Memon

Apple Inc.

United States

Session Program

Thu 9 Jul
Displayed time zone: (UTC) Coordinated Universal Time

00:00 - 01:00	P14-Testing (Technical Papers / Software Engineering in Practice) at Goguryeo. Chair(s): Shin Yoo (Korea Advanced Institute of Science and Technology)

00:00 12m Talk		Seenomaly: Vision-Based Linting of GUI Animation Effects Against Design-Don’t Guidelines (Technical Papers). Dehai Zhao (Australian National University); Zhenchang Xing (Australian National University); Chunyang Chen (Monash University); Xiwei (Sherry) Xu (Data 61); Liming Zhu (CSIRO's Data61 and UNSW); Guoqiang Li (Shanghai Jiao Tong University); Jinshui Wang (School of Information Science and Engineering, Fujian University of Technology, Fuzhou, China)
00:12 12m Talk		Fuzz Testing based Data Augmentation to Improve Robustness of Deep Neural Networks (Technical Papers). Xiang Gao (National University of Singapore, Singapore); Ripon Saha (Fujitsu Laboratories of America, Inc.); Mukul Prasad (Fujitsu Laboratories of America); Abhik Roychoudhury (National University of Singapore, Singapore)
00:24 12m Talk		Modeling and Ranking Flaky Tests at Apple (Software Engineering in Practice). Emily Kowalczyk (Apple Inc.); Karan Nair (Apple); Zebao Gao (Apple); Leopold Silberstein (Apple Inc.); Teng Long (Apple); Atif Memon (Apple Inc.)
00:36 12m Talk		Testing File System Implementations on Layered Models (Technical Papers). Dongjie Chen (Nanjing University); Yanyan Jiang (Nanjing University); Chang Xu (Nanjing University); Xiaoxing Ma (Nanjing University); Jian Lv (Nanjing University)
00:48 12m Talk		A Cost-efficient Approach to Building in Continuous Integration (Technical Papers). Xianhao Jin (Virginia Tech, USA); Francisco Servant (Virginia Tech)