Modeling and Ranking Flaky Tests at Apple (SEIP)
Test flakiness, the inability to reliably repeat a test’s Pass/Fail outcome, continues to be a significant problem in industry, adversely impacting continuous integration and test pipelines. Completely eliminating flaky tests is not a realistic option, as a significant fraction of system tests (typically non-hermetic) for services-based implementations exhibit some level of flakiness. In this paper, we view the flakiness of a test as a rankable value, which we quantify, track, and assign a confidence. We develop two ways to model flakiness: capturing the randomness of test results via entropy and the temporal variation via flipRate, and aggregating these over time. We have implemented our flakiness scoring service and discuss how its adoption has impacted the test suites of two large services at Apple. We show how flakiness is distributed across the tests in these services, including typical score ranges and outliers. The flakiness scores are used to monitor and detect changes in flakiness trends. Evaluation results demonstrate near-perfect accuracy in ranking, identification, and alignment with human interpretation. The scores were used to identify two causes of flakiness in the dataset evaluated; both have been confirmed, and fixes have been implemented or are underway. Our models reduced flakiness by 44% with less than 1% loss in fault detection.
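To make the two signals named in the abstract concrete, the following is a minimal sketch, not the paper's implementation. It assumes entropy means the Shannon entropy of a test's pass/fail distribution and flipRate means the fraction of consecutive runs whose outcome differs; the abstract does not give the exact definitions, and the function names `entropy` and `flip_rate` are hypothetical.

```python
import math
from typing import Sequence


def entropy(results: Sequence[bool]) -> float:
    """Shannon entropy (in bits) of a pass/fail history.

    Assumed definition: 0.0 for an always-pass or always-fail test,
    1.0 when passes and failures are equally frequent.
    """
    if not results:
        return 0.0
    p = sum(results) / len(results)  # observed pass probability
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def flip_rate(results: Sequence[bool]) -> float:
    """Fraction of consecutive run pairs whose outcome changed.

    Assumed definition: flips / (number of runs - 1).
    """
    if len(results) < 2:
        return 0.0
    flips = sum(a != b for a, b in zip(results, results[1:]))
    return flips / (len(results) - 1)


# Two hypothetical histories with the same pass ratio but different ordering.
alternating = [True, False, True, False]
clustered = [True, True, False, False]

print(entropy(alternating), flip_rate(alternating))
print(entropy(clustered), flip_rate(clustered))
```

Under these assumed definitions, both histories have the same entropy because they share a 50% pass ratio, but the alternating history has a much higher flip rate. This is one way to see why an order-insensitive measure and an order-sensitive one can be complementary.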
Thu 9 Jul. Displayed time zone: (UTC) Coordinated Universal Time
00:00 - 01:00 | P14-Testing | Technical Papers / Software Engineering in Practice | at Goguryeo | Chair(s): Shin Yoo (Korea Advanced Institute of Science and Technology)
Time | Duration | Type | Title | Track | Authors
00:00 | 12m | Talk | Seenomaly: Vision-Based Linting of GUI Animation Effects Against Design-Don’t Guidelines | Technical Papers | Dehai Zhao (Australian National University); Zhenchang Xing (Australian National University); Chunyang Chen (Monash University); Xiwei (Sherry) Xu (Data 61); Liming Zhu (CSIRO's Data61 and UNSW); Guoqiang Li (Shanghai Jiao Tong University); Jinshui Wang (School of Information Science and Engineering, Fujian University of Technology, Fuzhou, China)
00:12 | 12m | Talk | Fuzz Testing based Data Augmentation to Improve Robustness of Deep Neural Networks | Technical Papers | Xiang Gao (National University of Singapore, Singapore); Ripon Saha (Fujitsu Laboratories of America, Inc.); Mukul Prasad (Fujitsu Laboratories of America); Abhik Roychoudhury (National University of Singapore, Singapore)
00:24 | 12m | Talk | Modeling and Ranking Flaky Tests at Apple | Software Engineering in Practice | Emily Kowalczyk (Apple Inc.); Karan Nair (Apple); Zebao Gao (Apple); Leopold Silberstein (Apple Inc.); Teng Long (Apple); Atif Memon (Apple Inc.)
00:36 | 12m | Talk | Testing File System Implementations on Layered Models | Technical Papers | Dongjie Chen (Nanjing University); Yanyan Jiang (Nanjing University); Chang Xu (Nanjing University); Xiaoxing Ma (Nanjing University); Jian Lu (Nanjing University)
00:48 | 12m | Talk | A Cost-efficient Approach to Building in Continuous Integration | Technical Papers | Pre-print