Developers typically rely on regression testing to determine whether their recent changes introduce regressions. However, flaky tests, which may pass or fail even without code changes, can provide misleading signals to developers. Although flaky tests have attracted considerable interest from both industry and academia in recent years, few studies examine the reproducibility, runtime, causes, and fixes of flaky tests, particularly in industrial settings.
To fill this gap, we study the lifecycle of flaky tests in six large-scale industrial projects at AnonCompany. More specifically, we study the prevalence, reproducibility, characteristics, categories, and resolution of flaky tests. Our study of prevalence shows the impact that flaky tests have on developers at AnonCompany. To understand a major challenge developers face when debugging and fixing flaky tests, we then study the reproducibility of flaky-test failures. Given this challenge, we next characterize and categorize the flaky tests at AnonCompany. Lastly, we study how long developers take, and how effective they are, at fixing flaky tests, to understand the resolution process. We believe that our study reaffirms, in an industrial setting, findings that previous studies reported for open-source projects, and provides new insights into the lifecycle of flaky tests that can guide future research on this important topic.