A Procedure and Guidelines for Analyzing Groups of Software Engineering Replications
Context: With the aim of increasing the reliability and generalizability of individual experiment results, software engineering (SE) researchers from different groups and institutions typically collaborate to perform groups of experiments by means of replication (i.e., conduct groups of replications). However, disparate aggregation techniques are being applied to analyze groups of replications: narrative synthesis, aggregated data (AD), mega-trial individual participant data (IPD-MT), stratified individual participant data (IPD-S), and aggregation of p-values. The application of unsuitable techniques to aggregate replication results may undermine the potential of groups of replications to provide in-depth insights from experiment results.
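To make the difference between IPD-MT and IPD-S concrete, here is a minimal sketch rather than the authors' analysis. It assumes that IPD-MT pools all raw data as if it came from one large experiment, and that IPD-S estimates the effect within each replication before combining. The data, the function names, and the size-weighted average are hypothetical simplifications; a real IPD-S analysis would more likely fit a statistical model with replication as a factor.

```python
from statistics import mean

# Hypothetical raw data: (replication, treatment, outcome).
# Both replications show A beating B by 1 point, but the baselines
# differ and the allocation to treatments is unbalanced.
ROWS = [
    ("rep1", "A", 11.0), ("rep1", "A", 11.0), ("rep1", "A", 11.0), ("rep1", "B", 10.0),
    ("rep2", "A", 21.0), ("rep2", "B", 20.0), ("rep2", "B", 20.0), ("rep2", "B", 20.0),
]


def mega_trial_difference(rows):
    """IPD-MT style: pool every participant and ignore the replication."""
    a = [y for _, t, y in rows if t == "A"]
    b = [y for _, t, y in rows if t == "B"]
    return mean(a) - mean(b)


def stratified_difference(rows):
    """IPD-S style (simplified): per-replication difference, size-weighted."""
    total, weight = 0.0, 0
    for rep in sorted({r for r, _, _ in rows}):
        a = [y for r, t, y in rows if r == rep and t == "A"]
        b = [y for r, t, y in rows if r == rep and t == "B"]
        n = len(a) + len(b)
        total += n * (mean(a) - mean(b))
        weight += n
    return total / weight


pooled = mega_trial_difference(ROWS)      # -4.0: the sign of the effect flips
stratified = stratified_difference(ROWS)  # 1.0: the within-replication effect
print(pooled, stratified)
```

In this toy data set the pooled estimate reverses the direction of the effect seen in every replication, which is one way an analysis that ignores the replication factor can mislead.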
Objectives: Provide an analysis procedure with a set of embedded guidelines to aggregate the results of groups of SE replications.
Method: First, we compare the characteristics of groups of replications for SE and other mature experimental disciplines such as medicine and pharmacology and identify their differences. Then, we identify four major limitations regarding joint data analysis practices in groups of SE replications:
- Limitation 1: Narrative synthesis and aggregation of p-values are used in 53% of groups of SE replications.
- Limitation 2: IPD-MT is used in 33% of groups of SE replications.
- Limitation 3: AD is used in 38% of groups of SE replications.
- Limitation 4: SE researchers rarely acknowledge the limitations of the exploratory analyses that they undertake for identifying moderators.
Next, we study the guidelines provided in mature experimental disciplines to analyze groups of replications. With all this, we develop an analysis procedure with a set of embedded guidelines specifically tailored to the analysis of groups of SE replications. We apply the proposed analysis procedure to a representative group of SE replications to illustrate its use.
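As an illustration of what aggregation of p-values involves, the sketch below uses Fisher's method. The text does not say which p-value aggregation method the surveyed studies used, so that choice is an assumption, and the p-values are invented for the example.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    Returns the test statistic and the combined p-value.
    """
    if not p_values or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    # Under the joint null hypothesis, X = -2 * sum(ln p_i) follows a
    # chi-square distribution with 2k degrees of freedom.
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    # Closed-form chi-square survival function for even degrees of freedom:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = statistic / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total


# Hypothetical p-values from three replications.
stat, combined = fisher_combined_p([0.04, 0.20, 0.11])
print(f"X = {stat:.3f}, combined p = {combined:.4f}")
```

The combined p-value carries no effect size and no direction, and it says nothing about how much the replications disagree. That loss of information is one plausible reason to be wary of this technique for drawing joint conclusions.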
Results: We propose the adoption of a four-step procedure to analyze groups of SE replications:
- Step 1. Describe participants. The objectives of this step are: describe the population to which the results should be generalized, and suggest plausible sources of heterogeneity that may arise when providing joint results.
- Step 2. Analyze individual replications. The objectives of this step are: provide descriptive statistics to ease the incorporation of results into prospective studies, identify patterns across replication results, and ensure that statistical heterogeneity is not introduced by the different methods used to analyze the replications.
- Step 3. Aggregate the results. The objective of this step is to increase the reliability of joint conclusions. Three guidelines are proposed, specifically tailored to address limitations 1-3:
  * Guideline 1: Avoid narrative synthesis and aggregation of p-values to provide joint conclusions.
  * Guideline 2: Avoid IPD-MT due to its potential to provide biased or underpowered results.
  * Guideline 3: Use AD and IPD-S.
- Step 4. Conduct exploratory analyses. The objective of this step is to identify experiment-level and participant-level moderators that may be behind the statistical heterogeneity commonly present in groups of SE replications. Three additional guidelines are proposed to address limitation 4:
  * Guideline 4: Use AD and IPD-S to identify experiment-level moderators.
  * Guideline 5: Use IPD-S to identify participant-level moderators.
  * Guideline 6: Acknowledge limitations of exploratory analyses.
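One common way to carry out an AD analysis as in Step 3 is a random-effects meta-analysis of per-replication effect sizes. The sketch below assumes the DerSimonian-Laird estimator, which the text does not name; the effect sizes and variances are invented, and the function name is hypothetical.

```python
import math


def random_effects_meta(effects, variances):
    """DerSimonian-Laird random-effects pooling of per-replication effects."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need matching effects and variances for >= 2 replications")
    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    # Between-replication variance (tau^2), truncated at zero.
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)
    # Random-effects weights add tau^2 to each within-replication variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    # I^2: share of total variability attributed to heterogeneity, in percent.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {
        "pooled": pooled,
        "se": se,
        "ci95": (pooled - 1.96 * se, pooled + 1.96 * se),
        "tau2": tau2,
        "Q": q,
        "I2": i2,
    }


# Hypothetical standardized effect sizes and their variances, one per replication.
result = random_effects_meta([0.42, 0.10, 0.65, 0.31], [0.04, 0.05, 0.09, 0.03])
print(result)
```

Unlike a combined p-value, this output reports a joint effect size with a confidence interval together with heterogeneity statistics (Q, tau^2, I^2). Those statistics can indicate when an exploratory search for moderators, as in Step 4, may be worthwhile.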
Conclusion: The aggregation techniques used to analyze groups of replications should be justified in research articles. The statistical models and the raw data used should also be transparently reported. This will increase the reliability and transparency of joint results. The proposed guidelines should ease this endeavor.
Tue 7 Jul. Displayed time zone: (UTC) Coordinated Universal Time
07:00 - 08:00
|An Evidence-Based Inquiry into the Use of Grey Literature in Software Engineering|Technical|
He Zhang Nanjing University, Xin Zhou State Key Laboratory of Novel Software Technology, Software Institute, Nanjing University, Xin Huang State Key Laboratory of Novel Software Technology, Software Institute, Nanjing University, Huang Huang State Key Laboratory of Novel Software Technology, Software Institute, Nanjing University, Muhammad Ali Babar The University of Adelaide
|An Extended Abstract of "Metamorphic Robustness Testing: Exposing Hidden Defects in Citation Statistics and Journal Impact Factors"|J1|
|A Procedure and Guidelines for Analyzing Groups of Software Engineering Replications|J1|
|An SLR-Tool: Search Process in Practice|Demo|
|Threats to Validity in Experimenting Mutation-Based Fault Localization|NIER (New Ideas and Emerging Results)|
|Methodological Principles for Reproducible Performance Evaluation in Cloud Computing|J1|
Alessandro Vittorio Papadopoulos Mälardalen University, Laurens Versluis Vrije Universiteit Amsterdam, André Bauer University of Würzburg, Nikolas Herbst University of Würzburg, Joakim von Kistowski University of Würzburg, Ahmed Ali-Eldin UMass Amherst, Cristina L. Abad Escuela Superior Politecnica del Litoral, Jose Nelson Amaral University of Alberta, Petr Tuma Charles University, Alexandru Iosup Vrije Universiteit Amsterdam
|Bayesian Data Analysis in Empirical Software Engineering Research|J1|
Carlo A. Furia Università della Svizzera italiana (USI), Robert Feldt Chalmers | University of Gothenburg, Blekinge Institute of Technology, Richard Torkar Chalmers and the University of Gothenburg