A Procedure and Guidelines for Analyzing Groups of Software Engineering Replications
Context: With the aim of increasing the reliability and generalizability of individual experiment results, software engineering (SE) researchers from different groups and institutions typically collaborate to perform groups of experiments by means of replication (i.e., conduct groups of replications). However, disparate aggregation techniques are being applied to analyze groups of replications: narrative synthesis, aggregated data (AD), mega-trial individual participant data (IPD-MT), stratified individual participant data (IPD-S), and aggregation of p-values. The application of unsuitable techniques to aggregate replication results may undermine the potential of groups of replications to provide in-depth insights from experiment results.
Objectives: Provide an analysis procedure with a set of embedded guidelines to aggregate the results of groups of SE replications.
Method: First, we compare the characteristics of groups of replications for SE and other mature experimental disciplines such as medicine and pharmacology and identify their differences. Then, we identify four major limitations regarding joint data analysis practices in groups of SE replications:
- Limitation 1: Narrative synthesis and aggregation of p-values are used in 53% of groups of SE replications.
- Limitation 2: IPD-MT is used in 33% of groups of SE replications.
- Limitation 3: AD is used in 38% of groups of SE replications.
- Limitation 4: SE researchers rarely acknowledge the limitations of the exploratory analyses that they undertake for identifying moderators.
Next, we study the guidelines provided in mature experimental disciplines to analyze groups of replications. With all this, we develop an analysis procedure with a set of embedded guidelines specifically tailored to the analysis of groups of SE replications. We apply the proposed analysis procedure to a representative group of SE replications to illustrate its use.
Results: We propose the adoption of a four-step procedure to analyze groups of SE replications:
- Step 1. Describe participants. The objectives of this step are: describe the population to which the results should be generalized, and suggest plausible sources of heterogeneity that may arise when providing joint results.
- Step 2. Analyze individual replications. The objectives of this step are: provide descriptive statistics to ease the incorporation of results into prospective studies, identify patterns across replication results, and ensure that statistical heterogeneity is not introduced by the different methods used to analyze the replications.
- Step 3. Aggregate the results. The objective of this step is to increase the reliability of joint conclusions. Three guidelines are proposed, specifically tailored to address limitations 1-3:
  * Guideline 1: Avoid narrative synthesis and aggregation of p-values to provide joint conclusions.
  * Guideline 2: Avoid IPD-MT due to its potential to provide biased or underpowered results.
  * Guideline 3: Use AD and IPD-S.
- Step 4. Conduct exploratory analyses. The objective of this step is to identify experiment-level and participant-level moderators that may be behind the statistical heterogeneity commonly present in groups of SE replications. Three additional guidelines are proposed to address limitation 4:
  * Guideline 4: Use AD and IPD-S to identify experiment-level moderators.
  * Guideline 5: Use IPD-S to identify participant-level moderators.
  * Guideline 6: Acknowledge limitations of exploratory analyses.
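As a rough illustration of why Guidelines 2 and 3 matter, the sketch below contrasts an AD-style aggregation with naive IPD-MT pooling. It is not the authors' procedure: the scores are invented, the function names are hypothetical, and a fixed-effect inverse-variance model is assumed purely for brevity (a random-effects model would be a common choice when replications are heterogeneous). The two made-up experiments have different baselines and unbalanced group sizes, which is the kind of situation where ignoring the experiment of origin can mislead.

```python
from statistics import mean, variance

# Hypothetical scores per experiment: (treatment, control). Not data from the paper.
experiments = {
    "A": ([12, 14, 13, 15, 11, 13], [10, 12]),
    "B": ([32, 34], [30, 31, 29, 30, 32, 28]),
}

def mean_difference(treatment, control):
    """Return (effect, variance of the effect) for one experiment."""
    effect = mean(treatment) - mean(control)
    var = variance(treatment) / len(treatment) + variance(control) / len(control)
    return effect, var

def aggregate_ad(effects):
    """AD-style aggregation: fixed-effect inverse-variance pooling (an assumption)."""
    weights = [1.0 / var for _, var in effects]
    pooled = sum(w * e for w, (e, _) in zip(weights, effects)) / sum(weights)
    return pooled, 1.0 / sum(weights)

def aggregate_ipd_mt(data):
    """Mega-trial: pool all participants, ignoring which experiment they came from."""
    all_treatment = [x for t, _ in data.values() for x in t]
    all_control = [x for _, c in data.values() for x in c]
    return mean(all_treatment) - mean(all_control)

per_experiment = {name: mean_difference(t, c) for name, (t, c) in experiments.items()}
ad_estimate, ad_variance = aggregate_ad(list(per_experiment.values()))
mt_estimate = aggregate_ipd_mt(experiments)

for name, (effect, var) in per_experiment.items():
    print(f"experiment {name}: difference = {effect:.2f}, variance = {var:.2f}")
print(f"AD (inverse-variance) estimate: {ad_estimate:.2f}")
print(f"IPD-MT (naive pooling) estimate: {mt_estimate:.2f}")
```

With these invented numbers, both experiments favor the treatment and the AD estimate is positive, while the mega-trial estimate is negative because most treatment participants come from the low-baseline experiment.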
Conclusion: The aggregation techniques used to analyze groups of replications should be justified in research articles. The statistical models and the raw data used should also be transparently reported. This will increase the reliability and transparency of joint results. The proposed guidelines should ease this endeavor.
Tue 7 Jul. Times are displayed in time zone: (UTC) Coordinated Universal Time
|07:00 - 07:12|
|An Evidence-Based Inquiry into the Use of Grey Literature in Software Engineering (Technical)|
He Zhang (Nanjing University), Xin Zhou (State Key Laboratory of Novel Software Technology, Software Institute, Nanjing University), Xin Huang (State Key Laboratory of Novel Software Technology, Software Institute, Nanjing University), Huang Huang (State Key Laboratory of Novel Software Technology, Software Institute, Nanjing University), Muhammad Ali Babar (The University of Adelaide)
|07:12 - 07:20|
|An Extended Abstract of "Metamorphic Robustness Testing: Exposing Hidden Defects in Citation Statistics and Journal Impact Factors" (J1)|
|07:20 - 07:28|
|A Procedure and Guidelines for Analyzing Groups of Software Engineering Replications (J1)|
|07:28 - 07:31|
|An SLR-Tool: Search Process in Practice (Demo)|
|07:31 - 07:37|
|Threats to Validity in Experimenting Mutation-Based Fault Localization (NIER: New Ideas and Emerging Results)|
|07:37 - 07:45|
|Methodological Principles for Reproducible Performance Evaluation in Cloud Computing (J1)|
Alessandro Vittorio Papadopoulos (Mälardalen University), Laurens Versluis (Vrije Universiteit Amsterdam), André Bauer (University of Würzburg), Nikolas Herbst (University of Würzburg), Joakim von Kistowski (University of Würzburg), Ahmed Ali-Eldin (UMass Amherst), Cristina L. Abad (Escuela Superior Politecnica del Litoral), Jose Nelson Amaral (University of Alberta), Petr Tuma (Charles University), Alexandru Iosup (Vrije Universiteit Amsterdam)
|07:45 - 07:53|
|Bayesian Data Analysis in Empirical Software Engineering Research (J1)|
Carlo A. Furia (Università della Svizzera italiana, USI), Robert Feldt (Chalmers | University of Gothenburg, Blekinge Institute of Technology), Richard Torkar (Chalmers and the University of Gothenburg)