A Procedure and Guidelines for Analyzing Groups of Software Engineering Replications (ICSE 2020 - Journal First)

Wed 24 June - Thu 16 July 2020

Who

Adrian Santos Parrilla, Sira Vegas, Markku Oivo, Natalia Juristo

Track

ICSE 2020 Journal First

Time Zone

The program is displayed in Coordinated Universal Time (UTC).

When

Tue 7 Jul 2020 07:20 - 07:28 at Baekje - I1-Metastudies. Chair(s): Michael Vierhauser

Abstract

Context: To increase the reliability and generalizability of individual experiment results, software engineering (SE) researchers from different groups and institutions typically collaborate to perform groups of experiments by means of replication (i.e., they conduct groups of replications). However, disparate aggregation techniques are being applied to analyze groups of replications: narrative synthesis, aggregated data (AD), mega-trial individual participant data (IPD-MT), stratified individual participant data (IPD-S), and aggregation of p-values. Applying unsuitable techniques to aggregate replication results may undermine the potential of groups of replications to provide in-depth insights from experiment results.
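To make the difference between IPD-MT and IPD-S concrete, here is a minimal Python sketch on hypothetical data; the data, the function names, and the deliberately unbalanced allocation are illustrative assumptions, not taken from the paper. IPD-MT pools every participant as if all replications were one large experiment, whereas IPD-S keeps each replication as a stratum, estimates the treatment effect within it, and then combines the within-replication estimates. In practice IPD-S is usually fitted as a linear (possibly mixed) model with the experiment as a factor; the weighted average below is the closed form of such a model when it assumes a single common treatment effect.

```python
from statistics import mean

# Hypothetical participant-level data: (experiment, group, score).
# Experiment "A" has a high baseline and mostly control participants;
# experiment "B" has a low baseline and mostly treatment participants.
DATA = [
    ("A", "control", 79), ("A", "control", 80), ("A", "control", 81),
    ("A", "control", 80), ("A", "treatment", 82),
    ("B", "control", 50), ("B", "treatment", 51), ("B", "treatment", 52),
    ("B", "treatment", 53), ("B", "treatment", 52),
]


def ipd_mt(data):
    """Mega-trial style: pool all participants, ignoring the experiment."""
    treated = [y for _, g, y in data if g == "treatment"]
    control = [y for _, g, y in data if g == "control"]
    return mean(treated) - mean(control)


def ipd_s(data):
    """Stratified: estimate the effect within each experiment, then combine.

    The weights n_t * n_c / (n_t + n_c) reproduce the OLS treatment
    coefficient of a linear model with one intercept per experiment
    and a single common treatment effect.
    """
    num = den = 0.0
    for exp in sorted({e for e, _, _ in data}):
        treated = [y for e, g, y in data if e == exp and g == "treatment"]
        control = [y for e, g, y in data if e == exp and g == "control"]
        n_t, n_c = len(treated), len(control)
        weight = n_t * n_c / (n_t + n_c)
        num += weight * (mean(treated) - mean(control))
        den += weight
    return num / den


print(f"IPD-MT estimate: {ipd_mt(DATA):+.1f}")  # -16.0
print(f"IPD-S estimate:  {ipd_s(DATA):+.1f}")   # +2.0
```

Both hypothetical replications show a +2 effect, yet the pooled estimate is -16 because most control participants come from the high-baseline replication. If every replication used the same treatment proportion, the two point estimates would coincide, but IPD-MT would still fold baseline differences between replications into its error term and widen its standard error.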

Objectives: Provide an analysis procedure with a set of embedded guidelines to aggregate the results of groups of SE replications.

Method: First, we compare the characteristics of groups of replications in SE and in other mature experimental disciplines, such as medicine and pharmacology, and identify their differences. Then, we identify four major limitations of joint data analysis practices in groups of SE replications:
- Limitation 1: Narrative synthesis and aggregation of p-values are used in 53% of groups of SE replications.
- Limitation 2: IPD-MT is used in 33% of groups of SE replications.
- Limitation 3: AD is used in 38% of groups of SE replications.
- Limitation 4: SE researchers rarely acknowledge the limitations of the exploratory analyses that they undertake to identify moderators.
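Limitation 1 concerns aggregating p-values. One common way to do this is Fisher's method; the abstract does not say which combination rules the surveyed studies used, so the sketch below is only one illustrative instance. It uses Python's standard library alone, relying on the closed-form upper tail of a chi-squared distribution with even degrees of freedom. The function name and the p-values in the usage line are hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Combine k independent p-values with Fisher's method.

    Under the joint null hypothesis, X = -2 * sum(ln p_i) follows a
    chi-squared distribution with 2k degrees of freedom. For even degrees
    of freedom the upper tail is exp(-x/2) * sum_{j<k} (x/2)**j / j!.
    """
    if not p_values or any(not 0 < p <= 1 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # this is X / 2
    tail = sum(half_x ** j / math.factorial(j) for j in range(k))
    return math.exp(-half_x) * tail


# Two hypothetical replications with two-sided p = 0.04 each, one favoring
# the treatment and one favoring the control: the effects point in opposite
# directions, yet the combined p-value is small.
print(f"{fisher_combined_p([0.04, 0.04]):.4f}")  # 0.0119
```

The combined value says only that at least one replication departs from its null hypothesis; it carries no estimate of the size, or even the direction, of a joint effect, which is the concern behind Guideline 1.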

Next, we study the guidelines that mature experimental disciplines provide for analyzing groups of replications. Drawing on these findings, we develop an analysis procedure with a set of embedded guidelines tailored specifically to the analysis of groups of SE replications. We apply the proposed procedure to a representative group of SE replications to illustrate its use.

Results: We propose a four-step procedure to analyze groups of SE replications:
- Step 1. Describe participants. The objectives of this step are to describe the population to which the results should be generalized and to suggest plausible sources of heterogeneity that may arise when providing joint results.
- Step 2. Analyze individual replications. The objectives of this step are to provide descriptive statistics that ease the incorporation of results into prospective studies, to identify patterns across replication results, and to ensure that statistical heterogeneity is not introduced by the different methods used to analyze the replications.
- Step 3. Aggregate the results. The objective of this step is to increase the reliability of joint conclusions. Three guidelines are proposed, specifically tailored to address Limitations 1-3:
  * Guideline 1: Avoid narrative synthesis and aggregation of p-values to provide joint conclusions.
  * Guideline 2: Avoid IPD-MT due to its potential to provide biased or underpowered results.
  * Guideline 3: Use AD and IPD-S.
- Step 4. Conduct exploratory analyses. The objective of this step is to identify experiment-level and participant-level moderators that may be behind the statistical heterogeneity commonly present in groups of SE replications. Three additional guidelines are proposed to address Limitation 4:
  * Guideline 4: Use AD and IPD-S to identify experiment-level moderators.
  * Guideline 5: Use IPD-S to identify participant-level moderators.
  * Guideline 6: Acknowledge the limitations of exploratory analyses.
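Guideline 3 recommends AD. Below is a minimal sketch of one widely used AD model, a DerSimonian-Laird random-effects meta-analysis, which also reports I², a common measure of the statistical heterogeneity that Step 4 sets out to explain. The abstract does not specify the paper's effect-size metric or estimator, so the function name, the choice of DerSimonian-Laird, and the effect sizes and variances in the usage lines are assumptions for illustration.

```python
import math


def random_effects_meta(effects, variances):
    """DerSimonian-Laird random-effects pooling of per-replication effects.

    effects:   one effect size per replication (e.g., a mean difference)
    variances: the squared standard error of each effect size
    Returns (pooled effect, its standard error, tau^2, I^2 in percent).
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two replications with variances")
    w = [1.0 / v for v in variances]  # fixed-effect (inverse-variance) weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # between-replication variance estimate
    w_star = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    # I^2: share of total variability attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, se, tau2, i2


effects = [0.50, 0.10, 0.80, 0.30]    # hypothetical per-replication effects
variances = [0.04, 0.05, 0.06, 0.04]  # hypothetical squared standard errors
pooled, se, tau2, i2 = random_effects_meta(effects, variances)
low, high = pooled - 1.96 * se, pooled + 1.96 * se
print(f"pooled = {pooled:.2f}, 95% CI = [{low:.2f}, {high:.2f}]")
print(f"tau^2 = {tau2:.3f}, I^2 = {i2:.0f}%")
```

A random-effects model is used here rather than a fixed-effect one because it lets the true effect vary across replications, which matches the heterogeneity the abstract describes as common in groups of SE replications; a large I² would be the cue to move on to the exploratory moderator analyses of Step 4.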

Conclusion: The aggregation techniques used to analyze groups of replications should be justified in research articles. The statistical models and the raw data used should also be reported transparently. This will increase the reliability and transparency of joint results, and the proposed guidelines should ease this endeavor.

Adrian Santos Parrilla

University of Oulu

Sira Vegas

Universidad Politecnica de Madrid

Spain

Markku Oivo