ICSE 2020
Wed 24 June - Thu 16 July 2020
Thu 9 Jul 2020 07:46 - 07:54 at Silla - I15-Ecosystems 1 Chair(s): Raula Gaikovina Kula

The continuous contributions made by long-time contributors (LTCs) are a key factor in enabling open source software (OSS) projects to succeed and survive. We study GitHub because it hosts a large number of OSS projects and millions of contributors, which makes it possible to study the transition from newcomer to LTC. In this paper [1], we investigate whether we can effectively predict which newcomers in OSS projects will become LTCs, based on their activity data collected from GitHub. We collect GitHub data from GHTorrent, a mirror of GitHub data. We select the 917 most popular projects, which contain 75,046 contributors. We consider a developer an LTC of a project if the time interval between his/her first and last commit in the project is larger than a certain time T. In our experiment, we use three different settings for the time interval: 1, 2, and 3 years. There are 9,238, 3,968, and 1,577 contributors who become LTCs of a project under the three settings, respectively.
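The LTC definition above can be illustrated with a minimal sketch. The function name `is_ltc`, the sample timestamps, and the approximation of one year as 365 days are assumptions made for illustration; the abstract does not say how the paper measures the interval in practice.

```python
from datetime import datetime, timedelta


def is_ltc(commit_times, years):
    """Return True if the span between the first and last commit exceeds `years` years.

    Assumption: a year is approximated as 365 days.
    """
    if not commit_times:
        return False
    span = max(commit_times) - min(commit_times)
    return span > timedelta(days=365 * years)


# Hypothetical commit timestamps for one contributor in one project.
commits = [datetime(2015, 1, 10), datetime(2015, 6, 2), datetime(2017, 3, 15)]

# Label the contributor under the three settings T = 1, 2, and 3 years.
labels = {t: is_ltc(commits, t) for t in (1, 2, 3)}
```

Because the threshold T is a parameter, the same contributor can count as an LTC under one setting and not under another, which is why the number of LTCs shrinks as T grows.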

To build a prediction model, we extract many features from the activities of developers on GitHub, which we group into five dimensions: developer profile, repository profile, developer monthly activity, repository monthly activity, and collaboration network. We apply several classifiers, including naive Bayes, SVM, decision tree, kNN, and random forest. We find that the random forest classifier achieves the best performance, with AUCs of more than 0.75 in all three time interval settings for LTCs. We also investigate the most important features that differentiate newcomers who become LTCs from newcomers who stay in the projects for only a short time. Finally, we provide several implications for action, based on our analysis results, to help OSS projects retain newcomers.
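For readers unfamiliar with the AUC metric reported above, the following is a minimal sketch of how it can be computed from a classifier's predicted scores. It uses the pairwise-ranking definition of AUC and is not the paper's evaluation code; the function name `auc` and the sample labels and scores are hypothetical.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    Equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative example,
    with ties counted as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = newcomer became an LTC, 0 = did not.
y_true = [1, 0, 1, 0, 0]
y_score = [0.9, 0.4, 0.35, 0.2, 0.1]
result = auc(y_true, y_score)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a value above 0.75 means the model ranks a future LTC above a non-LTC in more than three out of four such pairs.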

Below are the main contributions of this paper:

  1. We build a prediction model based on a total of 63 features from a developer’s first-month activities on GitHub to determine whether a newcomer will become an LTC in a GitHub project. We conduct an experiment on a total of 75,046 developers from 917 projects. The results show that our approach can effectively predict whether a newcomer will become an LTC soon after he/she submits his/her first commit to the project.

  2. We investigate the most important characteristics that influence whether a newcomer becomes an LTC. We find that the number of followers a contributor has when he/she joins the project is the most important feature in all time interval settings.

Thu 9 Jul
Times are displayed in time zone: (UTC) Coordinated Universal Time

07:00 - 08:00: Paper Presentations - I15-Ecosystems 1 at Silla
Chair(s): Raula Gaikovina Kula (NAIST)
icse-2020-papers 07:00 - 07:12
Wanwangying Ma (Nanjing University); Lin Chen (Nanjing University); Xiangyu Zhang (Purdue University); Yang Feng (Nanjing University); Zhaogui Xu (Nanjing University, China); Zhifei Chen (Huawei); Yuming Zhou (Nanjing University); Baowen Xu (Nanjing University)
icse-2020-Journal-First 07:12 - 07:20
Agus Sulistya (Telkom Institute of Technology Surabaya); Gede Artha Azriadi Prana (Singapore Management University); Abhishek Sharma (Singapore Management University, Singapore); David Lo (Singapore Management University); Christoph Treude (The University of Adelaide)
icse-2020-Software-Engineering-in-Practice 07:20 - 07:38
Frances Paulisch (Siemens Healthineers); Arun Azhakesan (Siemens Healthineers)
icse-2020-Journal-First 07:38 - 07:46
Hugo Andrade (Chalmers University of Technology); Jan Schroeder (Chalmers | University of Gothenburg); Ivica Crnkovic (Chalmers | University of Gothenburg)
icse-2020-Journal-First 07:46 - 07:54
Lingfeng Bao (Zhejiang University); Xin Xia (Monash University); David Lo (Singapore Management University); Gail Murphy (University of British Columbia)