MSRBot: Using Bots to Answer Questions from Software Repositories (ICSE 2020 - Journal First)

Wed 24 June - Thu 16 July 2020

Who

Ahmad Abdellatif, Khaled Badran, Emad Shihab

Track

ICSE 2020 Journal First

Time Zone

The program is currently displayed in (UTC) Coordinated Universal Time.

When

Wed 8 Jul 2020 01:21 - 01:29 at Baekje - P10-Stack Overflow Chair(s): Emerson Murphy-Hill

Abstract

Software repositories contain a plethora of useful information that can be used to enhance software projects. Prior work has leveraged repository data to improve many aspects of the software development process, such as extracting requirements, identifying potentially defective code, and supporting maintenance and evolution. However, in many cases, project stakeholders cannot fully benefit from their software repositories because mining those repositories requires special expertise. Also, extracting and linking data from different types of repositories (e.g., source code control and bug repositories) requires dedicated effort and time, even if the stakeholder has the expertise to perform such a task.

More recently, bots were proposed as a means to help automate redundant development tasks and lower the barrier to entry for information extraction. Therefore, in this paper, we use bots to automate and ease the process of extracting useful information from software repositories. While it might seem at first that applying bots to software repositories is the same as using them to answer questions based on Stack Overflow posts, there is a big difference between the two. One fundamental difference is that bots trained on Stack Overflow data can provide general answers but will never be able to answer project-specific questions such as "how many bugs were opened against my project today?". Also, we would like to better understand how bots can be applied to software repository data and highlight what is and what is not achievable using bots on top of software repositories.

Therefore, our goal is to design and build a bot framework for software repositories and perform a case study to examine its efficiency and highlight the challenges facing our framework. The approach contains five main components: a user interaction component, meant to interact with the user; entity recognizer and intent extractor components, meant to process and analyze the user’s natural language input; a knowledge base component, which contains all of the data and information to be queried; and a response generator component, meant to generate a reply message that contains the query’s answer and return it to the user interaction component. To evaluate our framework, we add support for 15 of the most commonly asked questions by software practitioners mentioned in prior work. We then perform a case study with 12 participants using two open-source projects. In particular, we asked those participants to perform a set of tasks using the bot and then evaluate it based on its replies. We examine the bot in terms of its effectiveness, efficiency, and accuracy and compare it to a baseline where the survey participants are asked to do the same tasks without using the bot. We also perform a post-survey interview with a subset of the survey participants to better understand the strengths of the bot approach and its areas for improvement.
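
The five components described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the keyword rules stand in for whatever intent extraction and entity recognition the real framework uses, and every name here (Bug, KNOWLEDGE_BASE, extract_intent, recognize_entities, query_knowledge_base, generate_response, answer), the two supported questions, and the sample data are hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Bug:
    """Hypothetical record linking a bug report to the commit that fixed it."""
    bug_id: int
    opened: date
    fixing_commit: Optional[str] = None


# Knowledge base component: a tiny in-memory stand-in for mined repository data.
KNOWLEDGE_BASE = [
    Bug(1, date(2020, 7, 8)),
    Bug(2, date(2020, 7, 8), "a1b2c3"),
    Bug(3, date(2020, 7, 7), "d4e5f6"),
]


def extract_intent(text):
    """Intent extractor: simple keyword rules (an assumption, not a trained model)."""
    lowered = text.lower()
    if "how many bugs" in lowered and "opened" in lowered:
        return "count_bugs_opened"
    if "commit" in lowered and "fix" in lowered:
        return "fixing_commit"
    return "unknown"


def recognize_entities(text, today):
    """Entity recognizer: pulls out a date keyword and a bug number, if present."""
    entities = {}
    if "today" in text.lower():
        entities["date"] = today
    match = re.search(r"bug\s+#?(\d+)", text, re.IGNORECASE)
    if match:
        entities["bug_id"] = int(match.group(1))
    return entities


def query_knowledge_base(intent, entities):
    """Looks up the answer for a recognized intent; returns None if unanswerable."""
    if intent == "count_bugs_opened" and "date" in entities:
        return sum(1 for bug in KNOWLEDGE_BASE if bug.opened == entities["date"])
    if intent == "fixing_commit" and "bug_id" in entities:
        for bug in KNOWLEDGE_BASE:
            if bug.bug_id == entities["bug_id"]:
                return bug.fixing_commit
    return None


def generate_response(intent, entities, result):
    """Response generator: turns the query result into a reply message."""
    if result is None:
        return "Sorry, I could not answer that question."
    if intent == "count_bugs_opened":
        return f"{result} bug(s) were opened on {entities['date'].isoformat()}."
    return f"Bug {entities['bug_id']} was fixed by commit {result}."


def answer(question, today):
    """User interaction component: takes a question and returns the bot's reply."""
    intent = extract_intent(question)
    entities = recognize_entities(question, today)
    result = query_knowledge_base(intent, entities)
    return generate_response(intent, entities, result)
```

For example, answer("How many bugs were opened against my project today?", date(2020, 7, 8)) counts the sample bugs opened on that date, while a question the rules do not cover falls through to the apology message.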

Our results indicate that bots are useful (as indicated by 90.0% of answers), efficient (as indicated by 84.17% of answers), and accurate (as indicated by 90.8% of tasks) in providing answers to some of the most common questions about software repositories. In comparison to the baseline, the bot significantly outperforms the manual process of finding answers (in the baseline, the survey participants were able to answer only 25.2% of the questions correctly and took much longer to find their answers). Based on our post-survey interviews with the participants, we find that bots can be improved if they enable users to perform deep-dive analysis and help compensate for user errors, e.g., typos. Based on our results, we believe that applying bots to software repositories has the potential to transform the MSR field by significantly lowering the barrier to entry and making the extraction of useful information from software repositories as easy as chatting with a bot.

In addition to our findings, the paper provides the following contributions: (1) To the best of our knowledge, this is the first study to use bots on software repositories. Also, our framework allows project stakeholders to extract repository information easily using natural language. (2) We perform an empirical study to evaluate our bot framework and compare it to a baseline. Moreover, we provide insights into areas where bot technology/frameworks still face challenges when applied to software repositories. (3) We make our framework implementation and datasets publicly available in an effort to accelerate future research in the area.

DOI

https://doi.org/10.1007/s10664-019-09788-5

Ahmad Abdellatif

Concordia University

Canada

Khaled Badran

Concordia University

Canada

Emad Shihab

Concordia University

Canada

Session Program

Wed 8 Jul
Displayed time zone: (UTC) Coordinated Universal Time

01:05 - 02:05	P10-Stack Overflow (Journal First / New Ideas and Emerging Results / Technical Papers) at Baekje. Chair(s): Emerson Murphy-Hill (Google)

01:05 8m Talk		What Do Programmers Discuss about Blockchain? A Case Study on the Use of Balanced LDA and the Reference Architecture of a Domain to Capture Online Discussions about Blockchain Platforms across Stack Exchange Communities (J1, Journal First). Zhiyuan Wan (Zhejiang University), Xin Xia (Monash University), Ahmed E. Hassan (Queen's University)
01:13 8m Talk		Bounties on Technical Q&A Sites: A Case Study of Stack Overflow Bounties (J1, Journal First). Jiayuan Zhou (Queen's University), Shaowei Wang (Mississippi State University), Cor-Paul Bezemer (University of Alberta, Canada), Ahmed E. Hassan (Queen's University)
01:21 8m Talk		MSRBot: Using Bots to Answer Questions from Software Repositories (J1, Journal First). Ahmad Abdellatif (Concordia University), Khaled Badran (Concordia University), Emad Shihab (Concordia University)
01:29 6m Talk		Why Will My Question Be Closed? NLP-Based Pre-Submission Predictions of Question Closing Reasons on Stack Overflow (NIER, New Ideas and Emerging Results). Laszlo Toth (University of Szeged, Hungary), Balázs Nagy (University of Szeged, Hungary), László Vidács (University of Szeged, Hungary), Tibor Gyimóthy (University of Szeged, Hungary)
01:35 12m Talk		Interpreting Cloud Computer Vision Pain-Points: A Mining Study of Stack Overflow (Technical, Technical Papers). Alex Cummaudo (Applied Artificial Intelligence Institute, Deakin University), Rajesh Vasa (Deakin University), Scott Barnett (Deakin University), John Grundy (Monash University), Mohamed Abdelrazek (Deakin University)