MSRBot: Using Bots to Answer Questions from Software Repositories
Software repositories contain a wealth of information that can be used to improve software projects. Prior work has leveraged repository data to improve many aspects of the software development process, such as extracting requirements, identifying potentially defective code, and supporting maintenance and evolution. However, in many cases, project stakeholders cannot fully benefit from their software repositories because mining them requires special expertise. Moreover, extracting and linking data from different types of repositories (e.g., source code control and bug repositories) requires dedicated effort and time, even if the stakeholder has the expertise to perform such a task.
More recently, bots were proposed as a means to help automate redundant development tasks and lower the barrier of entry for information extraction. Therefore, in this paper, we use bots to automate and ease the process of extracting useful information from software repositories. While it might seem at first that applying bots to software repositories is the same as using them to answer questions based on Stack Overflow posts, the two differ considerably. One fundamental difference is that bots trained on Stack Overflow data can provide only general answers and cannot answer project-specific questions such as "how many bugs were opened against my project today?". We would also like to better understand how bots can be applied to software repository data and to highlight what is and what is not achievable using bots on top of software repositories.
Therefore, our goal is to design and build a bot framework for software repositories and perform a case study to examine its efficiency and highlight the challenges facing our framework. The approach contains five main components: a user interaction component, meant to interact with the user; entity recognizer and intent extractor components, meant to process and analyze the user’s natural language input; a knowledge base component, which contains all of the data and information to be queried; and a response generator component, meant to generate a reply message that contains the query’s answer and return it to the user interaction component. To evaluate our framework, we add support for 15 of the most commonly asked questions by software practitioners mentioned in prior work. We then perform a case study with 12 participants using two open-source projects. In particular, we asked those participants to perform a set of tasks using the bot and then evaluate it based on its replies. We examine the bot in terms of its effectiveness, efficiency, and accuracy and compare it to a baseline where the survey participants are asked to do the same tasks without using the bot. We also perform a post-survey interview with a subset of the survey participants to better understand the strengths of the bot approach and its areas for improvement.
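To make the five-component design more concrete, the following is a minimal sketch of how such a pipeline could be wired together. It is an illustration under stated assumptions, not the framework's actual implementation: the function names, the keyword-based intent matching, the regular expressions, and the toy in-memory knowledge base are all hypothetical, and a real system would likely rely on a trained natural language understanding model and on data mined from actual repositories.

```python
import re
from dataclasses import dataclass

# Hypothetical toy knowledge base; a real one would be mined from
# source control and issue tracking repositories.
KNOWLEDGE_BASE = {
    "bugs": [
        {"id": 1, "opened": "2024-01-02", "file": "parser.py"},
        {"id": 2, "opened": "2024-01-02", "file": "lexer.py"},
        {"id": 3, "opened": "2024-01-03", "file": "parser.py"},
    ]
}


@dataclass
class ParsedQuestion:
    intent: str
    entities: dict


def extract_intent(text):
    """Intent extractor (assumed keyword matching, not a trained model)."""
    lowered = text.lower()
    if "how many bugs" in lowered:
        return "count_bugs"
    if "which bugs" in lowered:
        return "list_bugs"
    return "unknown"


def recognize_entities(text):
    """Entity recognizer: pulls an ISO date and a file name, if present."""
    entities = {}
    date = re.search(r"\d{4}-\d{2}-\d{2}", text)
    if date:
        entities["date"] = date.group(0)
    file_name = re.search(r"\b[\w/]+\.[a-z]+\b", text)
    if file_name:
        entities["file"] = file_name.group(0)
    return entities


def query_knowledge_base(parsed):
    """Knowledge base lookup: keep bugs matching every recognized entity."""
    rows = KNOWLEDGE_BASE["bugs"]
    if "date" in parsed.entities:
        rows = [r for r in rows if r["opened"] == parsed.entities["date"]]
    if "file" in parsed.entities:
        rows = [r for r in rows if r["file"] == parsed.entities["file"]]
    return rows


def generate_response(parsed, rows):
    """Response generator: turns query results into a reply message."""
    if parsed.intent == "count_bugs":
        return f"{len(rows)} bug(s) matched your question."
    if parsed.intent == "list_bugs":
        ids = ", ".join(str(r["id"]) for r in rows)
        return f"Matching bug ids: {ids}"
    return "Sorry, I did not understand the question."


def answer(text):
    """User interaction entry point: runs the full pipeline on one message."""
    parsed = ParsedQuestion(extract_intent(text), recognize_entities(text))
    rows = query_knowledge_base(parsed) if parsed.intent != "unknown" else []
    return generate_response(parsed, rows)


if __name__ == "__main__":
    print(answer("How many bugs were opened on 2024-01-02?"))
    print(answer("Which bugs touch parser.py?"))
```

Keeping each component behind its own function means that, in this sketch, the keyword matcher could be swapped for a learned intent classifier without touching the knowledge base or the response generator.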
Our results indicate that bots are useful (as indicated by 90.0% of answers), efficient (as indicated by 84.17% of answers) and accurate (as indicated by 90.8% of tasks) in providing answers to some of the most common questions about software repositories. In comparison to the baseline, bots significantly outperform the manual process of finding answers (without the bot, the survey participants were only able to answer 25.2% of the questions correctly and took much longer to find their answers). Based on our post-survey interviews with the participants, we find that bots can be improved if they enable users to perform deep-dive analysis and help compensate for user errors, e.g., typos. Based on our results, we believe that applying bots to software repositories has the potential to transform the MSR field by significantly lowering the barrier to entry and making the extraction of useful information from software repositories as easy as chatting with a bot.
In addition to our findings, the paper makes the following contributions: (1) To the best of our knowledge, this is the first study to use bots on software repositories. Also, our framework allows project stakeholders to extract repository information easily using natural language. (2) We perform an empirical study to evaluate our bot framework and compare it to a baseline. Moreover, we provide insights on areas where bot technology and frameworks still face challenges when applied to software repositories. (3) We make our framework implementation and datasets publicly available in an effort to accelerate future research in the area.