Caspar: Extracting and Synthesizing User Stories of Problems from App Reviews
App reviews can provide important intelligence that app developers can apply to improve their offerings. Whereas previous research on review analysis has considered the easily extractable elements of an app review (such as topics and sentiment), it has largely ignored the more subtle, and potentially more informative, elements. Specifically, a user's review of an app often describes the user's interactions with the app. These interactions, which we interpret as mini stories, are prominent in reviews with negative ratings.
In general, a story in an app review contains at least two types of events: (1) user actions, indicative of use cases or user expectations, and (2) associated app behaviors, indicative of problems that violate the user's expectations. Being able to identify such stories would help a developer better maintain and improve the app's functionality and enhance the user experience.
To this end, we present CASPAR, a method for collecting and analyzing user-reported mini stories regarding app problems from app reviews. CASPAR abstracts event pairs from stories in reviews. By extending and applying natural language processing and deep learning techniques, CASPAR extracts ordered events from app reviews, classifies them as user actions or app problems, and conducts inference on action-problem event pairs. It builds and trains an inference model with the extracted event pairs to predict possible app problems for different use cases.
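To make the pipeline concrete, the sketch below illustrates the extract-classify-pair idea on a single review. It is not CASPAR's implementation: CASPAR uses NLP parsing and learned deep models, whereas here clause splitting and keyword cues (all hypothetical) stand in for those components.

```python
# Illustrative sketch only: CASPAR's actual pipeline uses NLP parsing and
# deep-learning classifiers; a simple keyword heuristic stands in here.
import re

# Hypothetical cue phrases; CASPAR's classifier is learned, not rule-based.
ACTION_CUES = ("i tried", "i opened", "i tapped", "when i", "after i")
PROBLEM_CUES = ("crash", "freeze", "error", "won't", "nothing happens")

def extract_events(review: str) -> list[str]:
    """Split a review into clause-level candidate events, in order."""
    clauses = re.split(r"[.;,]| but | and then ", review.lower())
    return [c.strip() for c in clauses if c.strip()]

def classify(event: str) -> str:
    """Label an event as a user action, an app problem, or other."""
    if any(cue in event for cue in PROBLEM_CUES):
        return "app_problem"
    if any(cue in event for cue in ACTION_CUES):
        return "user_action"
    return "other"

def action_problem_pairs(review: str) -> list[tuple[str, str]]:
    """Pair each user action with the app problem that follows it."""
    events = [(e, classify(e)) for e in extract_events(review)]
    pairs, pending_action = [], None
    for event, label in events:
        if label == "user_action":
            pending_action = event
        elif label == "app_problem" and pending_action:
            pairs.append((pending_action, event))
            pending_action = None
    return pairs

print(action_problem_pairs("When I opened the camera, the app crashed."))
# → [('when i opened the camera', 'the app crashed')]
```

Such action-problem pairs are the kind of training instances on which an inference model could then learn to predict likely problems for a given use case.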
CASPAR discovers high-quality event pairs regarding app problems from reviews, and infers plausible app problems for use cases. We conduct two main evaluations. First, CASPAR classifies the events with an accuracy of 82.0% on manually labeled data. Second, relative to human evaluators, CASPAR extracts event pairs with 92.9% precision and 28.9% recall, and infers events with high plausibility. Our dataset and code will be released upon acceptance.