Near-Duplicate Detection in Web App Model Inference (Technical)
Automated web testing techniques infer models from a given web app and use them for test generation. From a testing viewpoint, such an inferred model should contain the minimal set of states that are distinct yet adequately cover the app's main functionalities. In practice, automatically inferred models are affected by near-duplicates, i.e., replicas of the same functional webpage that differ only by small, insignificant changes. We present the first study of near-duplicate detection algorithms used within web app model inference. We first characterize functional near-duplicates by classifying a random sample of state-pairs, drawn from 493k pairs of webpages obtained from over 6,000 websites, into three categories: clone, near-duplicate, and distinct. We systematically compute thresholds that define the boundaries of these categories for each detection technique. We then use these thresholds to evaluate 10 near-duplicate detection techniques from three different domains (information retrieval, web testing, and computer vision) on nine open-source web apps. Our study highlights the challenges of automatically inferring a model for any given web app. Our findings show that, even with the best thresholds, no algorithm can accurately detect all functional near-duplicates within apps without sacrificing coverage.
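To make the role of the thresholds concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a detection technique yields a distance score for a pair of pages and that two thresholds split the score range into the three categories named in the abstract. The names `html_distance`, `classify_pair`, `clone_max`, and `near_dup_max` are hypothetical, and the `difflib`-based distance is only a stand-in; it is not claimed to be one of the 10 techniques studied.

```python
import difflib
from enum import Enum


class PairClass(Enum):
    CLONE = "clone"
    NEAR_DUPLICATE = "near-duplicate"
    DISTINCT = "distinct"


def html_distance(page_a: str, page_b: str) -> float:
    """Stand-in distance in [0, 1]: 0.0 means identical text (assumption, not from the paper)."""
    return 1.0 - difflib.SequenceMatcher(None, page_a, page_b).ratio()


def classify_pair(distance: float, clone_max: float, near_dup_max: float) -> PairClass:
    """Map a distance to a category using two hypothetical per-technique thresholds."""
    if clone_max > near_dup_max:
        raise ValueError("clone_max must not exceed near_dup_max")
    if distance <= clone_max:
        return PairClass.CLONE
    if distance <= near_dup_max:
        return PairClass.NEAR_DUPLICATE
    return PairClass.DISTINCT


if __name__ == "__main__":
    a = "<html><body><h1>Cart</h1><p>2 items</p></body></html>"
    b = "<html><body><h1>Cart</h1><p>3 items</p></body></html>"
    # Threshold values here are illustrative only.
    print(classify_pair(html_distance(a, b), clone_max=0.0, near_dup_max=0.3))
```

Under this reading, a crawler would keep a newly discovered state in the model only when it is classified as distinct from every state already kept. The threshold values themselves would have to be derived per technique from labeled pairs, as the abstract describes.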
Sat 11 Jul | Displayed time zone: (UTC) Coordinated Universal Time
01:05 - 02:05 | P29-Android and Web Testing | Demonstrations / Technical Papers / Software Engineering in Practice | at Goguryeo | Chair(s): Hironori Washizaki (Waseda University)
01:05 | 12m | Talk | SLACC: Simion-based Language Agnostic Code Clones | Technical Papers | George Mathew (North Carolina State University), Chris Parnin (North Carolina State University), Kathryn Stolee (North Carolina State University) | Pre-print
01:17 | 8m | Talk | Near-Duplicate Detection in Web App Model Inference | Technical Papers | Rahulkrishna Yandrapally (University of British Columbia, Canada), Andrea Stocco (Università della Svizzera italiana), Ali Mesbah (University of British Columbia) | Pre-print
01:25 | 12m | Talk | JSidentify: A Hybrid Framework for Detecting Plagiarism Among JavaScript Code in Online Mini Games | Software Engineering in Practice | Qun Xia (Tencent Inc.), Zhongzhu Zhou, Zhihao Li (Tencent Inc.), Bin Xu (Tencent Inc.), Wei Zou (Tencent Inc.), Zishun Chen (Tencent Inc.), Huafeng Ma (Tencent Inc.), Gangqiang Liang (Tencent Inc.), Haochuan Lu (Fudan University), Shiyu Guo (Tencent Inc.), Ting Xiong (Tencent Inc.), Yuetang Deng (Tencent, Inc.), Tao Xie (Peking University)
01:37 | 12m | Talk | Unblind Your Apps: Predicting Natural-Language Labels for Mobile GUI Components by Deep Learning | Technical Papers | Jieshan Chen (Australian National University), Chunyang Chen (Monash University), Zhenchang Xing (Australian National University), Xiwei (Sherry) Xu (Data 61), Liming Zhu (CSIRO's Data61 and UNSW), Guoqiang Li (Shanghai Jiao Tong University), Jinshui Wang (School of Information Science and Engineering, Fujian University of Technology, Fuzhou, China)
01:49 | 3m | Talk | DroidMutator: An Effective Mutation Analysis Tool for Android Applications | Demonstrations | Jian Liu (East China Normal University), Xusheng Xiao (Case Western Reserve University), Lihua Xu (New York University Shanghai), Liang Dou (East China Normal University), Andy Podgurski (Case Western Reserve University)
01:52 | 3m | Talk | BigTest: Symbolic Execution Based Systematic Test Generation Tool for Apache Spark | Demonstrations | Muhammad Ali Gulzar (University of California, Los Angeles), Madan Musuvathi (Microsoft Research), Miryung Kim (University of California, Los Angeles)