Successful cross-language clone detection could enable researchers and developers to create robust language migration tools, make it easier to learn additional programming languages once one is mastered, and promote reuse of code snippets over a broader code base. However, identifying cross-language clones poses special challenges for clone detection. Because arbitrary languages lack a common underlying representation, detecting clones requires one of the following solutions: 1) a static analysis framework replicated for each targeted language, with annotations matching language features across all languages, or 2) a dynamic analysis framework that detects clones based on runtime behavior.
In this work, we demonstrate the feasibility of the latter solution: a dynamic analysis approach to cross-language clone detection. As an added challenge, we target a statically typed language, Java, and a dynamically typed language, Python. As in prior clone detection work, we use input/output behavior to match clones, but we overcome limitations of prior work by amplifying the number of inputs and covering more data types, and as a result achieve better clusters than prior attempts. Compared to HitoshiIO, a recent clone detection tool, SLACC retrieves 6x as many clusters and has higher precision (86.7% vs. 30.7%).
This is the first work to perform clone detection for dynamically typed languages (precision = 87.3%) and the first to perform clone detection across languages that lack a common underlying representation (precision = 94.1%). It provides a first step towards the larger goal of extensible and scalable language migration tools.
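The idea of matching clones by input/output behavior can be illustrated with a minimal Python sketch. It is not SLACC's implementation: the helper names (`make_inputs`, `behavior`, `cluster_by_behavior`), the exact-match grouping rule, and the three toy functions are all assumptions made for illustration. It runs each candidate function on the same shared inputs and groups the functions whose outputs agree on every input.

```python
import random
from collections import defaultdict


def make_inputs(n, seed=0):
    """Build a shared input set: two hand-picked edge cases plus n random integer lists."""
    rng = random.Random(seed)
    inputs = [[], [1, 2]]
    for _ in range(n):
        inputs.append([rng.randint(-50, 50) for _ in range(rng.randint(0, 6))])
    return inputs


def behavior(func, inputs):
    """Run func on every input and record each output (or the exception type raised)."""
    results = []
    for args in inputs:
        try:
            results.append(repr(func(list(args))))
        except Exception as exc:
            results.append("raises:" + type(exc).__name__)
    return tuple(results)


def cluster_by_behavior(funcs, inputs):
    """Group function names whose recorded behavior is identical on all inputs."""
    clusters = defaultdict(list)
    for name, func in funcs.items():
        clusters[behavior(func, inputs)].append(name)
    return sorted(sorted(names) for names in clusters.values())


# Hypothetical candidate snippets: two ways of summing a list, and one unrelated function.
def sum_loop(xs):
    total = 0
    for x in xs:
        total += x
    return total


def sum_builtin(xs):
    return sum(xs)


def largest(xs):
    return max(xs)


CANDIDATES = {"sum_loop": sum_loop, "sum_builtin": sum_builtin, "largest": largest}
CLUSTERS = cluster_by_behavior(CANDIDATES, make_inputs(64))
print(CLUSTERS)  # [['largest'], ['sum_builtin', 'sum_loop']]
```

A cross-language variant would presumably run the Java and Python candidates on equivalent inputs and compare their outputs in a language-neutral form. That step, and any similarity threshold looser than exact agreement, is outside this sketch.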
Sat 11 Jul (displayed time zone: UTC, Coordinated Universal Time)
01:05 - 02:05 | P29-Android and Web Testing | Demonstrations / Technical Papers / Software Engineering in Practice | at Goguryeo | Chair(s): Hironori Washizaki (Waseda University)
01:05 | 12m | Talk | SLACC: Simion-based Language Agnostic Code Clones | Technical Papers | George Mathew (North Carolina State University); Chris Parnin (North Carolina State University); Kathryn Stolee (North Carolina State University)
01:17 | 8m | Talk | Near-Duplicate Detection in Web App Model Inference | Technical Papers | Rahulkrishna Yandrapally (University of British Columbia, Canada); Andrea Stocco (Università della Svizzera italiana); Ali Mesbah (University of British Columbia)
01:25 | 12m | Talk | JSidentify: A Hybrid Framework for Detecting Plagiarism Among JavaScript Code in Online Mini Games | Software Engineering in Practice | Qun Xia (Tencent Inc.); Zhongzhu Zhou; Zhihao Li (Tencent Inc.); Bin Xu (Tencent Inc.); Wei Zou (Tencent Inc.); Zishun Chen (Tencent Inc.); Huafeng Ma (Tencent Inc.); Gangqiang Liang (Tencent Inc.); Haochuan Lu (Fudan University); Shiyu Guo (Tencent Inc.); Ting Xiong (Tencent Inc.); Yuetang Deng (Tencent, Inc.); Tao Xie (Peking University)
01:37 | 12m | Talk | Unblind Your Apps: Predicting Natural-Language Labels for Mobile GUI Components by Deep Learning | Technical Papers | Jieshan Chen (Australian National University); Chunyang Chen (Monash University); Zhenchang Xing (Australian National University); Xiwei (Sherry) Xu (Data 61); Liming Zhu (CSIRO's Data61 and UNSW); Guoqiang Li (Shanghai Jiao Tong University); Jinshui Wang (School of Information Science and Engineering, Fujian University of Technology, Fuzhou, China)
01:49 | 3m | Talk | DroidMutator: An Effective Mutation Analysis Tool for Android Applications | Demonstrations | Jian Liu (East China Normal University); Xusheng Xiao (Case Western Reserve University); Lihua Xu (New York University Shanghai); Liang Dou (East China Normal University); Andy Podgurski (Case Western Reserve University)
01:52 | 3m | Talk | BigTest: Symbolic Execution Based Systematic Test Generation Tool for Apache Spark | Demonstrations | Muhammad Ali Gulzar (University of California, Los Angeles); Madan Musuvathi (Microsoft Research); Miryung Kim (University of California, Los Angeles)