Easy-to-Deploy API Extraction by Multi-Level Feature Embedding and Transfer Learning
Application Programming Interfaces (APIs) are widely discussed on socio-technical platforms (e.g., Stack Overflow). Extracting API mentions from such informal software texts is a prerequisite for API-centric search and summarization of programming knowledge. Machine learning based API extraction has outperformed rule-based methods on informal software texts, which lack consistent writing forms and annotations. However, machine learning based methods incur a significant overhead in preparing training data and effective features. Training a reliable machine learning based API extraction model for a library often requires several hundred manually labeled sentences that mention this library's APIs. The effort to prepare training data for hundreds of libraries would be prohibitive. Furthermore, it may be difficult to prepare sufficient high-quality training data for the APIs of less frequently discussed libraries or frameworks. A related challenge is selecting effective features for a machine learning model to recognize a particular library's APIs: designers of such a model have to manually select the most effective features for each library's APIs.
In our paper, we propose a multi-layer neural network-based architecture for API extraction. Our architecture automatically learns character-, word- and sentence-level features from the input texts, thus removing the need for manual feature engineering and the dependence on advanced features (e.g., API gazetteers) beyond the input texts. The architecture is composed of a character-level convolutional neural network (CNN), word-level embeddings, and a sentence-level Bi-directional Long Short-Term Memory (Bi-LSTM) network, which learn the character-, word- and sentence-level features, respectively. We also propose to adopt transfer learning to adapt a model trained on a source library to a target library, thus reducing the overhead of manual training-data labeling when the software text of multiple programming languages and libraries needs to be processed.
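To make the three feature levels concrete, here is a minimal pure-Python sketch of how they could be stacked. It is not the authors' implementation: all dimensions, weights, and function names are hypothetical, the weights are random rather than learned, a plain bidirectional tanh recurrence stands in for the Bi-LSTM to keep the example short, and the final layer that labels each token is omitted.

```python
import math
import random

random.seed(0)  # fixed seed so the random (untrained) weights are reproducible

# Hypothetical sizes chosen only for illustration.
CHAR_DIM, NUM_FILTERS, KERNEL, WORD_DIM, HIDDEN = 4, 3, 3, 5, 4
TOKEN_DIM = NUM_FILTERS + WORD_DIM

def rand_vec(n):
    return [random.uniform(-0.5, 0.5) for _ in range(n)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Lookup tables filled lazily with random vectors; a real model would learn them.
char_emb, word_emb = {}, {}
filters = [rand_vec(KERNEL * CHAR_DIM) for _ in range(NUM_FILTERS)]

def embed(table, key, dim):
    if key not in table:
        table[key] = rand_vec(dim)
    return table[key]

def char_features(token):
    """Character level: 1-D convolution over character embeddings, then max pooling."""
    chars = [embed(char_emb, c, CHAR_DIM) for c in token]
    # Zero-pad short tokens so that at least one convolution window exists.
    chars += [[0.0] * CHAR_DIM] * max(0, KERNEL - len(chars))
    windows = [[x for vec in chars[i:i + KERNEL] for x in vec]
               for i in range(len(chars) - KERNEL + 1)]
    return [max(dot(f, w) for w in windows) for f in filters]

def token_features(token):
    """Word level: concatenate character features with the word embedding."""
    return char_features(token) + embed(word_emb, token.lower(), WORD_DIM)

def make_rnn():
    return ([rand_vec(TOKEN_DIM) for _ in range(HIDDEN)],
            [rand_vec(HIDDEN) for _ in range(HIDDEN)])

def run_rnn(weights, inputs):
    """Simple tanh recurrence (a stand-in for one LSTM direction)."""
    w_in, w_rec = weights
    h, states = [0.0] * HIDDEN, []
    for x in inputs:
        h = [math.tanh(dot(w_in[j], x) + dot(w_rec[j], h)) for j in range(HIDDEN)]
        states.append(h)
    return states

forward_rnn, backward_rnn = make_rnn(), make_rnn()

def sentence_features(tokens):
    """Sentence level: each token sees its left and right context."""
    inputs = [token_features(t) for t in tokens]
    fwd = run_rnn(forward_rnn, inputs)
    bwd = run_rnn(backward_rnn, inputs[::-1])[::-1]
    return [f + b for f, b in zip(fwd, bwd)]

tokens = ["use", "df.apply", "on", "each", "row"]
for token, vec in zip(tokens, sentence_features(tokens)):
    print(token, [round(v, 2) for v in vec])
```

Each token ends up with one vector that mixes sub-word shape (from the character convolution), word identity (from the embedding), and surrounding context (from the two recurrent passes), which is the kind of input a per-token classifier could use to decide whether the token is an API mention.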
We conduct extensive experiments with six libraries of four programming languages, which support diverse functionalities and have different API-naming and API-mention characteristics. Our experiments involve three Python libraries (Pandas, NumPy and Matplotlib), one Java library (JDBC), one JavaScript library (React), and one C library (OpenGL). We manually label API mentions in 3600 Stack Overflow posts (600 for each library) for the experiments. Our experiments investigate the performance of our neural architecture for API extraction in informal software texts, the importance of different features, and the effectiveness of transfer learning. Our results confirm not only that our neural architecture outperforms existing machine learning based methods for API extraction in informal software texts, but also that it is easy to deploy.
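The abstract does not name the metric behind "performance". Assuming the common choice for extraction tasks (precision, recall and F1 over exactly matching mention spans), a comparison against the manual labels could be computed as in the sketch below. The span encoding `(sentence_id, start_token, end_token)` and the toy values are hypothetical.

```python
def prf1(gold, predicted):
    """Precision, recall and F1 over exactly matching mention spans."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Toy data: three labeled mentions, three predicted ones, two in common.
gold_spans = {(0, 2, 3), (0, 7, 8), (1, 0, 1)}
predicted_spans = {(0, 2, 3), (1, 0, 1), (1, 4, 5)}

precision, recall, f1 = prf1(gold_spans, predicted_spans)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```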
Our paper makes the following four contributions:
Our work is the first one to consider not only the performance of machine learning based API extraction methods but also the easy deployment of such methods for the software text of multiple programming languages and libraries.
We propose a multi-layer neural architecture to automatically learn to extract effective features from the input texts for API extraction, thus removing the need for manual feature engineering as well as the dependence on features beyond the input texts.
We adopt transfer learning to reduce the overhead of manual labeling of the training data of a subject library. We evaluate the effectiveness of transfer learning across libraries and programming languages and analyze the factors that affect its effectiveness.
We conduct extensive experiments to evaluate our architecture as a whole as well as its components. Our results reveal insights into the design of effective mechanisms for API extraction tasks.
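The transfer-learning idea in the third contribution can be illustrated with a deliberately small sketch: train on a source library, then start the target-library model from the source weights and fine-tune it on a few target labels. This is not the paper's setup. A logistic regression over hashed character trigrams stands in for the neural network, the tiny token lists are made-up toy data, and every name and hyperparameter here is an assumption.

```python
import math
import zlib

DIM = 64  # number of hashed feature buckets (arbitrary for this sketch)

def featurize(token):
    """Binary bag of hashed character trigrams; a stand-in for learned features."""
    padded = f"^{token}$"
    x = [0.0] * DIM
    for i in range(len(padded) - 2):
        x[zlib.crc32(padded[i:i + 3].encode("utf-8")) % DIM] = 1.0
    return x

def predict(model, token):
    """Probability that the token is an API mention."""
    w, b = model
    z = b + sum(wi * xi for wi, xi in zip(w, featurize(token)))
    return 1.0 / (1.0 + math.exp(-z))

def loss(model, data):
    """Mean binary cross-entropy over (token, label) pairs."""
    total = 0.0
    for token, y in data:
        p = predict(model, token)
        total -= math.log(p) if y == 1 else math.log(1.0 - p)
    return total / len(data)

def train(model, data, epochs, lr=0.1):
    """Full-batch gradient descent; returns a new model, the input is not modified."""
    w, b = list(model[0]), model[1]
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * DIM, 0.0
        for token, y in data:
            err = predict((w, b), token) - y
            for j, xj in enumerate(featurize(token)):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / len(data) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b

# Toy labeled tokens (1 = API mention); real training data would be labeled sentences.
SOURCE = [("df.apply", 1), ("read_csv", 1), ("DataFrame.merge", 1), ("pd.concat", 1),
          ("the", 0), ("column", 0), ("values", 0), ("error", 0)]
TARGET = [("np.zeros", 1), ("ndarray.reshape", 1), ("with", 0), ("slow", 0)]

untrained = ([0.0] * DIM, 0.0)
source_model = train(untrained, SOURCE, epochs=200)
# Transfer: initialize from the source weights, then fine-tune on the small target set.
transferred = train(source_model, TARGET, epochs=20)

print("target loss before fine-tuning:", round(loss(source_model, TARGET), 3))
print("target loss after fine-tuning: ", round(loss(transferred, TARGET), 3))
```

The sketch only shows the mechanics of reusing source weights. How much transfer helps in practice depends on the factors the paper analyzes across libraries and programming languages, and this toy example makes no claim about that.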
Thu 9 Jul. Displayed time zone: (UTC) Coordinated Universal Time
01:05 - 02:05 | P16-Security and Learning | Technical Papers / Journal First | at Baekje | Chair(s): Lingming Zhang (The University of Texas at Dallas)
01:05 | 12m | Talk | Software Visualization and Deep Transfer Learning for Effective Software Defect Prediction | Technical Papers | Jinyin Chen (College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China), Keke Hu (College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China), Yue Yu (College of Computer, National University of Defense Technology, Changsha 410073, China), Zhuangzhi Chen (College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China), Qi Xuan (Institute of Cyberspace Security, Zhejiang University of Technology, Hangzhou 310023, China), Yi Liu (Institute of Process Equipment and Control Engineering, Zhejiang University of Technology, Hangzhou 310023, China), Vladimir Filkov (University of California at Davis, USA)
01:17 | 8m | Talk | Easy-to-Deploy API Extraction by Multi-Level Feature Embedding and Transfer Learning | Journal First | Suyu Ma (Monash University), Zhenchang Xing (Australian National University), Chunyang Chen (Monash University), Cheng Chen (PricewaterhouseCoopers Firm), Lizhen Qu (Monash University), Guoqiang Li (Shanghai Jiao Tong University)
01:25 | 12m | Talk | How Does Misconfiguration of Analytic Services Compromise Mobile Privacy? | Technical Papers | Xueling Zhang (University of Texas at San Antonio), Xiaoyin Wang (University of Texas at San Antonio, USA), Rocky Slavin (University of Texas at San Antonio), Travis Breaux (Carnegie Mellon University), Jianwei Niu (University of Texas at San Antonio)
01:37 | 12m | Talk | Securing UnSafe Rust Programs with XRust | Technical Papers
01:49 | 12m | Talk | Is Rust Used Safely by Software Developers? | Technical Papers | Ana Nora Evans (University of Virginia, USA), Bradford Campbell (University of Virginia), Mary Lou Soffa (University of Virginia)