Easy-to-Deploy API Extraction by Multi-Level Feature Embedding and Transfer Learning (ICSE 2020 - Journal First)

Wed 24 June - Thu 16 July 2020

Who

Suyu Ma, Zhenchang Xing, Chunyang Chen, Cheng Chen, Lizhen Qu, Guoqiang Li

Track

ICSE 2020 Journal First

Time Zone

The program is currently displayed in (UTC) Coordinated Universal Time.

When

Thu 9 Jul 2020 01:17 - 01:25 at Baekje - P16-Security and Learning. Chair(s): Lingming Zhang

Abstract

Application Programming Interfaces (APIs) are widely discussed on socio-technical platforms (e.g., Stack Overflow). Extracting API mentions from such informal software texts is a prerequisite for API-centric search and summarization of programming knowledge. Machine learning based API extraction has outperformed rule-based methods on informal software texts, which lack consistent writing forms and annotations. However, machine learning based methods incur significant overhead in preparing training data and effective features. Training a reliable machine learning based API extraction model for a library often requires several hundred manually labeled sentences mentioning that library’s APIs. The effort to prepare training data for hundreds of libraries would be prohibitive. Furthermore, it may be difficult to prepare sufficient high-quality training data for the APIs of less frequently discussed libraries or frameworks. A related challenge is selecting effective features for a machine learning model to recognize a particular library’s APIs: designers of such a model have to manually select the most effective features for each library’s APIs.

In our paper, we propose a multi-layer neural network-based architecture for API extraction. Our architecture automatically learns character-, word- and sentence-level features from the input texts, thus removing the need for manual feature engineering and the dependence on advanced features (e.g., API gazetteers) beyond the input texts. It is composed of a character-level convolutional neural network (CNN), word-level embeddings, and a sentence-level Bi-directional Long Short-Term Memory (Bi-LSTM) network, which learn character-, word- and sentence-level features, respectively. We also adopt transfer learning to adapt a model trained on a source library to a target library, thus reducing the overhead of manually labeling training data when the software text of multiple programming languages and libraries needs to be processed.
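To make the multi-level idea concrete, here is a minimal pure-Python sketch of how a character-level convolution with max-over-time pooling and a word-level embedding could be combined into one vector per token. It is an illustration under assumptions, not the authors' implementation: the dimensions, the random weights, the lowercasing of words, and the names `char_cnn` and `token_features` are all hypothetical, and the sentence-level Bi-LSTM that would consume these vectors is omitted.

```python
import random

random.seed(0)  # deterministic toy weights for the illustration

# Hypothetical sizes; a real model would tune these.
CHAR_DIM, WORD_DIM, NUM_FILTERS, KERNEL = 4, 5, 3, 3

def rand_vec(n):
    return [random.uniform(-1.0, 1.0) for _ in range(n)]

# Randomly initialised lookup tables (a trained model would learn these).
char_emb = {}
word_emb = {}

def embed(table, key, dim):
    """Return the embedding for key, creating a random one on first use."""
    if key not in table:
        table[key] = rand_vec(dim)
    return table[key]

# Each filter holds KERNEL * CHAR_DIM weights and slides over the characters.
filters = [rand_vec(KERNEL * CHAR_DIM) for _ in range(NUM_FILTERS)]

def char_cnn(token):
    """Character-level features: 1-D convolution, then max-over-time pooling."""
    chars = [embed(char_emb, c, CHAR_DIM) for c in token]
    # Zero-pad short tokens so that at least one window exists.
    while len(chars) < KERNEL:
        chars.append([0.0] * CHAR_DIM)
    features = []
    for f in filters:
        responses = []
        for start in range(len(chars) - KERNEL + 1):
            window = [x for vec in chars[start:start + KERNEL] for x in vec]
            responses.append(sum(w * x for w, x in zip(f, window)))
        features.append(max(responses))  # keep the strongest response
    return features

def token_features(token):
    """Concatenate the word-level embedding with the character-level features."""
    return embed(word_emb, token.lower(), WORD_DIM) + char_cnn(token)

# Made-up example sentence in the style of a Stack Overflow post.
sentence = "use df.apply() to apply a function".split()
vectors = [token_features(t) for t in sentence]
```

In this sketch the tokens `df.apply()` and `apply` receive different character-level features, which is the kind of surface signal a character-level CNN can pick up; telling an API mention from an ordinary word in context would be the job of the sentence-level layer.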

We conduct extensive experiments with six libraries of four programming languages which support diverse functionalities and have different API-naming and API-mention characteristics. Our experiments involve three Python libraries (Pandas, NumPy and Matplotlib), one Java library (JDBC), one JavaScript library (React), and one C library (OpenGL). We manually label API mentions in 3600 Stack Overflow posts (600 for each library) for the experiments. Our experiments investigate the performance of our neural architecture for API extraction in informal software texts, the importance of different features, and the effectiveness of transfer learning. Our results confirm not only that our neural architecture outperforms existing machine learning based methods for API extraction in informal software texts, but also that it is easy to deploy.

Our paper makes the following four contributions:

Our work is the first one to consider not only the performance of machine learning based API extraction methods but also the easy deployment of such methods for the software text of multiple programming languages and libraries.

We propose a multi-layer neural architecture to automatically learn to extract effective features from the input texts for API extraction, thus removing the need for manual feature engineering as well as the dependence on features beyond the input texts.

We adopt transfer learning to reduce the overhead of manual labeling of the training data of a subject library. We evaluate the effectiveness of transfer learning across libraries and programming languages and analyze the factors that affect its effectiveness.

We conduct extensive experiments to evaluate our architecture as a whole as well as its components. Our results reveal insights into the design of effective mechanisms for API extraction tasks.
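The transfer-learning contribution above can be illustrated with a small sketch: a target-library model starts from the weights of a source-library model and is then updated on target-library data. Everything here is assumed for illustration, including the layer names, the weight values, the gradients, and the choice to freeze the character-level layer; the abstract does not say which layers are transferred or frozen.

```python
import copy

def init_target_from_source(source_params, frozen=()):
    """Start the target-library model from a copy of the source-library weights."""
    params = copy.deepcopy(source_params)
    trainable = {name: name not in frozen for name in params}
    return params, trainable

def sgd_step(params, grads, trainable, lr=0.1):
    """Apply one gradient step, skipping layers that are frozen."""
    for name, grad in grads.items():
        if trainable[name]:
            params[name] = [w - lr * g for w, g in zip(params[name], grad)]

# Hypothetical weights of a model already trained on a source library.
source_params = {
    "char_cnn": [0.2, -0.1],
    "word_emb": [0.5, 0.3],
    "bilstm": [0.7, -0.4],
    "output": [0.1, 0.9],
}

# Assumption: keep the character-level layer fixed and fine-tune the rest.
target_params, trainable = init_target_from_source(source_params, frozen=("char_cnn",))

# Made-up gradients standing in for one mini-batch of labelled target-library sentences.
grads = {name: [1.0, 1.0] for name in target_params}
sgd_step(target_params, grads, trainable)
```

Because the target model begins from learned weights rather than random ones, it may need fewer labelled target-library sentences to reach a useful accuracy, which is the overhead reduction the contribution describes.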

Suyu Ma

Monash University

Zhenchang Xing

Australian National University

Australia

Chunyang Chen

Monash University

Australia

Cheng Chen

PricewaterhouseCoopers Firm

Lizhen Qu

Monash University

Guoqiang Li

Shanghai Jiao Tong University

China

Session Program

Thu 9 Jul
Displayed time zone: (UTC) Coordinated Universal Time

01:05 - 02:05	P16-Security and Learning (Technical Papers / Journal First / Paper Presentations) at Baekje. Chair(s): Lingming Zhang, The University of Texas at Dallas

01:05 12m Talk	Software Visualization and Deep Transfer Learning for Effective Software Defect Prediction (Technical Papers). Jinyin Chen, College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China; Keke Hu, College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China; Yue Yu, College of Computer, National University of Defense Technology, Changsha 410073, China; Zhuangzhi Chen, College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China; Qi Xuan, Institute of Cyberspace Security, Zhejiang University of Technology, Hangzhou 310023, China; Yi Liu, Institute of Process Equipment and Control Engineering, Zhejiang University of Technology, Hangzhou 310023, China; Vladimir Filkov, University of California at Davis, USA
01:17 8m Talk	Easy-to-Deploy API Extraction by Multi-Level Feature Embedding and Transfer Learning (Journal First). Suyu Ma, Monash University; Zhenchang Xing, Australian National University; Chunyang Chen, Monash University; Cheng Chen, PricewaterhouseCoopers Firm; Lizhen Qu, Monash University; Guoqiang Li, Shanghai Jiao Tong University
01:25 12m Talk	How Does Misconfiguration of Analytic Services Compromise Mobile Privacy? (Technical Papers). Xueling Zhang, University of Texas at San Antonio; Xiaoyin Wang, University of Texas at San Antonio, USA; Rocky Slavin, University of Texas at San Antonio; Travis Breaux, Carnegie Mellon University; Jianwei Niu, University of Texas at San Antonio
01:37 12m Talk	Securing UnSafe Rust Programs with XRust (Technical Papers). Peiming Liu, Texas A&M University; Gang Zhao, Texas A&M University; Jeff Huang, Texas A&M University
01:49 12m Talk	Is Rust Used Safely by Software Developers? (Technical Papers). Ana Nora Evans, University of Virginia, USA; Bradford Campbell, University of Virginia; Mary Lou Soffa, University of Virginia