Automated Identification of Libraries from Vulnerability Data (ICSE 2020 - Software Engineering in Practice)

Wed 24 June - Thu 16 July 2020

Who

Chen Yang, Andrew Santosa, Asankhaya Sharma, David Lo

Track

ICSE 2020 Software Engineering in Practice

Time Zone

The program is currently displayed in (UTC) Coordinated Universal Time.

When

Thu 9 Jul 2020 00:12 - 00:24 at Baekje - P13-Security Chair(s): Joshua Garcia

Abstract

Software Composition Analysis (SCA) has gained traction in recent years, with a number of commercial offerings from various companies. SCA involves a vulnerability curation process in which a group of security researchers, using various data sources, populates a database of open-source library vulnerabilities; a scanner then uses this database to inform end users of vulnerable libraries used by their applications. One of the data sources used is the National Vulnerability Database (NVD). The key challenge faced by the security researchers is figuring out which libraries are related to each of the reported vulnerabilities in NVD. In this article, we report our design and implementation of a machine learning system to help identify the libraries related to each vulnerability in NVD.

The problem is that of extreme multi-label learning (XML), and we developed our system using the state-of-the-art FastXML algorithm. Our system is executed iteratively, improving the performance of the model over time. At the time of writing, it achieves an F1@1 score of 0.53 and an average F1@k score for k = 1, 2, 3 of 0.51 (F1@k is the harmonic mean of precision@k and recall@k). It has been deployed at Veracode as part of a machine learning system that helps the security researchers identify the likelihood that web data items are vulnerability-related. In addition, we present evaluation results of our feature engineering and of the number of FastXML trees used. Our work is the first to formulate library name identification from NVD data as an XML problem, and it is also the first attempt at solving it in a complete production system.
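The F1@k metric mentioned above can be illustrated with a minimal Python sketch. It assumes the common definitions precision@k = (correct labels among the top k) / k and recall@k = (correct labels among the top k) / (number of true labels), computed for a single vulnerability entry. The function names, the library names, and the per-entry computation are illustrative assumptions; how the scores are aggregated across a test set is not specified in this abstract.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k predicted labels that are truly relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for label in ranked[:k] if label in relevant)
    return hits / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the truly relevant labels that appear in the top k."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant:
        return 0.0
    hits = sum(1 for label in ranked[:k] if label in relevant)
    return hits / len(relevant)


def f1_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Harmonic mean of precision@k and recall@k (0.0 if both are zero)."""
    p = precision_at_k(ranked, relevant, k)
    r = recall_at_k(ranked, relevant, k)
    if p + r == 0:
        return 0.0
    return 2 * p * r / (p + r)


# Hypothetical data: library names ranked by a model for one NVD entry
# (best first, no duplicates), and the set a researcher marked as correct.
ranked = ["openssl", "libcurl", "nginx"]
relevant = {"openssl", "nginx"}

for k in (1, 2, 3):
    print(f"F1@{k} = {f1_at_k(ranked, relevant, k):.3f}")
```

In this hypothetical case the top-ranked label is correct, so precision@1 is 1.0, but only one of the two true libraries is recovered, so recall@1 is 0.5; the harmonic mean penalizes that imbalance.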

Link to Preprint

http://asankhaya.github.io/pdf/Automated-Identification-of-Libraries-from-Vulnerability-Data.pdf

Chen Yang

Veracode, Inc.

Andrew Santosa

Veracode, Inc.

Asankhaya Sharma

Veracode, Inc.

Singapore

David Lo

Singapore Management University

Singapore

Session Program

Thu 9 Jul
Displayed time zone: (UTC) Coordinated Universal Time

00:00 - 01:00	P13-Security (Technical Papers / Software Engineering in Practice) at Baekje. Chair(s): Joshua Garcia (University of California, Irvine)

00:00 12m Talk	Burn After Reading: A Shadow Stack with Microsecond-level Runtime Rerandomization for Protecting Return Addresses (Technical Papers). Changwei Zou (UNSW Sydney), Jingling Xue (UNSW Sydney)
00:12 12m Talk	Automated Identification of Libraries from Vulnerability Data (Software Engineering in Practice). Chen Yang (Veracode, Inc.), Andrew Santosa (Veracode, Inc.), Asankhaya Sharma (Veracode, Inc.), David Lo (Singapore Management University)
00:24 12m Talk	Unsuccessful Story about Few Shot Malware-Family Classification and Siamese Network to the Rescue (Technical Papers). Yude Bai (Tianjin University), Zhenchang Xing (Australian National University), Xiaohong Li (Tianjin University), Zhiyong Feng (Tianjin University), Duoyuan Ma (Tianjin University)
00:36 12m Talk	SpecuSym: Speculative Symbolic Execution for Cache Timing Leak Detection (Technical Papers). Shengjian Guo (Baidu X-Lab), Yueqi Chen (The Pennsylvania State University), Peng Li (Baidu X-Lab), Yueqiang Cheng (Baidu Security), Huibo Wang (Baidu X-Lab), Meng Wu (Ant Financial), Zhiqiang Zuo (Nanjing University, China)
00:48 12m Talk	Building and Maintaining a Third-Party Library Supply Chain for Productive and Secure SGX Enclave Development (Software Engineering in Practice). Pei Wang (Baidu X-Lab), Yu Ding (Baidu X-Lab), Mingshen Sun (Baidu X-Lab), Huibo Wang (Baidu X-Lab), Tongxin Li (Baidu X-Lab), Rundong Zhou (Baidu X-Lab), Zhaofeng Chen, Yiming Jing (Baidu X-Lab)