ICSE 2020
Wed 24 June - Thu 16 July 2020
Thu 9 Jul 2020 00:12 - 00:24 at Baekje - P13-Security Chair(s): Joshua Garcia

Software Composition Analysis (SCA) has gained traction in recent years, with a number of commercial offerings from various companies. SCA involves a vulnerability curation process in which a group of security researchers, using various data sources, populates a database of open-source library vulnerabilities; a scanner then uses this database to inform end users of vulnerable libraries used by their applications. One of the data sources is the National Vulnerability Database (NVD). The key challenge faced by the security researchers is figuring out which libraries are related to each vulnerability reported in NVD. In this article, we report the design and implementation of a machine learning system that helps identify the libraries related to each vulnerability in NVD.

The problem is one of extreme multi-label learning (XML), and we developed our system using the state-of-the-art FastXML algorithm. Our system is executed iteratively, improving the performance of the model over time. At the time of writing, it achieves an F1@1 score of 0.53 and an average F1@k score for k = 1, 2, 3 of 0.51 (F1@k is the harmonic mean of precision@k and recall@k). It has been deployed at Veracode as part of a machine learning system that helps the security researchers assess the likelihood that web data items are vulnerability-related. In addition, we present evaluation results for our feature engineering and for the number of FastXML trees used. Our work is the first to formulate library name identification from NVD data as an XML problem, and it is also the first attempt at solving it in a complete production system.
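To make the F1@k metric concrete, here is a minimal Python sketch for a single vulnerability entry. It assumes precision@k divides the number of correct labels among the top k predictions by k, and recall@k divides the same count by the number of true labels; the abstract does not spell out these details or how scores are averaged over entries, so treat them as assumptions. The library names in the example are hypothetical.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k predicted labels that are true labels (assumed definition)."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for label in ranked[:k] if label in relevant)
    return hits / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the true labels that appear in the top-k predictions (assumed definition)."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant:
        return 0.0
    hits = sum(1 for label in ranked[:k] if label in relevant)
    return hits / len(relevant)


def f1_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Harmonic mean of precision@k and recall@k; 0.0 when both are zero."""
    p = precision_at_k(ranked, relevant, k)
    r = recall_at_k(ranked, relevant, k)
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


if __name__ == "__main__":
    # Hypothetical ranked predictions and ground-truth libraries for one NVD entry.
    ranked = ["struts2-core", "spring-web", "jackson-databind"]
    relevant = {"struts2-core", "xwork-core"}
    for k in (1, 2, 3):
        print(f"F1@{k} = {f1_at_k(ranked, relevant, k):.3f}")
    # prints F1@1 = 0.667, F1@2 = 0.500, F1@3 = 0.400
```

In this example only the first prediction is correct, so precision falls as k grows while recall stays at 0.5, which is why F1@k decreases from k = 1 to k = 3.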

Thu 9 Jul
Times are displayed in time zone: (UTC) Coordinated Universal Time

00:00 - 01:00: P13-Security (Paper Presentations / Technical Papers / Software Engineering in Practice) at Baekje
Chair(s): Joshua Garcia (University of California, Irvine)
00:00 - 00:12
Burn After Reading: A Shadow Stack with Microsecond-level Runtime Rerandomization for Protecting Return Addresses [Technical] [Artifact Available]
Technical Papers
Changwei Zou (UNSW Sydney), Jingling Xue (UNSW Sydney)
00:12 - 00:24
Automated Identification of Libraries from Vulnerability Data [SEIP]
Software Engineering in Practice
Chen Yang (Veracode, Inc.), Andrew Santosa (Veracode, Inc.), Asankhaya Sharma (Veracode, Inc.), David Lo (Singapore Management University)
Pre-print Media Attached
00:24 - 00:36
Unsuccessful Story about Few Shot Malware-Family Classification and Siamese Network to the Rescue [Technical]
Technical Papers
Yude Bai (Tianjin University), Zhenchang Xing (Australian National University), Li Xiaohong (Tianjin University), Zhiyong Feng (Tianjin University), Duoyuan Ma (Tianjin University)
00:36 - 00:48
SpecuSym: Speculative Symbolic Execution for Cache Timing Leak Detection [Technical]
Technical Papers
Shengjian Guo (Baidu X-Lab), Yueqi Chen (The Pennsylvania State University), Peng Li (Baidu X-Lab), Yueqiang Cheng (Baidu Security), Huibo Wang (Baidu X-Lab), Meng Wu (Ant Financial), Zhiqiang Zuo (Nanjing University, China)
00:48 - 01:00
Building and Maintaining a Third-Party Library Supply Chain for Productive and Secure SGX Enclave Development [SEIP]
Software Engineering in Practice
Pei Wang (Baidu X-Lab), Yu Ding (Baidu X-Lab), Mingshen Sun (Baidu X-Lab), Huibo Wang (Baidu X-Lab), Tongxin Li (Baidu X-Lab), Rundong Zhou (Baidu X-Lab), Zhaofeng Chen, Yiming Jing (Baidu X-Lab)