Posit: Simultaneously Tagging Natural and Programming Languages (Technical)
Software developers use a mix of source code and natural language text to communicate with each other; developer mailing lists abound with this mixed text. Tagging this mixed text is essential for making progress on two seminal software engineering problems: traceability, and reuse via precise extraction of code snippets from mixed text. In this paper, we borrow code-switching techniques from Natural Language Processing and adapt them to mixed text to solve two problems: language identification and token tagging. Our technique, Posit, simultaneously provides abstract syntax tree tags for source code tokens and part-of-speech tags for natural language words, and predicts the source language of each token in mixed text. To realize Posit, we trained a biLSTM network with a Conditional Random Field output layer using abstract syntax tree tags from the CLANG compiler and part-of-speech tags from the Standard Stanford part-of-speech tagger. Posit improves on the state of the art in language identification by 10.6% and in PoS/AST tagging by 23.7% in accuracy.
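To make the role of a Conditional Random Field output layer concrete, the following is a minimal sketch of Viterbi decoding, the standard inference step for a linear-chain CRF. It is not Posit's implementation: the per-token scores stand in for what a biLSTM would produce, and the two coarse labels (`NL`, `CODE`), the score values, and the function name `viterbi` are all assumptions made for illustration in place of the much larger PoS/AST tag set.

```python
from typing import Dict, List, Tuple


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float],
            labels: List[str]) -> List[str]:
    """Return the highest-scoring label sequence for one token sequence.

    emissions[i][l] is the score of label l at position i (hypothetically,
    the biLSTM output); transitions[(p, c)] is the score of moving from
    label p to label c. Missing entries default to 0.0.
    """
    if not emissions:
        return []
    # best[i][l]: score of the best path that ends with label l at position i
    best = [{l: emissions[0].get(l, 0.0) for l in labels}]
    back: List[Dict[str, str]] = []
    for i in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in labels:
            prev = max(labels,
                       key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0))
            scores[cur] = (best[-1][prev]
                           + transitions.get((prev, cur), 0.0)
                           + emissions[i].get(cur, 0.0))
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda l: best[-1][l])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical example: all labels and scores below are made up.
TOKENS = ["call", "strlen", "(", "s", ")"]
LABELS = ["NL", "CODE"]
EMISSIONS = [
    {"NL": 2.0, "CODE": 0.5},   # call
    {"NL": 0.2, "CODE": 2.0},   # strlen
    {"NL": 0.1, "CODE": 2.5},   # (
    {"NL": 1.0, "CODE": 0.8},   # s: ambiguous when scored in isolation
    {"NL": 0.1, "CODE": 2.5},   # )
]
# Staying within one label is rewarded, switching is penalized.
TRANSITIONS = {
    ("NL", "NL"): 0.5, ("CODE", "CODE"): 0.5,
    ("NL", "CODE"): -0.5, ("CODE", "NL"): -0.5,
}

GREEDY = [max(LABELS, key=lambda l: e[l]) for e in EMISSIONS]
DECODED = viterbi(EMISSIONS, TRANSITIONS, LABELS)

if __name__ == "__main__":
    print(list(zip(TOKENS, GREEDY)))
    print(list(zip(TOKENS, DECODED)))
```

In this toy input, choosing the best label for each token independently marks `s` as `NL`, whereas decoding over the whole sequence keeps it inside the surrounding code span. This is the kind of sequence-level consistency a CRF output layer is typically used for.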
POSIT Slides (slides_icse20_posit.pdf), 3.25 MiB
Tue 7 Jul. Times are displayed in time zone: (UTC) Coordinated Universal Time
15:00 - 16:00: A3-Code Summarization (Paper Presentations / Technical Papers / New Ideas and Emerging Results) at Silla
Chair(s): Shaohua Wang (New Jersey Institute of Technology, USA)
15:00 - 15:12 | Talk | Posit: Simultaneously Tagging Natural and Programming Languages | Technical Papers | Profir-Petru Pârțachi (University College London), Santanu Kumar Dash (University College London, UK), Christoph Treude (The University of Adelaide), Earl T. Barr (University College London, UK) | Pre-print, Media Attached, File Attached
15:12 - 15:24 | Talk | CPC: Automatically Classifying and Propagating Natural Language Comments via Program Analysis | Technical Papers | Juan Zhai (Rutgers University), Xiangzhe Xu (Nanjing University), Yu Shi (Purdue University), Guanhong Tao (Purdue University), Minxue Pan (Nanjing University), Shiqing Ma (Rutgers University), Lei Xu (National Key Laboratory for Novel Software Technology, Nanjing University), Weifeng Zhang (Nanjing University of Posts and Telecommunications), Lin Tan (Purdue University), Xiangyu Zhang (Purdue University)
15:24 - 15:36 | Talk | Suggesting Natural Method Names to Check Name Consistencies | Technical Papers | Son Nguyen (The University of Texas at Dallas), Hung Phan, Trinh Le (University of Engineering and Technology), Tien N. Nguyen (University of Texas at Dallas) | Pre-print
15:36 - 15:42 | Talk | Where should I comment my code? A dataset and model for predicting locations that need comments | NIER (New Ideas and Emerging Results) | Annie Louis (University of Edinburgh), Santanu Kumar Dash (University College London, UK), Earl T. Barr (University College London, UK), Michael D. Ernst (University of Washington, USA), Charles Sutton (Google Research)
15:42 - 15:54 | Talk | Retrieval-based Neural Source Code Summarization | Technical Papers | Jian Zhang (Beihang University), Xu Wang (Beihang University), Hongyu Zhang (University of Newcastle, Australia), Hailong Sun (Beihang University), Xudong Liu (Beihang University) | Pre-print
15:54 - 16:00 | Talk | The Dual Channel Hypothesis | NIER (New Ideas and Emerging Results) | Casey Casalnuovo (University of California at Davis, USA), Earl T. Barr (University College London, UK), Santanu Kumar Dash (University College London, UK), Prem Devanbu (University of California), Emily Morgan (University of California, Davis)