A Large-Scale Empirical Study on Vulnerability Distribution within Projects and the Lessons Learned

ICSE 2020

Bingchang Liu, Guozhu Meng, Wei Zou, Qi Gong, Feng Li, Min Lin, Dandan Sun, Wei Huo, Chao Zhang

Abstract

The number of vulnerabilities has increased rapidly in recent years, owing to advances in vulnerability discovery solutions. This growth enables a thorough analysis of vulnerability distribution and supports correlation analysis and prediction of vulnerabilities. Previous research either focuses on analyzing bugs rather than vulnerabilities, or studies only the general distribution of vulnerabilities among projects rather than the distribution within each project. In this paper, we collected a large vulnerability dataset, consisting of all known vulnerabilities associated with five representative open source projects, by using automated crawlers and spending months of manual effort. We then analyzed the vulnerability distribution within each project along four dimensions: files, functions, vulnerability types, and responsible developers. Based on an analysis of the results, we presented 12 practical insights on the distribution of vulnerabilities. Finally, we applied these insights to several vulnerability discovery solutions (including static analysis and dynamic fuzzing) and helped them find 10 zero-day vulnerabilities in the target projects, showing that our insights are useful.