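Beyond retraining or fine-tuning models, dataset construction is another key part of this line of work. Chen et al. [18] introduced DiverseVul, a new vulnerable source code dataset containing 18,945 vulnerable functions (spanning 150 CWEs) and 330,492 non-vulnerable functions, all written in C/C++.
They also studied 11 different deep learning architectures and concluded that, although large models have achieved some success, vulnerability detection models still suffer from high false positive rates, low F1 scores, and difficulty detecting hard CWEs. Tihanyi et al. [19] built a dataset of 112,000 C programs, each labeled with detailed vulnerability information (CWE number, location, and function name); all of the code in the dataset was generated by GPT-3.5. Gao et al. [20] proposed VulBench, a comprehensive vulnerability benchmark dataset that draws high-quality data from CTF challenges and real-world applications and annotates each vulnerable function with its vulnerability type and root cause.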
References
1. Anton Cheshkov, Pavel Zadorozhny, and Rodion Levichev. Evaluation of ChatGPT model for vulnerability detection. arXiv preprint arXiv:2304.07232, 2023.
2. Moumita Das Purba, Arpita Ghosh, Benjamin J. Radford, and Bill Chu. Software vulnerability detection using large language models. In 2023 IEEE 34th International Symposium on Software Reliability Engineering Workshops (ISSREW), pages 112–119, 2023.
3. Marwan Omar. Detecting software vulnerabilities using language models. arXiv preprint arXiv:2302.11773, 2023.
4. Rasmus Ingemann Tuffveson Jensen, Vali Tawosi, and Salwa Alamir. Software vulnerability and functionality assessment using LLMs. arXiv preprint arXiv:2403.08429, 2024.
5. Alexey Shestov, Rodion Levichev, Ravil Mussabayev, and Anton Cheshkov. Finetuning large language models for vulnerability detection. arXiv preprint arXiv:2401.17010, 2024.
6. Haonan Li, Yu Hao, Yizhuo Zhai, and Zhiyun Qian. The hitchhiker’s guide to program analysis: A journey with large language models. arXiv preprint arXiv:2308.00245, 2023.
7. Jin Wang, Zishan Huang, Hengli Liu, Nianyi Yang, and Yinhao Xiao. DefectHunter: A novel LLM-driven boosted-conformer-based code vulnerability detection mechanism. arXiv preprint arXiv:2309.15324, 2023.
8. Chenyuan Zhang, Hao Liu, Jiutian Zeng, Kejing Yang, Yuhong Li, and Hui Li. Prompt-enhanced software vulnerability detection using ChatGPT. arXiv preprint arXiv:2308.12697, 2023.
9. Atieh Bakhshandeh, Abdalsamad Keramatfar, Amir Norouzi, and Mohammad Mahdi Chekidehkhoun. Using ChatGPT as a static application security testing tool. arXiv preprint arXiv:2308.14434, 2023.
10. Noble Saji Mathews, Yelizaveta Brus, Yousra Aafer, Mei Nagappan, and Shane McIntosh. LLbezpeky: Leveraging large language models for vulnerability detection. arXiv preprint arXiv:2401.01269, 2024.
11. Sihao Hu, Tiansheng Huang, Fatih Ilhan, Selim Furkan Tekin, and Ling Liu. Large language model-powered smart contract vulnerability detection: New perspectives. arXiv preprint arXiv:2310.01152, 2023.
12. Zhihong Liu, Qing Liao, Wenchao Gu, and Cuiyun Gao. Software vulnerability detection with GPT and in-context learning. In 2023 8th International Conference on Data Science in Cyberspace (DSC), pages 229–236, 2023.
13. Yuqiang Sun, Daoyuan Wu, Yue Xue, Han Liu, Haijun Wang, Zhengzi Xu, Xiaofei Xie, and Yang Liu. GPTScan: Detecting logic vulnerabilities in smart contracts by combining GPT with program analysis. arXiv preprint arXiv:2308.03314, 2023.
14. Yuqiang Sun, Daoyuan Wu, Yue Xue, Han Liu, Wei Ma, Lyuye Zhang, Miaolei Shi, and Yang Liu. LLM4Vuln: A unified evaluation framework for decoupling and enhancing LLMs’ vulnerability reasoning. arXiv preprint arXiv:2401.16185, 2024.
15. Zhenyu Mao, Jialong Li, Munan Li, and Kenji Tei. Multi-role consensus through LLMs discussions for vulnerability detection. arXiv preprint arXiv:2403.14274, 2024.
16. Tianyu Chen, Lin Li, Liuchuan Zhu, Zongyang Li, Guangtai Liang, Ding Li, Qianxiang Wang, and Tao Xie. VulLibGen: Identifying vulnerable third-party libraries via generative pre-trained model. arXiv preprint arXiv:2308.04662, 2023.
17. Peiyu Liu, Junming Liu, Lirong Fu, Kangjie Lu, Yifan Xia, Xuhong Zhang, Wenzhi Chen, Haiqin Weng, Shouling Ji, and Wenhai Wang. How ChatGPT is solving vulnerability management problem. arXiv preprint arXiv:2311.06530, 2023.
18. Yizheng Chen, Zhoujie Ding, Lamya Alowain, Xinyun Chen, and David Wagner. DiverseVul: A new vulnerable source code dataset for deep learning based vulnerability detection. In Proceedings of the 26th International Symposium on Research in Attacks, Intrusions and Defenses, RAID ’23, pages 654–668, New York, NY, USA, 2023. Association for Computing Machinery.
19. Norbert Tihanyi, Tamas Bisztray, Ridhi Jain, Mohamed Amine Ferrag, Lucas C. Cordeiro, and Vasileios Mavroeidis. The FormAI dataset: Generative AI in software security through the lens of formal verification. In Proceedings of the 19th International Conference on Predictive Models and Data Analytics in Software Engineering, PROMISE 2023, pages 33–43, New York, NY, USA, 2023. Association for Computing Machinery.
20. Zeyu Gao, Hao Wang, Yuchen Zhou, Wenyu Zhu, and Chao Zhang. How far have we gone in vulnerability detection using large language models. arXiv preprint arXiv:2311.12420, 2023.