Mounir -

The rising rate of cyberattacks and the shortage of skilled cybersecurity professionals call for innovative solutions beyond traditional security measures. Large Language Models (LLMs), known for their natural language processing abilities, offer a promising way to strengthen cybersecurity defenses. However, although LLMs have been explored in various security applications, systematic research that maps their contributions to established cybersecurity frameworks is lacking. To bridge this gap, this paper presents a systematic literature review aligned with the NIST Cybersecurity Framework (CSF 2.0) to clarify LLMs’ multifaceted contributions and uncover their overlooked potential, particularly in areas such as Awareness and Training within the NIST CSF 2.0 Protect function. Building on this review, and given the accessibility and privacy benefits of open-source LLMs (OSLLMs), we developed a benchmarking methodology based on Multiple-Choice Question Answering (MCQA) and used it to evaluate 21 state-of-the-art OSLLMs on two publicly available datasets. Our findings show that while larger models generally outperformed smaller ones, medium-sized models achieved competitive results in specific cybersecurity areas, challenging the assumption that model size alone determines efficacy. These insights motivated a further investigation of optimized prompt engineering workflows built with the DSPy framework. The results show that advanced prompting techniques significantly boost the performance of smaller models, narrowing the gap with larger ones and broadening deployment possibilities. Furthermore, we introduced structural modifications and novel exit options to address position and forced-choice biases in standard MCQA datasets, further improving the reliability of OSLLM evaluations.
Overall, this study deepens our understanding of OSLLMs’ aptitude in cybersecurity and highlights the role of advanced prompt engineering and unbiased datasets in enhancing their capabilities.
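
As an illustration of what mitigating position and forced-choice biases in an MCQA item can look like, the following is a minimal Python sketch, not the procedure used in this study. It assumes that position bias is countered by shuffling the answer options and that forced-choice bias is countered by appending an exit option. The names `MCQAItem`, `debias_item`, `format_prompt`, and the exit-option wording are all hypothetical.

```python
import random
from dataclasses import dataclass

# Hypothetical wording; the exit options used in the study are not given here.
EXIT_OPTION = "None of the above"


@dataclass
class MCQAItem:
    question: str
    options: list[str]
    answer_index: int  # index of the correct option in `options`


def debias_item(item: MCQAItem, rng: random.Random,
                exit_option: str = EXIT_OPTION) -> MCQAItem:
    """Shuffle the options and append an exit option (illustrative only)."""
    # Shuffle a list of indices so the correct answer can be tracked.
    order = list(range(len(item.options)))
    rng.shuffle(order)
    shuffled = [item.options[i] for i in order]
    new_answer = order.index(item.answer_index)
    # The exit option always goes last and is never the gold answer here.
    shuffled.append(exit_option)
    return MCQAItem(item.question, shuffled, new_answer)


def format_prompt(item: MCQAItem) -> str:
    """Render the item as a lettered multiple-choice prompt."""
    letters = "ABCDEFGHIJ"
    lines = [item.question]
    lines += [f"{letters[i]}. {opt}" for i, opt in enumerate(item.options)]
    lines.append("Answer:")
    return "\n".join(lines)


if __name__ == "__main__":
    item = MCQAItem(
        question="Which port does HTTPS use by default?",
        options=["21", "80", "443", "25"],
        answer_index=2,
    )
    print(format_prompt(debias_item(item, random.Random(0))))
```

Using a seeded `random.Random` keeps the shuffled order reproducible across evaluation runs, which matters when comparing models on the same modified dataset.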