Automating cyber risk assessment with public LLMs: an expert-validated framework and comparative analysis

Date

2026-03-26

Publisher

Institute of Electrical and Electronics Engineers Inc.

Access Rights

info:eu-repo/semantics/openAccess

Abstract

Traditional cyber risk assessment methodologies face a critical dilemma: they are either quantitative yet static and context-agnostic (e.g., CVSS), or context-aware yet highly labor-intensive and subjective (e.g., NIST SP 800-30). Consequently, organizations struggle to scale risk assessment to match the pace of evolving threats. This paper presents an automated, context-aware risk assessment framework that leverages the reasoning capabilities of publicly available Large Language Models (LLMs) to operationalize expert knowledge. Rather than positioning the LLM as the final decision-maker, the framework decouples semantic interpretation from risk scoring authority through a transparent, deterministic Dynamic Metric Engine. Unlike complex closed-box machine learning models, our approach anchors the AI's reasoning to this expert-validated metric schema, with weights derived using the Rank Order Centroid (ROC) method from a survey of 101 cybersecurity professionals. We evaluated the framework through a comparative study involving 15 diverse real-world vulnerability scenarios (C1-C15) and three supplementary sensitivity stress tests (C16-C18). The validation scenarios were independently assessed by a cohort of ten senior human experts and two state-of-the-art LLM agents (GPT-4o and Gemini 2.0 Flash). The results show that the LLM-driven agents achieve scoring consistency closely aligned with the human median (Pearson r ranging from 0.9390 to 0.9717, Spearman ρ from 0.8472 to 0.9276) against a highly reliable expert baseline (Cronbach's α = 0.996), while reducing the assessment cycle time by more than 100× (averaging under 4 seconds per case vs. a human average of 6 minutes). Furthermore, a dedicated context sensitivity analysis (C13-C15) indicates that the framework adapts risk scores based on organizational context (e.g., SME vs. Critical Infrastructure) for identical technical vulnerabilities. Importantly, the system is designed not merely to replicate expert intuition, but to enforce bounded, policy-consistent risk evaluation under predefined governance constraints. Overall, these findings suggest that commercially available LLMs, when constrained by expert-validated metric schemas, can support reproducible, transparent, and real-time risk assessments.
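
The abstract names the Rank Order Centroid (ROC) method as the source of the metric weights. As a rough illustration only, the standard ROC formula assigns the criterion at rank i (1 = most important, out of n) the weight w_i = (1/n) * sum of 1/k for k = i..n. The sketch below computes those weights in Python. The number of criteria, the example ratings, and the `weighted_score` helper are hypothetical; this record does not describe the paper's actual metric schema or how its Dynamic Metric Engine combines weights, so the aggregation shown is an assumption and not the authors' implementation.

```python
from fractions import Fraction


def roc_weights(n):
    """Return the standard Rank Order Centroid weights for n ranked criteria.

    Index 0 holds the weight of the top-ranked criterion. Exact fractions
    are used so the weights sum to exactly 1.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return [
        sum(Fraction(1, k) for k in range(i, n + 1)) / n
        for i in range(1, n + 1)
    ]


def weighted_score(ratings_by_rank):
    """Hypothetical aggregation (an assumption, not taken from the paper):
    ROC-weighted mean of ratings listed from most to least important."""
    weights = roc_weights(len(ratings_by_rank))
    return float(sum(w * r for w, r in zip(weights, ratings_by_rank)))


if __name__ == "__main__":
    # Three hypothetical criteria, ranked from most to least important.
    print([str(w) for w in roc_weights(3)])  # ['11/18', '5/18', '1/9']
    # Hypothetical ratings on an arbitrary 0-10 scale.
    print(weighted_score([8, 5, 2]))  # 6.5
```

Because the weights depend only on the rank order, the same ranking always yields the same weights, which fits the deterministic scoring the abstract describes.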

Keywords

Automated risk scoring, Cyber risk assessment, Generative AI, Human-AI comparison, Large Language Models (LLMs), Rank Order Centroid (ROC), Artificial intelligence, Automation, Critical infrastructures, Cybersecurity, Decision making, Learning systems, Risk analysis, Risk assessment, Risk management, Semantics, Language model, Large language model, Rank order centroid, Rank ordering, Risk scoring, Risks assessments, Sensitivity analysis, Internet

Source

IEEE Access

WoS Q Value

Q2

Scopus Q Value

Q1

Volume

14

Citation

Ünal, N. M. & Çeliktaş, B. (2026). Automating cyber risk assessment with public LLMs: an expert-validated framework and comparative analysis. IEEE Access, 14, 47754-47778. https://doi.org/10.1109/ACCESS.2026.3678044