4 results
Search Results
Listing 1 - 4 of 4
Publication: TURSpider: a Turkish Text-to-SQL dataset and LLM-based study (Institute of Electrical and Electronics Engineers Inc., 2024-11-25)
Kanburoğlu, Ali Buğra; Tek, Faik Boray
This paper introduces TURSpider, a novel Turkish Text-to-SQL dataset developed through human translation of the widely used Spider dataset, aimed at addressing the current lack of complex, cross-domain SQL datasets for the Turkish language. TURSpider incorporates a wide range of query difficulties, including nested queries, to create a comprehensive benchmark for Turkish Text-to-SQL tasks. The dataset enables cross-language comparison and significantly enhances the training and evaluation of large language models (LLMs) in generating SQL queries from Turkish natural-language inputs. We fine-tuned several Turkish-supported LLMs on TURSpider and evaluated their performance against state-of-the-art models such as GPT-3.5 Turbo and GPT-4. Our results show that the fine-tuned Turkish LLMs are competitive, with one model even surpassing the GPT-based models on execution accuracy. We also apply the Chain-of-Feedback (CoF) methodology to further improve model performance, demonstrating its effectiveness across multiple LLMs. This work provides a valuable resource for Turkish NLP and addresses specific challenges in developing accurate Text-to-SQL models for low-resource languages.

Publication: Text-to-SQL: a methodical review of challenges and models (TÜBİTAK, 2024-05-20)
Kanburoğlu, Ali Buğra; Tek, Faik Boray
This survey focuses on Text-to-SQL, the automated translation of natural-language queries into SQL queries. We first describe the problem and its main challenges. Then, following the PRISMA systematic review methodology, we survey the existing Text-to-SQL review papers in the literature. We apply the same method to extract proposed Text-to-SQL models and classify them with respect to the evaluation metrics and benchmarks they use.
We highlight the accuracies achieved by various models on Text-to-SQL datasets and discuss execution-guided evaluation strategies. We present insights into model training times and the implementations of different models. We also explore the availability of Text-to-SQL datasets in non-English languages. Additionally, we focus on large language model (LLM) based approaches to the Text-to-SQL task: we examine LLM-based studies in the literature and then evaluate the LLMs on the cross-domain Spider dataset. Finally, we conclude with a discussion of future directions for Text-to-SQL research, identifying potential areas of improvement and advancement in this field.

Publication: Automating cyber risk assessment with public LLMs: an expert-validated framework and comparative analysis (Institute of Electrical and Electronics Engineers Inc., 2026-03-26)
Ünal, Nezih Mahmut; Çeliktaş, Barış
Traditional cyber risk assessment methodologies face a critical dilemma: they are either quantitative yet static and context-agnostic (e.g., CVSS), or context-aware yet highly labor-intensive and subjective (e.g., NIST SP 800-30). Consequently, organizations struggle to scale risk assessment to match the pace of evolving threats. This paper presents an automated, context-aware risk assessment framework that leverages the reasoning capabilities of publicly available Large Language Models (LLMs) to operationalize expert knowledge. Rather than positioning the LLM as the final decision-maker, the framework decouples semantic interpretation from risk-scoring authority through a transparent, deterministic Dynamic Metric Engine. Unlike complex closed-box machine learning models, our approach anchors the AI's reasoning to this expert-validated metric schema, with weights derived using the Rank Order Centroid (ROC) method from a survey of 101 cybersecurity professionals.
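The Rank Order Centroid weighting mentioned in the entry above is a standard formula: for n criteria ranked from most to least important, the i-th weight is w_i = (1/n) Σ_{k=i}^{n} 1/k. A minimal sketch of that calculation (how the paper applies the resulting weights is not detailed here):

```python
def roc_weights(n):
    """Rank Order Centroid weights for n criteria ranked 1 (most
    important) through n (least important); the weights sum to 1."""
    return [sum(1.0 / k for k in range(i, n + 1)) / n for i in range(1, n + 1)]

# Example: three ranked criteria yield weights 11/18, 5/18, 2/18.
weights = roc_weights(3)
```

ROC is often used exactly as in the abstract: survey respondents only rank criteria, and the centroid formula converts those ordinal rankings into numeric weights without asking anyone to estimate magnitudes.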
We evaluated the framework through a comparative study involving 15 diverse real-world vulnerability scenarios (C1-C15) and three supplementary sensitivity stress tests (C16-C18). The validation scenarios were independently assessed by a cohort of ten senior human experts and two state-of-the-art LLM agents (GPT-4o and Gemini 2.0 Flash). The results show that the LLM-driven agents achieve scoring consistency closely aligned with the human median (Pearson r ranging from 0.9390 to 0.9717, Spearman ρ from 0.8472 to 0.9276) against a highly reliable expert baseline (Cronbach's α = 0.996), while shortening the assessment cycle by a factor of more than 100 (averaging under 4 seconds per case vs. a human average of 6 minutes). Furthermore, a dedicated context-sensitivity analysis (C13-C15) indicates that the framework adapts risk scores to organizational context (e.g., SME vs. critical infrastructure) for identical technical vulnerabilities. Importantly, the system is designed not merely to replicate expert intuition, but to enforce bounded, policy-consistent risk evaluation under predefined governance constraints. Overall, these findings suggest that commercially available LLMs, when constrained by expert-validated metric schemas, can support reproducible, transparent, and real-time risk assessments.

Publication: Adaptive incident escalation in SOCs via AI-driven skill-aware assignment and tier optimization (Institute of Electrical and Electronics Engineers Inc., 2026-04-15)
Abuaziz, Ahmed; Çeliktaş, Barış
Modern Security Operations Centers (SOCs) face significant operational bottlenecks driven by escalating alert volumes, increasingly sophisticated cyberattack vectors, and chronic imbalances in analyst workloads. Conventional rule-based escalation models frequently fail to account for the multi-dimensional nature of incident characteristics, the nuances of analyst expertise, and fluctuating operational demands.
This study proposes a comprehensive AI-driven framework for intelligent incident assignment and workload optimization. The framework makes five primary contributions: 1) a multi-factor scoring model that integrates severity and complexity metrics with dynamic workload balancing; 2) two novel optimization algorithms, Quantile-Targeted Normality-Regularized Optimization (QT-NRO) and Joint Optimization of Weights and Thresholds (JOWT), which calibrate scoring coefficients against target analyst utilization; 3) a Large Language Model (LLM) engine leveraging Retrieval-Augmented Generation (RAG) for semantic alignment between incident requirements and analyst expertise; 4) an Adaptive Capacity Zoning mechanism for dynamic workload management; and 5) a novel RAG Relevance Score metric, a pre-resolution semantic-alignment indicator that quantifies analyst-incident assignment quality independently of resolution time, addressing a fundamental limitation of traditional temporal metrics such as Mean Time to Resolution (MTTR) and providing a reusable benchmark applicable to any skill-aware assignment system. In addition, the framework incorporates a feedback-based continuous-learning mechanism that uses historical resolution data to inform future assignments. An experimental evaluation on 10,021 real-world incidents from Microsoft Defender demonstrates that the JOWT algorithm achieves tier-distribution alignment within 0.8% of targets. LLM-enhanced semantic matching yields improvements of 26.7% to 126.8% in skill alignment across both normal-load and high-load evaluations, while simulations indicate a 31.8% reduction in MTTR. These results substantiate the efficacy of AI-driven methodologies in enhancing SOC operational efficiency and response precision.
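The multi-factor, workload-aware assignment idea in the SOC entry above can be illustrated with a toy sketch. Everything concrete here is hypothetical: the weight values, field names, linear scoring form, and load penalty are illustrative assumptions, not the paper's QT-NRO/JOWT formulation:

```python
# Toy sketch of skill-aware incident assignment (illustrative only):
# score each analyst as a weighted skill-match term minus a workload
# penalty, then route the incident to the highest-scoring analyst.

def assignment_score(incident, analyst, w_skill=0.7, w_load=0.3):
    """Hypothetical linear score; weights are made-up defaults."""
    needed = set(incident["skills"])
    overlap = len(needed & set(analyst["skills"])) / max(len(needed), 1)
    load_penalty = analyst["open_incidents"] / analyst["capacity"]
    return w_skill * overlap - w_load * load_penalty

def assign(incident, analysts):
    """Pick the analyst with the best score for this incident."""
    return max(analysts, key=lambda a: assignment_score(incident, a))

analysts = [
    {"name": "A", "skills": {"malware", "forensics"}, "open_incidents": 4, "capacity": 5},
    {"name": "B", "skills": {"malware", "phishing"}, "open_incidents": 1, "capacity": 5},
]
incident = {"skills": ["malware", "phishing"]}
best = assign(incident, analysts)  # "B": full skill match and low load
```

In the paper's framework the skill-match term is produced by an LLM with RAG rather than set overlap, and the weights are calibrated by the optimization algorithms rather than fixed by hand; this sketch only shows the scoring-and-argmax shape of the problem.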

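The agreement analysis in the cyber risk assessment entry above reports Pearson r and Spearman ρ between LLM agent scores and the human expert median. Both follow from standard definitions; a self-contained sketch (the score values below are made up for illustration, not the paper's data):

```python
import statistics

def pearson(x, y):
    """Pearson correlation: covariance normalized by both std devs."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson applied to ranks (no-tie case)."""
    def ranks(v):
        order = sorted(range(len(v)), key=v.__getitem__)
        r = [0] * len(v)
        for rank, idx in enumerate(order):
            r[idx] = rank
        return r
    return pearson(ranks(x), ranks(y))

# Illustrative scores: one LLM agent vs. the human median on five cases.
llm = [7.2, 5.1, 8.8, 3.0, 6.4]
human = [7.0, 5.5, 9.1, 2.8, 6.0]
```

Reporting both metrics, as the abstract does, is informative because Pearson captures linear agreement in the raw scores while Spearman captures agreement in the rank ordering of cases.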