Işık University Institutional Academic Repository

TURSpider: a Turkish Text-to-SQL dataset and LLM-based study
(Institute of Electrical and Electronics Engineers Inc., 2024-11-25) Kanburoğlu, Ali Buğra; Tek, Faik Boray
This paper introduces TURSpider, a novel Turkish Text-to-SQL dataset developed through human translation of the widely used Spider dataset, aimed at addressing the current lack of complex, cross-domain SQL datasets for the Turkish language. TURSpider incorporates a wide range of query difficulties, including nested queries, to create a comprehensive benchmark for Turkish Text-to-SQL tasks. The dataset enables cross-language comparison and significantly enhances the training and evaluation of large language models (LLMs) in generating SQL queries from Turkish natural language inputs. We fine-tuned several Turkish-supported LLMs on TURSpider and evaluated their performance in comparison to state-of-the-art models like GPT-3.5 Turbo and GPT-4. Our results show that fine-tuned Turkish LLMs demonstrate competitive performance, with one model even surpassing GPT-based models on execution accuracy. We also apply the Chain-of-Feedback (CoF) methodology to further improve model performance, demonstrating its effectiveness across multiple LLMs. This work provides a valuable resource for Turkish NLP and addresses specific challenges in developing accurate Text-to-SQL models for low-resource languages.
Large language model based automated translation of natural language to SQL
(Işık University, School of Graduate Studies, 2025-01-22) Kanburoğlu, Ali Buğra; Tek, Faik Boray; Işık University, School of Graduate Studies, Ph.D. in Computer Engineering
The field of Text-to-SQL, which involves converting natural language into SQL queries, has seen significant advancements, but challenges remain, particularly for low-resource languages like Turkish. This thesis introduces three key contributions to address these challenges. Our first contribution is the development and open-access release of TUR2SQL, the first cross-domain Turkish Text-to-SQL dataset, which consists of 10,809 natural language sentences paired with their corresponding SQL queries. We evaluate the performance of SQLNet, a deep learning model specifically designed for this task, and one of the most successful Large Language Models (LLMs), ChatGPT, on this dataset. The results demonstrate the superior performance of ChatGPT. The second major contribution is the construction and public release of TURSpider, the most extensive Turkish Text-to-SQL dataset. TURSpider is built by translating the widely used cross-domain Spider dataset from English to Turkish. This dataset includes complex queries with varying difficulty levels, facilitating the training and comparison of large language models for Turkish Text-to-SQL tasks. Our comparative analysis shows that fine-tuned Turkish LLMs achieve competitive performance, with some models surpassing OpenAI models in query accuracy. To further enhance performance, we apply the Chain-of-Feedback (CoF) methodology, demonstrating its effectiveness across multiple models. Finally, we explore the Mixture-of-Agents (MoA) framework, which combines outputs from multiple models to improve the performance of open-source LLMs for Text-to-SQL tasks. By integrating MoA with the CoF technique, we propose MoAF-SQL, an approach that significantly improves performance, particularly on complex queries. Our experiments show that MoAF-SQL achieves competitive results, highlighting its potential to enhance the Text-to-SQL capabilities of open-source LLMs.
A Mixture-of-Agents-based Text-to-SQL study on the TURSpider dataset (original title: TURSpider veri kümesinde Temsilcilerin Karışımı Tabanlı Text-to-SQL çalışması)
(IEEE, 2025) Kanburoğlu, Ali Buğra; Tek, Faik Boray
This study reports experiments on TURSpider, a dataset developed for Turkish Text-to-SQL. TURSpider is a comprehensive Turkish dataset containing SQL queries of varying difficulty levels, and it is an important resource for research in this area. The study examines the performance of the feedback-driven Mixture-of-Agents (MoAF) approach. In the MoAF structure, multiple large language models (LLMs) work together with the aim of improving SQL generation performance. Within this structure, agent collaboration lets the models learn from one another and lets errors be corrected through feedback mechanisms. According to the experimental results, the MoAF approach reached 60.63% execution accuracy, the best result obtained on the TURSpider dataset.
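The abstracts above report results as execution accuracy. As a rough illustration of that metric only, the following sketch runs each predicted query and its reference query against the same database and counts a hit when the returned rows match. It is a simplified assumption about how such a score can be computed, not the evaluation code used in these works. The `singer` table, the sample queries, and the helper names `run_query` and `execution_accuracy` are hypothetical.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Return the result rows as a multiset, or None if the query fails."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    return Counter(rows)


def execution_accuracy(conn, pairs):
    """Fraction of (predicted, gold) pairs whose result multisets match."""
    if not pairs:
        return 0.0
    hits = 0
    for predicted, gold in pairs:
        gold_rows = run_query(conn, gold)
        pred_rows = run_query(conn, predicted)
        # A failing predicted query (None) never equals a valid gold result.
        if gold_rows is not None and pred_rows == gold_rows:
            hits += 1
    return hits / len(pairs)


# Hypothetical toy database, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
INSERT INTO singer VALUES (1, 'Ayla', 31), (2, 'Emre', 45), (3, 'Deniz', 27);
""")

# (predicted SQL, gold SQL) pairs; the texts differ but results may agree.
pairs = [
    ("SELECT name FROM singer WHERE age > 30",
     "SELECT name FROM singer WHERE age >= 31"),
    ("SELECT count(*) FROM singer",
     "SELECT count(id) FROM singer"),
    ("SELECT nme FROM singer",            # invalid column, counts as a miss
     "SELECT name FROM singer"),
    ("SELECT name FROM singer WHERE age < 30",
     "SELECT name FROM singer WHERE age > 40"),
]

score = execution_accuracy(conn, pairs)
print(f"execution accuracy: {score:.2%}")
```

Comparing rows as a multiset ignores row order, which is a deliberate simplification here; a stricter evaluation would need to respect order for queries that contain `ORDER BY`.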
