Search Results

Showing 1 - 4 of 4
  • Publication
    TUR2SQL: A cross-domain Turkish dataset for Text-to-SQL
    (IEEE, 2023-09-15) Kanburoğlu, Ali Buğra; Tek, Faik Boray
    The field of converting natural language into corresponding SQL queries using deep learning techniques has attracted significant attention in recent years. While existing Text-to-SQL datasets primarily focus on English and other languages such as Chinese, there is a lack of resources for the Turkish language. In this study, we introduce the first publicly available cross-domain Turkish Text-to-SQL dataset, named TUR2SQL. This dataset consists of 10,809 pairs of natural language statements and their corresponding SQL queries. We conducted experiments using SQLNet and ChatGPT on the TUR2SQL dataset. The experimental results show that SQLNet has limited performance and ChatGPT has superior performance on the dataset. We believe that TUR2SQL provides a foundation for further exploration and advancements in Turkish language-based Text-to-SQL research.
  • Publication
    TURSpider: a Turkish Text-to-SQL dataset and LLM-based study
    (Institute of Electrical and Electronics Engineers Inc., 2024-11-25) Kanburoğlu, Ali Buğra; Tek, Faik Boray
    This paper introduces TURSpider, a novel Turkish Text-to-SQL dataset developed through human translation of the widely used Spider dataset, aimed at addressing the current lack of complex, cross-domain SQL datasets for the Turkish language. TURSpider incorporates a wide range of query difficulties, including nested queries, to create a comprehensive benchmark for Turkish Text-to-SQL tasks. The dataset enables cross-language comparison and significantly enhances the training and evaluation of large language models (LLMs) in generating SQL queries from Turkish natural language inputs. We fine-tuned several Turkish-supported LLMs on TURSpider and evaluated their performance in comparison to state-of-the-art models like GPT-3.5 Turbo and GPT-4. Our results show that fine-tuned Turkish LLMs demonstrate competitive performance, with one model even surpassing GPT-based models on execution accuracy. We also apply the Chain-of-Feedback (CoF) methodology to further improve model performance, demonstrating its effectiveness across multiple LLMs. This work provides a valuable resource for Turkish NLP and addresses specific challenges in developing accurate Text-to-SQL models for low-resource languages.
  • Publication
    Text-to-SQL: a methodical review of challenges and models
    (TÜBİTAK, 2024-05-20) Kanburoğlu, Ali Buğra; Tek, Faik Boray
    This survey focuses on Text-to-SQL, automated translation of natural language queries into SQL queries. Initially, we describe the problem and its main challenges. Then, by following the PRISMA systematic review methodology, we survey the existing Text-to-SQL review papers in the literature. We apply the same method to extract proposed Text-to-SQL models and classify them with respect to used evaluation metrics and benchmarks. We highlight the accuracies achieved by various models on Text-to-SQL datasets and discuss execution-guided evaluation strategies. We present insights into model training times and implementations of different models. We also explore the availability of Text-to-SQL datasets in non-English languages. Additionally, we focus on large language model (LLM) based approaches for the Text-to-SQL task, where we examine LLM-based studies in the literature and subsequently evaluate the LLMs on the cross-domain Spider dataset. Finally, we conclude with a discussion of future directions for Text-to-SQL research, identifying potential areas of improvement and advancements in this field.
  • Publication
    TURSpider veri kümesinde Temsilcilerin Karışımı Tabanlı Text-to-SQL çalışması
    (IEEE, 2025) Kanburoğlu, Ali Buğra; Tek, Faik Boray
    This study examines experiments on TURSpider, a dataset developed for Turkish Text-to-SQL. TURSpider is a comprehensive Turkish dataset containing SQL queries of varying difficulty levels, and it is an important resource for research in this field. The study investigates the performance of the feedback-driven Mixture-of-Agents (MoAF) approach. In the MoAF architecture, multiple large language models (LLMs) work collaboratively with the aim of improving SQL generation performance. In this architecture, agent collaboration enables the models to learn from one another and to correct errors through feedback mechanisms. According to the experimental results, the MoAF approach reached 60.63% execution accuracy, the best result obtained on the TURSpider dataset.
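
Several of the abstracts above report execution accuracy. As a rough illustration of what that metric measures, the sketch below runs a predicted query and a reference query against the same database and counts the pair as correct when both return the same rows. It is a simplified sketch under stated assumptions, not the evaluation code used in any of the listed papers: the `singer` table, the example queries, and the helper names `run_query` and `execution_accuracy` are all hypothetical, and benchmark evaluation scripts may apply different matching rules (for example, respecting row order for `ORDER BY` queries).

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Return the result rows as a multiset, or None if the query fails."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    return Counter(rows)


def execution_accuracy(conn, pairs):
    """Fraction of (predicted, gold) SQL pairs whose results match, ignoring row order."""
    if not pairs:
        return 0.0
    correct = 0
    for predicted_sql, gold_sql in pairs:
        gold = run_query(conn, gold_sql)
        predicted = run_query(conn, predicted_sql)
        # A pair counts only if the gold query runs and both result multisets agree.
        if gold is not None and predicted == gold:
            correct += 1
    return correct / len(pairs)


# Hypothetical toy database, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    INSERT INTO singer VALUES (1, 'Ayşe', 31), (2, 'Mehmet', 45), (3, 'Zeynep', 27);
    """
)

pairs = [
    # Different SQL text, same rows: counted as correct.
    ("SELECT name FROM singer WHERE age > 30",
     "SELECT name FROM singer WHERE age >= 31"),
    # Both run, but the counts differ: counted as wrong.
    ("SELECT count(*) FROM singer",
     "SELECT count(*) FROM singer WHERE age > 30"),
    # Predicted query references a missing column and fails: counted as wrong.
    ("SELECT nme FROM singer",
     "SELECT name FROM singer"),
]

accuracy = execution_accuracy(conn, pairs)
print(f"Execution accuracy: {accuracy:.2%}")
```

Comparing results rather than query text is what lets the first pair count as correct even though the two SQL strings differ; the trade-off is that two queries can agree by coincidence on a small database.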