TURSpider: a Turkish Text-to-SQL dataset and LLM-based study

Kanburoğlu, Ali Buğra; Tek, Faik Boray

Files

TURSpider_a_Turkish_Text_to_SQL_dataset_and_LLM_based_study.pdf (5.16 MB)

Date

2024-11-25

Authors

Kanburoğlu, Ali Buğra

Tek, Faik Boray

Publisher

Institute of Electrical and Electronics Engineers Inc.

Access Rights

info:eu-repo/semantics/openAccess

Abstract

This paper introduces TURSpider, a novel Turkish Text-to-SQL dataset developed through human translation of the widely used Spider dataset, aimed at addressing the current lack of complex, cross-domain SQL datasets for the Turkish language. TURSpider incorporates a wide range of query difficulties, including nested queries, to create a comprehensive benchmark for Turkish Text-to-SQL tasks. The dataset enables cross-language comparison and significantly enhances the training and evaluation of large language models (LLMs) in generating SQL queries from Turkish natural language inputs. We fine-tuned several Turkish-supported LLMs on TURSpider and evaluated their performance in comparison to state-of-the-art models like GPT-3.5 Turbo and GPT-4. Our results show that fine-tuned Turkish LLMs demonstrate competitive performance, with one model even surpassing GPT-based models on execution accuracy. We also apply the Chain-of-Feedback (CoF) methodology to further improve model performance, demonstrating its effectiveness across multiple LLMs. This work provides a valuable resource for Turkish NLP and addresses specific challenges in developing accurate Text-to-SQL models for low-resource languages.
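The abstract compares models on execution accuracy. As a rough illustration of that metric, the sketch below runs a predicted query and a reference query against the same SQLite database and counts a match when both return the same rows. It is a simplified stand-in, not the authors' evaluation code or the official Spider evaluation script. The `singer` table, its rows, and all function names are hypothetical.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Return the result rows as a multiset, or None if the query fails."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_match(conn, predicted_sql, gold_sql):
    """True when the predicted query returns the same rows as the gold query.

    Assumption: row order is ignored, which is a simplification.
    """
    gold = run_query(conn, gold_sql)
    pred = run_query(conn, predicted_sql)
    return gold is not None and pred == gold


def execution_accuracy(conn, pairs):
    """Fraction of (predicted_sql, gold_sql) pairs whose results match."""
    if not pairs:
        return 0.0
    hits = sum(execution_match(conn, pred, gold) for pred, gold in pairs)
    return hits / len(pairs)


def build_demo_db():
    """Create a tiny in-memory database (hypothetical schema and rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
    conn.executemany(
        "INSERT INTO singer VALUES (?, ?)",
        [("Ayşe", 31), ("Mehmet", 45), ("Zeynep", 27)],
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    # Hypothetical question: "30 yaşından büyük şarkıcıların adları nelerdir?"
    gold = "SELECT name FROM singer WHERE age > 30"
    pairs = [
        ("SELECT name FROM singer WHERE age >= 31", gold),  # same rows
        ("SELECT name FROM singer", gold),                  # extra row
    ]
    print(execution_accuracy(conn, pairs))
```

Comparing results as multisets means two differently written queries count as equal when they return the same rows. A query whose correctness depends on `ORDER BY` would need an ordered comparison instead.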

Description

The authors would like to sincerely thank Giray Yildirim, Aydan Günaydin, and Metin Soyalp, students at the Department of Computer Engineering, Istanbul Technical University, for their outstanding efforts in translating the original Spider dataset into Turkish.

Keywords

Dataset, Large language models, LLM, Text-to-SQL, Turkish, TURSpider, Query languages, Language model, Model-based OPC, Turkish texts, Structured Query Language

Source

IEEE Access

WoS Quartile

Q2

Scopus Quartile

Q1

Volume

12

Citation

Kanburoğlu, A. B. & Tek, F. B. (2024). TURSpider: A Turkish Text-to-SQL Dataset and LLM-Based Study. IEEE Access, 12, 169379-169387. doi:10.1109/ACCESS.2024.3498841

Link

https://hdl.handle.net/11729/6639
https://doi.org/10.1109/ACCESS.2024.3498841

Collection

Student Publications Article Collection
Graduate Education Institute Other Publications Collection
Scopus-Indexed Publications Collection
WoS-Indexed Publications Collection
