TURSpider: a Turkish Text-to-SQL dataset and LLM-based study

Date

2024-11-25

Publisher

Institute of Electrical and Electronics Engineers Inc.

Access Rights

info:eu-repo/semantics/openAccess

Abstract

This paper introduces TURSpider, a novel Turkish Text-to-SQL dataset developed through human translation of the widely used Spider dataset, aimed at addressing the current lack of complex, cross-domain SQL datasets for the Turkish language. TURSpider incorporates a wide range of query difficulties, including nested queries, to create a comprehensive benchmark for Turkish Text-to-SQL tasks. The dataset enables cross-language comparison and significantly enhances the training and evaluation of large language models (LLMs) in generating SQL queries from Turkish natural language inputs. We fine-tuned several Turkish-supported LLMs on TURSpider and evaluated their performance in comparison to state-of-the-art models like GPT-3.5 Turbo and GPT-4. Our results show that fine-tuned Turkish LLMs demonstrate competitive performance, with one model even surpassing GPT-based models on execution accuracy. We also apply the Chain-of-Feedback (CoF) methodology to further improve model performance, demonstrating its effectiveness across multiple LLMs. This work provides a valuable resource for Turkish NLP and addresses specific challenges in developing accurate Text-to-SQL models for low-resource languages.
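The abstract reports results in terms of execution accuracy. As a rough illustration of what that metric measures, the following is a minimal sketch, not the paper's evaluation script: it runs a predicted query and a gold query against the same SQLite database and counts them as a match when they return the same rows. The `singer` table, the query pairs, and the function names `execution_match` and `execution_accuracy` are hypothetical and chosen only for this example; the official Spider evaluation handles more cases (for example, row ordering under ORDER BY).

```python
import sqlite3
from collections import Counter


def execution_match(conn, predicted_sql, gold_sql):
    """Return True if both queries yield the same multiset of rows."""
    try:
        pred_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # A predicted query that fails to execute counts as incorrect.
        return False
    gold_rows = conn.execute(gold_sql).fetchall()
    # Compare as multisets so that row order does not matter.
    return Counter(pred_rows) == Counter(gold_rows)


def execution_accuracy(conn, pairs):
    """Fraction of (predicted, gold) pairs whose results match."""
    if not pairs:
        return 0.0
    hits = sum(execution_match(conn, pred, gold) for pred, gold in pairs)
    return hits / len(pairs)


# Hypothetical toy database, assumed only for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    INSERT INTO singer VALUES (1, 'Ayla', 34), (2, 'Deniz', 27), (3, 'Mert', 41);
    """
)

# Hypothetical (predicted, gold) query pairs.
pairs = [
    # Different SQL text, same result: counts as correct.
    ("SELECT COUNT(*) FROM singer", "SELECT count(id) FROM singer"),
    # Different predicate, but the same rows on this data: counts as correct.
    ("SELECT name FROM singer WHERE age > 30",
     "SELECT name FROM singer WHERE age >= 30"),
    # Wrong comparison direction: different rows, counts as incorrect.
    ("SELECT name FROM singer WHERE age < 30",
     "SELECT name FROM singer WHERE age > 30"),
    # Misspelled column: execution error, counts as incorrect.
    ("SELECT nam FROM singer", "SELECT name FROM singer"),
]

accuracy = execution_accuracy(conn, pairs)
print(f"execution accuracy: {accuracy:.2f}")
```

The second pair shows a known limitation of comparing results on a single database instance: two queries that are not equivalent in general can still agree on the data at hand.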

Description

The authors would like to sincerely thank Giray Yildirim, Aydan Günaydin, and Metin Soyalp, students at the Department of Computer Engineering, Istanbul Technical University, for their outstanding efforts in translating the original Spider dataset into Turkish.

Keywords

Dataset, Large language models, LLM, Text-to-SQL, Turkish, TURSpider, Query languages, Language model, Model-based OPC, Turkish texts, Structured Query Language

Source

IEEE Access

WoS Q Value

Q2

Scopus Q Value

Q1

Volume

12

Citation

Kanburoğlu, A. B. & Tek, F. B. (2024). TURSpider: A Turkish Text-to-SQL Dataset and LLM-Based Study. IEEE Access, 12, 169379-169387. doi:10.1109/ACCESS.2024.3498841