TURSpider: a Turkish Text-to-SQL dataset and LLM-based study

dc.authorid: 0000-0003-9031-1485
dc.authorid: 0000-0002-8649-6013
dc.contributor.author: Kanburoğlu, Ali Buğra [en_US]
dc.contributor.author: Tek, Faik Boray [en_US]
dc.date.accessioned: 2025-08-21T12:17:15Z
dc.date.available: 2025-08-21T12:17:15Z
dc.date.issued: 2024-11-25
dc.department: Işık Üniversitesi, Lisansüstü Eğitim Enstitüsü, Bilgisayar Mühendisliği Doktora Programı [en_US]
dc.department: Işık University, School of Graduate Studies, Ph.D. in Computer Engineering [en_US]
dc.description: The authors would like to sincerely thank Giray Yildirim, Aydan Günaydin, and Metin Soyalp, students at the Department of Computer Engineering, Istanbul Technical University, for their outstanding efforts in translating the original Spider dataset into Turkish. [en_US]
dc.description.abstract: This paper introduces TURSpider, a novel Turkish Text-to-SQL dataset developed through human translation of the widely used Spider dataset, aimed at addressing the current lack of complex, cross-domain SQL datasets for the Turkish language. TURSpider incorporates a wide range of query difficulties, including nested queries, to create a comprehensive benchmark for Turkish Text-to-SQL tasks. The dataset enables cross-language comparison and significantly enhances the training and evaluation of large language models (LLMs) in generating SQL queries from Turkish natural language inputs. We fine-tuned several Turkish-supported LLMs on TURSpider and evaluated their performance in comparison to state-of-the-art models like GPT-3.5 Turbo and GPT-4. Our results show that fine-tuned Turkish LLMs demonstrate competitive performance, with one model even surpassing GPT-based models on execution accuracy. We also apply the Chain-of-Feedback (CoF) methodology to further improve model performance, demonstrating its effectiveness across multiple LLMs. This work provides a valuable resource for Turkish NLP and addresses specific challenges in developing accurate Text-to-SQL models for low-resource languages. [en_US]
dc.description.sponsorship: Istanbul Teknik Üniversitesi [en_US]
dc.description.version: Publisher's Version [en_US]
dc.identifier.citation: Kanburoğlu, A. B. & Tek, F. B. (2024). TURSpider: A Turkish Text-to-SQL Dataset and LLM-Based Study. IEEE Access, 12, 169379-169387. doi:10.1109/ACCESS.2024.3498841 [en_US]
dc.identifier.doi: 10.1109/ACCESS.2024.3498841
dc.identifier.endpage: 169387
dc.identifier.issn: 2169-3536
dc.identifier.scopus: 2-s2.0-85209762610
dc.identifier.scopusquality: Q1
dc.identifier.startpage: 169379
dc.identifier.uri: https://hdl.handle.net/11729/6639
dc.identifier.uri: https://doi.org/10.1109/ACCESS.2024.3498841
dc.identifier.volume: 12
dc.identifier.wos: WOS:001362119600015
dc.identifier.wosquality: Q2
dc.indekslendigikaynak: Scopus [en_US]
dc.indekslendigikaynak: Web of Science [en_US]
dc.indekslendigikaynak: Science Citation Index Expanded (SCI-EXPANDED) [en_US]
dc.institutionauthor: Kanburoğlu, Ali Buğra [en_US]
dc.institutionauthorid: 0000-0003-9031-1485
dc.language.iso: en [en_US]
dc.peerreviewed: Yes [en_US]
dc.publicationstatus: Published [en_US]
dc.publisher: Institute of Electrical and Electronics Engineers Inc. [en_US]
dc.relation.ispartof: IEEE Access [en_US]
dc.relation.publicationcategory: Article - International Peer-Reviewed Journal - Student [en_US]
dc.rights: info:eu-repo/semantics/openAccess [en_US]
dc.subject: Dataset [en_US]
dc.subject: Large language models [en_US]
dc.subject: LLM [en_US]
dc.subject: Text-to-SQL [en_US]
dc.subject: Turkish [en_US]
dc.subject: TURSpider [en_US]
dc.subject: Query languages [en_US]
dc.subject: Language model [en_US]
dc.subject: Model-based OPC [en_US]
dc.subject: Turkish texts [en_US]
dc.subject: Structured Query Language [en_US]
dc.title: TURSpider: a Turkish Text-to-SQL dataset and LLM-based study [en_US]
dc.type: Article [en_US]
dspace.entity.type: Publication [en_US]

Files

Original bundle
Name: TURSpider_a_Turkish_Text_to_SQL_dataset_and_LLM_based_study.pdf
Size: 5.16 MB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 1.17 KB
Format: Item-specific license agreed upon to submission