TURSpider: a Turkish Text-to-SQL dataset and LLM-based study
dc.authorid | 0000-0003-9031-1485 | |
dc.authorid | 0000-0002-8649-6013 | |
dc.contributor.author | Kanburoğlu, Ali Buğra | en_US |
dc.contributor.author | Tek, Faik Boray | en_US |
dc.date.accessioned | 2025-08-21T12:17:15Z | |
dc.date.available | 2025-08-21T12:17:15Z | |
dc.date.issued | 2024-11-25 | |
dc.department | Işık Üniversitesi, Lisansüstü Eğitim Enstitüsü, Bilgisayar Mühendisliği Doktora Programı | en_US |
dc.department | Işık University, School of Graduate Studies, Ph.D. in Computer Engineering | en_US |
dc.description | The authors would like to sincerely thank Giray Yildirim, Aydan Günaydin, and Metin Soyalp, students at the Department of Computer Engineering, Istanbul Technical University, for their outstanding efforts in translating the original Spider dataset into Turkish. | en_US |
dc.description.abstract | This paper introduces TURSpider, a novel Turkish Text-to-SQL dataset developed through human translation of the widely used Spider dataset, aimed at addressing the current lack of complex, cross-domain SQL datasets for the Turkish language. TURSpider incorporates a wide range of query difficulties, including nested queries, to create a comprehensive benchmark for Turkish Text-to-SQL tasks. The dataset enables cross-language comparison and significantly enhances the training and evaluation of large language models (LLMs) in generating SQL queries from Turkish natural language inputs. We fine-tuned several Turkish-supported LLMs on TURSpider and evaluated their performance in comparison to state-of-the-art models like GPT-3.5 Turbo and GPT-4. Our results show that fine-tuned Turkish LLMs demonstrate competitive performance, with one model even surpassing GPT-based models on execution accuracy. We also apply the Chain-of-Feedback (CoF) methodology to further improve model performance, demonstrating its effectiveness across multiple LLMs. This work provides a valuable resource for Turkish NLP and addresses specific challenges in developing accurate Text-to-SQL models for low-resource languages. | en_US |
dc.description.sponsorship | Istanbul Teknik Üniversitesi | en_US |
dc.description.version | Publisher's Version | en_US |
dc.identifier.citation | Kanburoğlu, A. B. & Tek, F. B. (2024). TURSpider: A Turkish Text-to-SQL Dataset and LLM-Based Study. IEEE Access, 12, 169379-169387. doi:10.1109/ACCESS.2024.3498841 | en_US |
dc.identifier.doi | 10.1109/ACCESS.2024.3498841 | |
dc.identifier.endpage | 169387 | |
dc.identifier.issn | 2169-3536 | |
dc.identifier.scopus | 2-s2.0-85209762610 | |
dc.identifier.scopusquality | Q1 | |
dc.identifier.startpage | 169379 | |
dc.identifier.uri | https://hdl.handle.net/11729/6639 | |
dc.identifier.uri | https://doi.org/10.1109/ACCESS.2024.3498841 | |
dc.identifier.volume | 12 | |
dc.identifier.wos | WOS:001362119600015 | |
dc.identifier.wosquality | Q2 | |
dc.indekslendigikaynak | Scopus | en_US |
dc.indekslendigikaynak | Web of Science | en_US |
dc.indekslendigikaynak | Science Citation Index Expanded (SCI-EXPANDED) | en_US |
dc.institutionauthor | Kanburoğlu, Ali Buğra | en_US |
dc.institutionauthorid | 0000-0003-9031-1485 | |
dc.language.iso | en | en_US |
dc.peerreviewed | Yes | en_US |
dc.publicationstatus | Published | en_US |
dc.publisher | Institute of Electrical and Electronics Engineers Inc. | en_US |
dc.relation.ispartof | IEEE Access | en_US |
dc.relation.publicationcategory | Article - International Refereed Journal - Student | en_US |
dc.rights | info:eu-repo/semantics/openAccess | en_US |
dc.subject | Dataset | en_US |
dc.subject | Large language models | en_US |
dc.subject | LLM | en_US |
dc.subject | Text-to-SQL | en_US |
dc.subject | Turkish | en_US |
dc.subject | TURSpider | en_US |
dc.subject | Query languages | en_US |
dc.subject | Language model | en_US |
dc.subject | Model-based OPC | en_US |
dc.subject | Turkish texts | en_US |
dc.subject | Structured Query Language | en_US |
dc.title | TURSpider: a Turkish Text-to-SQL dataset and LLM-based study | en_US |
dc.type | Article | en_US |
dspace.entity.type | Publication | en_US |