TurkEmbed: Turkish embedding model on natural language inference & sentence text similarity tasks

Ezerceli, Özay; Gümüşçekiçci, Gizem; Erkoç, Tuğba; Özenç, Berke

dc.authorid	0000-0002-7877-7528
dc.authorid	0000-0002-9502-7817
dc.authorid	0000-0001-9033-8934
dc.authorid	0000-0003-2008-243X
dc.contributor.author	Ezerceli, Özay	en_US
dc.contributor.author	Gümüşçekiçci, Gizem	en_US
dc.contributor.author	Erkoç, Tuğba	en_US
dc.contributor.author	Özenç, Berke	en_US
dc.date.accessioned	2026-05-06T07:02:07Z
dc.date.available	2026-05-06T07:02:07Z
dc.date.issued	2025-11-10
dc.department	Işık Üniversitesi, Mühendislik ve Doğa Bilimleri Fakültesi, Bilgisayar Mühendisliği Bölümü	en_US
dc.department	Işık University, Faculty of Engineering and Natural Sciences, Department of Computer Engineering	en_US
dc.description.abstract	This paper introduces TurkEmbed, a novel Turkish language embedding model designed to outperform existing models, particularly in Natural Language Inference (NLI) and Semantic Textual Similarity (STS) tasks. Current Turkish embedding models often rely on machine-translated datasets, potentially limiting their accuracy and semantic understanding. TurkEmbed utilizes a combination of diverse datasets and advanced training techniques, including matryoshka representation learning, to achieve more robust and accurate embeddings. This approach enables the model to adapt to various resource-constrained environments, offering faster encoding capabilities. Our evaluation on the Turkish STS-b-TR dataset, using Pearson and Spearman correlation metrics, demonstrates significant improvements in semantic similarity tasks. Furthermore, TurkEmbed surpasses the current state-of-the-art model, Emrecan, on All-NLI-TR and STS-b-TR benchmarks, achieving a 1-4% improvement. TurkEmbed promises to enhance the Turkish NLP ecosystem by providing a more nuanced understanding of language and facilitating advancements in downstream applications.	en_US
dc.identifier.citation	Ezerceli, Ö., Gümüşçekiçci, G., Erkoç, T. & Özenç, B. (2025). TurkEmbed: Turkish embedding model on natural language inference & sentence text similarity tasks. Arxiv, 1-9. doi: https://doi.org/10.48550/arXiv.2511.08376	en_US
dc.identifier.endpage	9
dc.identifier.startpage	1
dc.identifier.uri	https://hdl.handle.net/11729/7379
dc.identifier.uri	https://doi.org/10.48550/arXiv.2511.08376
dc.identifier.wos	PPRN:161696255
dc.identifier.wosquality	N/A
dc.indekslendigikaynak	Web of Science	en_US
dc.indekslendigikaynak	Preprint Citation Index	en_US
dc.institutionauthor	Gümüşçekiçci, Gizem	en_US
dc.institutionauthor	Erkoç, Tuğba	en_US
dc.institutionauthor	Özenç, Berke	en_US
dc.institutionauthorid	0000-0002-9502-7817
dc.institutionauthorid	0000-0001-9033-8934
dc.institutionauthorid	0000-0003-2008-243X
dc.language.iso	en	en_US
dc.publisher	Cornell Univ	en_US
dc.relation.ispartof	Arxiv	en_US
dc.relation.publicationcategory	Preprint – International – Institutional Faculty Member	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.subject	Semantic text similarity	en_US
dc.subject	Matryoshka representation	en_US
dc.subject	Embedding model	en_US
dc.subject	Natural language inference	en_US
dc.subject	Downstream task	en_US
dc.title	TurkEmbed: Turkish embedding model on natural language inference & sentence text similarity tasks	en_US
dc.type	Preprint	en_US
dspace.entity.type	Publication	en_US

Files

Original bundle

Name: TurkEmbed_Turkish_Embedding_Model_on_NLI_STS_Tasks.pdf
Size: 1.17 MB
Format: Adobe Portable Document Format

Collection

Conference Paper Collection | Department of Computer Engineering