Evaluating the English-Turkish parallel treebank for machine translation
Citation
Görgün, O. & Yıldız, O. T. (2022). Evaluating the English-Turkish parallel treebank for machine translation. Turkish Journal Of Electrical Engineering And Computer Sciences, 30(1), 184-199. doi:10.3906/elk-2102-57
Abstract
This study extends our initial efforts in building an English-Turkish parallel treebank corpus for statistical machine translation tasks. We manually generated parallel trees for about 17K sentences selected from the Penn Treebank corpus. The English sentences range in length from 15 to 50 tokens, including punctuation. We constrained the translation of trees by (i) reordering of leaf nodes based on suffixation rules in Turkish, and (ii) gloss replacement. We aim to mimic a human annotator's behavior in a real translation task. To bridge the morphological and syntactic gap between the languages, we perform morphological annotation and disambiguation. We also apply our heuristics to create the Nokia English-Turkish Treebank (NTB), which addresses technical document translation tasks. NTB includes 8.3K sentences of varying lengths. We validate the corpus both extrinsically and intrinsically, and report our evaluation results for perplexity analysis and translation tasks. The results indicate that our heuristics yield promising perplexity values and are suitable for translation tasks in terms of BLEU scores.
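The two constraints named in the abstract, reordering and gloss replacement, can be illustrated with a toy sketch. This is not the authors' tool or annotation scheme: the `Node` class, the `translate_tree` function, the per-label child orders, and the example glosses are all hypothetical, and the sketch reorders at the constituent level as a simplification. An empty gloss here stands for an English token with no separate Turkish word, such as an article absorbed into a suffix.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """Hypothetical tree node: a constituent tag, or a token for leaves."""
    label: str
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def leaves(node: Node) -> List[str]:
    """Collect non-empty leaf tokens from left to right."""
    if node.is_leaf():
        return [node.label] if node.label else []
    tokens: List[str] = []
    for child in node.children:
        tokens.extend(leaves(child))
    return tokens


def translate_tree(node: Node, glosses: Dict[str, str],
                   orders: Dict[str, List[int]]) -> Node:
    """Return a new tree with children permuted and leaves glossed.

    glosses: English token -> Turkish gloss ("" drops the token).
    orders:  constituent label -> permutation of its child indices.
    Both tables are illustrative assumptions, not data from the paper.
    """
    if node.is_leaf():
        return Node(glosses.get(node.label, node.label))
    kids = [translate_tree(c, glosses, orders) for c in node.children]
    order = orders.get(node.label)
    # Apply the order only if it is a valid permutation for this node.
    if order is not None and sorted(order) == list(range(len(kids))):
        kids = [kids[i] for i in order]
    return Node(node.label, kids)


# "I read the book" as a small constituency tree.
english = Node("S", [
    Node("NP", [Node("I")]),
    Node("VP", [
        Node("VBD", [Node("read")]),
        Node("NP", [Node("the"), Node("book")]),
    ]),
])

glosses = {"I": "Ben", "read": "okudum", "the": "", "book": "kitabı"}
orders = {"VP": [1, 0]}  # move the object before the verb (SOV order)

turkish = translate_tree(english, glosses, orders)
print(" ".join(leaves(turkish)))  # Ben kitabı okudum
```

Because the output tree keeps the shape of the input tree up to permutation, every Turkish leaf stays aligned with its English counterpart, which is the property a parallel treebank relies on.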
Source
Turkish Journal Of Electrical Engineering And Computer Sciences
Volume
30
Issue
1
Related items
Showing items related by title, author, creator and subject.
- Web service translating content into Turkish sign language
  Gümüşçekiçci, Gizem; Ezerceli, Özay; Tek, Faik Boray (Institute of Electrical and Electronics Engineers Inc., 2020-10-12)
  The essential communication tool for people with hearing loss is sign language. It is way more efficient for their communication. Existing systems for translating the text into sign language are offline and not practical. ...
- İngilizce-Türkçe istatistiksel makine çevirisinde biçim bilim kullanımı [The use of morphology in English-Turkish statistical machine translation]
  Görgün, Onur; Yıldız, Olcay Taner (IEEE, 2012-04-18)
  In this study, statistical machine translation experiments were carried out on the SIU corpus for the English-Turkish language pair with the help of morphological analysis. Translation experiments based on word forms, English-Turkish ...
- English-Turkish parallel treebank with morphological annotations and its use in tree-based SMT
  Görgün, Onur; Yıldız, Olcay Taner; Solak, Ercan; Ehsani, Razieh (SciTePress, 2016)
  In this paper, we report our tree-based statistical translation study from English to Turkish. We describe our data generation process and report the initial results of tree-based translation under a simple model. For ...