Search Results
Showing 1 - 5 of 5
Publication: Integrating Turkish Wordnet KeNet to Princeton WordNet: The case of one-to-many correspondences (Institute of Electrical and Electronics Engineers Inc., 2019-10) Bakay, Özge; Ergelen, Özlem; Yıldız, Olcay Taner
In this paper, we introduce a novel approach to forming interlingual relations between multilingual wordnets. We have mapped Turkish senses in KeNet to their corresponding senses in Princeton WordNet by drawing one-to-many correspondences. Because of language-specific properties, one synset in one language is in some cases matched with multiple synsets in the other language. Our method of integrating KeNet into a multilingual network also included mapping the 5000 most frequent senses in English to their equivalent senses in Turkish. We demonstrate that one-to-many interlingual correspondences are necessary in both Turkish-to-English and English-to-Turkish mappings. Furthermore, one-to-many mappings give us insights into the semantic relations to be constructed in Turkish, such as hypernymy.

Publication: Constructing a WordNet for Turkish using manual and automatic annotation (Assoc Computing Machinery, 2018-05) Ehsani, Razieh; Solak, Ercan; Yıldız, Olcay Taner
In this article, we summarize the methodology and the results of our 2-year-long effort to construct a comprehensive WordNet for Turkish. In our approach, we mine a dictionary for synonym candidate pairs and manually mark the senses in which the candidates are synonymous. Every pair was marked twice, by different human annotators. We derive the synsets by finding the connected components of the graph whose edges are synonym senses. We also mined Turkish Wikipedia for hypernym relations among the senses. We analyzed the resulting WordNet to highlight the difficulties brought about by the dictionary construction methods of lexicographers. After splitting the unusually large synsets, we used random walk-based clustering, which resulted in a Zipfian distribution of synset sizes. We compared our results to BalkaNet and to automatic thesaurus construction methods using the variation of information metric. Our Turkish WordNet is available online.

Publication: Shallow parsing in Turkish (IEEE, 2017) Topsakal, Ozan; Açıkgöz, Onur; Gürkan, Ali Tunca; Kanburoğlu, Ali Buğra; Ertopçu, Burak; Özenç, Berke; Çam, İlker; Avar, Begüm; Ercan, Gökhan; Yıldız, Olcay Taner
In this study, shallow parsing is applied to Turkish sentences. These sentences are used to train and test the performance of various learning algorithms with various features specified for shallow parsing in Turkish.

Publication: Evaluating the English-Turkish parallel treebank for machine translation (TÜBİTAK, 2022-01-19) Görgün, Onur; Yıldız, Olcay Taner
This study extends our initial efforts in building an English-Turkish parallel treebank corpus for statistical machine translation tasks. We manually generated parallel trees for about 17K sentences selected from the Penn Treebank corpus. The English sentences vary in length from 15 to 50 tokens, including punctuation. We constrained the translation of trees by (i) reordering of leaf nodes based on suffixation rules in Turkish, and (ii) gloss replacement. We aim to mimic a human annotator's behavior in a real translation task. To bridge the morphological and syntactic gap between the languages, we perform morphological annotation and disambiguation. We also apply our heuristics by creating the Nokia English-Turkish Treebank (NTB) to address technical document translation tasks. NTB includes 8.3K sentences of varying lengths. We validate the corpus both extrinsically and intrinsically, and report our evaluation results regarding perplexity analysis and translation task results. Results prove that our heuristics yield promising results in terms of perplexity and are suitable for translation tasks in terms of BLEU scores.

Publication: A multilayer annotated corpus for Turkish (IEEE, 2018-06-06) Yıldız, Olcay Taner; Ak, Koray; Ercan, Gökhan; Topsakal, Ozan; Asmazoğlu, Cengiz
In this paper, we present the first multilayer annotated corpus for Turkish, which is a low-resourced agglutinative language. Our dataset consists of 9,600 sentences translated from the Penn Treebank Corpus. The annotated layers contain syntactic and semantic information, including morphological disambiguation of words, named entity annotation, shallow parse, sense annotation, and semantic role label annotation.
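
Illustrative note on the entry "Constructing a WordNet for Turkish using manual and automatic annotation": its abstract states that synsets are derived as the connected components of a graph whose edges are synonym senses. The Python sketch below shows one way that step could look. It is not the authors' implementation; the function name `derive_synsets`, the `word#sense` identifier format, and the sample pairs are assumptions made only for illustration.

```python
from collections import defaultdict


def derive_synsets(synonym_pairs):
    """Group senses into synsets, i.e. connected components of the synonymy graph.

    synonym_pairs: iterable of (sense_a, sense_b) pairs marked as synonymous.
    Returns a list of sets, one set of senses per component.
    """
    # Build an undirected adjacency map from the annotated pairs.
    graph = defaultdict(set)
    for a, b in synonym_pairs:
        graph[a].add(b)
        graph[b].add(a)

    seen = set()
    synsets = []
    for start in graph:
        if start in seen:
            continue
        # Depth-first traversal collects every sense reachable from `start`.
        seen.add(start)
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        synsets.append(component)
    return synsets


# Hypothetical sense identifiers, not taken from the actual dictionary data.
pairs = [
    ("ev#1", "konut#1"),
    ("konut#1", "mesken#1"),
    ("araba#1", "otomobil#1"),
]
synsets = derive_synsets(pairs)
```

With these sample pairs, the three housing senses fall into one component and the two car senses into another. A single wrongly marked pair would merge two components, which may be one reason the abstract mentions splitting unusually large synsets afterwards.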












