Building annotated parallel corpora using the ATIS Dataset: two UD-style treebanks in English and Turkish
Date
2024-05-20
Publisher
European Language Resources Association (ELRA)
Access Rights
info:eu-repo/semantics/openAccess
Abstract
In this paper, we describe the annotation of the Air Travel Information Systems (ATIS) Dataset as a parallel treebank in English and Turkish. The ATIS Dataset was originally compiled as pilot data to measure the efficiency of Spoken Language Systems, and it consists of transcriptions of human speech from people requesting flight information from automated inquiry systems. Our first annotated treebank, in English, contains 61,879 tokens (5,432 sentences), while the second treebank, a Turkish translation of the same data, contains 45,875 tokens for the same number of sentences. First, both treebanks were morphologically annotated through a semi-automatic process. Then, a team of linguists performed the dependency annotation according to the Universal Dependencies (UD) guidelines. These two parallel annotated treebanks are a valuable addition to language resources because of the spontaneous, spoken nature of the data and the availability of cross-linguistic dependency annotation.
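UD treebanks are commonly distributed in the CoNLL-U format: one word per tab-separated line, comment lines starting with "#", and a blank line between sentences. Assuming the ATIS treebanks follow that convention (the record itself does not name the file format), sentence and token counts like the ones reported above could be reproduced roughly as in this minimal sketch. The helper `count_conllu` and the `TreebankStats` type are hypothetical names, and the sample sentences are invented for illustration. Whether the reported figures count surface tokens or syntactic words is also an assumption; this sketch counts syntactic words, skipping multiword-token ranges and empty nodes.

```python
from dataclasses import dataclass


@dataclass
class TreebankStats:
    sentences: int
    words: int  # syntactic words only (lines with an integer ID)


def count_conllu(text: str) -> TreebankStats:
    """Count sentences and syntactic words in CoNLL-U text (hypothetical helper)."""
    sentences = 0
    words = 0
    in_sentence = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line closes the current sentence block.
            if in_sentence:
                sentences += 1
                in_sentence = False
            continue
        if line.startswith("#"):
            # Sentence-level comments such as "# sent_id" or "# text".
            continue
        in_sentence = True
        token_id = line.split("\t", 1)[0]
        # Skip multiword-token ranges ("1-2") and empty nodes ("5.1"),
        # so only integer-ID syntactic words are counted.
        if "-" in token_id or "." in token_id:
            continue
        words += 1
    if in_sentence:
        # Handle a final sentence that has no trailing blank line.
        sentences += 1
    return TreebankStats(sentences=sentences, words=words)


# Invented two-sentence sample in ATIS style; NOT taken from the treebanks.
SAMPLE = """
# sent_id = demo-1
# text = show me flights
1\tshow\tshow\tVERB\t_\t_\t0\troot\t_\t_
2\tme\tI\tPRON\t_\t_\t1\tiobj\t_\t_
3\tflights\tflight\tNOUN\t_\t_\t1\tobj\t_\t_

# sent_id = demo-2
# text = what's the fare
1-2\twhat's\t_\t_\t_\t_\t_\t_\t_\t_
1\twhat\twhat\tPRON\t_\t_\t0\troot\t_\t_
2\t's\tbe\tAUX\t_\t_\t1\tcop\t_\t_
3\tthe\tthe\tDET\t_\t_\t4\tdet\t_\t_
4\tfare\tfare\tNOUN\t_\t_\t1\tnsubj\t_\t_
"""

if __name__ == "__main__":
    print(count_conllu(SAMPLE))  # TreebankStats(sentences=2, words=7)
```

Applied to the figures in the abstract, the shared sentence count gives about 11.4 tokens per sentence for English (61,879 / 5,432) and about 8.4 for Turkish (45,875 / 5,432).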
Keywords
Annotated corpus, ATIS, Parallel corpora, Universal dependencies, Linguistics, Speech transmission, Translation (languages), Air travel information system, Air travels, Human speech, Spoken languages, Travel information system, Treebanks, Turkish, Universal dependency, Transcription
Source
17th Workshop on Building and Using Comparable Corpora, BUCC 2024 at LREC-COLING 2024 - Proceedings
Scopus Q Value
N/A
Citation
Cesur, N., Kuzgun, A., Köse, M. & Yıldız, O. T. (2024). Building annotated parallel corpora using the ATIS Dataset: two UD-style treebanks in English and Turkish. Paper presented at the 17th Workshop on Building and Using Comparable Corpora, BUCC 2024 at LREC-COLING 2024 - Proceedings, 104-110.