Building annotated parallel corpora using the ATIS Dataset: two UD-style treebanks in English and Turkish

Date

2024-05-20

Journal Title

Journal ISSN

Volume Title

Publisher

European Language Resources Association (ELRA)

Access Rights

info:eu-repo/semantics/openAccess

Research Projects

Organizational Units

Journal Issue

Abstract

In this paper, we describe the annotation of the Air Travel Information Systems (ATIS) Dataset as a parallel treebank in English and Turkish. The ATIS Dataset was originally compiled as pilot data to measure the efficiency of Spoken Language Systems, and it comprises transcriptions of people asking automated inquiry systems for flight information. Our first annotated treebank, which is in English, includes 61,879 tokens (5,432 sentences), while the second treebank, which was translated into Turkish, contains 45,875 tokens for the same number of sentences. First, both treebanks were morphologically annotated through a semi-automatic process. Then, the dependency annotations were performed by a team of linguists according to the Universal Dependencies (UD) guidelines. These two parallel annotated treebanks are a valuable contribution to language resources thanks to the spontaneous, spoken nature of the data and the availability of cross-linguistic dependency annotation.

Description

Keywords

Annotated corpus, ATIS, Parallel corpora, Universal dependencies, Linguistics, Speech transmission, Translation (languages), Air travel information system, Air travels, Human speech, Spoken languages, Travel information system, Treebanks, Turkish, Universal dependency, Transcription

Source

17th Workshop on Building and Using Comparable Corpora, BUCC 2024 at LREC-COLING 2024 - Proceedings

WoS Q Value

Scopus Q Value

N/A

Volume

Issue

Citation

Cesur, N., Kuzgun, A., Köse, M. & Yıldız, O. T. (2024). Building annotated parallel corpora using the ATIS Dataset: two UD-style treebanks in English and Turkish. Paper presented at the 17th Workshop on Building and Using Comparable Corpora, BUCC 2024 at LREC-COLING 2024 - Proceedings, 104-110.