Building annotated parallel corpora using the ATIS Dataset: two UD-style treebanks in English and Turkish

dc.authorid: 0000-0002-3195-2747
dc.authorid: 0000-0002-6333-5129
dc.authorid: 0000-0001-5838-4615
dc.contributor.author: Cesur, Neslihan [en_US]
dc.contributor.author: Kuzgun, Aslı [en_US]
dc.contributor.author: Köse, Mehmet [en_US]
dc.contributor.author: Yıldız, Olcay Taner [en_US]
dc.date.accessioned: 2024-07-26T10:17:28Z
dc.date.available: 2024-07-26T10:17:28Z
dc.date.issued: 2024-05-20
dc.department: Işık Üniversitesi, Mühendislik ve Doğa Bilimleri Fakültesi, Bilgisayar Mühendisliği Bölümü [en_US]
dc.department: Işık University, Faculty of Engineering and Natural Sciences, Department of Computer Engineering [en_US]
dc.description.abstract: In this paper, we present the annotation process by which the Air Travel Information Systems (ATIS) Dataset was turned into a parallel treebank in English and Turkish. The ATIS Dataset was originally compiled as pilot data to measure the efficiency of Spoken Language Systems; it consists of transcriptions of human speech from people requesting flight information from automated inquiry systems. Our first annotated treebank, in English, contains 61,879 tokens (5,432 sentences), while the second treebank, translated into Turkish, contains 45,875 tokens for the same number of sentences. Both treebanks were first morphologically annotated through a semi-automatic process. A team of linguists then performed the dependency annotation following the Universal Dependencies (UD) guidelines. These two parallel annotated treebanks are a valuable contribution to language resources, owing to the spontaneous, spoken nature of the data and the availability of cross-linguistic dependency annotation. [en_US]
dc.identifier.citation: Cesur, N., Kuzgun, A., Köse, M. & Yıldız, O. T. (2024). Building annotated parallel corpora using the ATIS Dataset: two UD-style treebanks in English and Turkish. Paper presented at the 17th Workshop on Building and Using Comparable Corpora, BUCC 2024 at LREC-COLING 2024 - Proceedings, 104-110. [en_US]
dc.identifier.endpage: 110
dc.identifier.scopus: 2-s2.0-85198633728
dc.identifier.scopusquality: N/A
dc.identifier.startpage: 104
dc.identifier.uri: https://hdl.handle.net/11729/6403
dc.indekslendigikaynak: Scopus
dc.institutionauthor: Köse, Mehmet [en_US]
dc.language.iso: en [en_US]
dc.publisher: European Language Resources Association (ELRA) [en_US]
dc.relation.ispartof: 17th Workshop on Building and Using Comparable Corpora, BUCC 2024 at LREC-COLING 2024 - Proceedings [en_US]
dc.relation.publicationcategory: Conference Item - International - Student [en_US]
dc.rights: info:eu-repo/semantics/openAccess
dc.subject: Annotated corpus [en_US]
dc.subject: ATIS [en_US]
dc.subject: Parallel corpora [en_US]
dc.subject: Universal dependencies [en_US]
dc.subject: Linguistics [en_US]
dc.subject: Speech transmission [en_US]
dc.subject: Translation (languages) [en_US]
dc.subject: Air travel information system [en_US]
dc.subject: Air travels [en_US]
dc.subject: Human speech [en_US]
dc.subject: Spoken languages [en_US]
dc.subject: Travel information system [en_US]
dc.subject: Treebanks [en_US]
dc.subject: Turkish [en_US]
dc.subject: Universal dependency [en_US]
dc.subject: Transcription [en_US]
dc.title: Building annotated parallel corpora using the ATIS Dataset: two UD-style treebanks in English and Turkish [en_US]
dc.type: Conference Object [en_US]
dspace.entity.type: Publication
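The abstract reports token and sentence counts for both treebanks. As a minimal sketch, assuming the treebanks are distributed in the CoNLL-U format standard for Universal Dependencies (this record does not say so), such counts could be reproduced as below. The function `count_conllu` and the toy sentence are hypothetical illustrations, not taken from the treebanks. The record also does not state whether "tokens" means surface tokens or syntactic words; this sketch counts syntactic words, skipping multiword-token ranges (IDs such as `1-2`) and empty nodes (IDs such as `3.1`).

```python
from collections.abc import Iterable


def count_conllu(lines: Iterable[str]) -> tuple[int, int]:
    """Return (sentences, tokens) for CoNLL-U text (hypothetical helper).

    Comment lines start with '#'; sentences are separated by blank lines.
    Multiword-token ranges (ID like '1-2') and empty nodes (ID like '3.1')
    are skipped, so only syntactic words are counted as tokens.
    """
    sentences = 0
    tokens = 0
    in_sentence = False
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence, if one is open.
            if in_sentence:
                sentences += 1
                in_sentence = False
            continue
        if line.startswith("#"):
            continue  # sentence-level comments such as '# text = ...'
        token_id = line.split("\t", 1)[0]
        if "-" in token_id or "." in token_id:
            continue  # multiword-token range or empty node
        tokens += 1
        in_sentence = True
    if in_sentence:
        sentences += 1  # the file may not end with a blank line
    return sentences, tokens


# Toy example (invented, not from the ATIS treebanks): one sentence, 10 columns per word.
sample = (
    "# text = show me flights\n"
    "1\tshow\tshow\tVERB\t_\t_\t0\troot\t_\t_\n"
    "2\tme\tI\tPRON\t_\t_\t1\tiobj\t_\t_\n"
    "3\tflights\tflight\tNOUN\t_\t_\t1\tobj\t_\t_\n"
    "\n"
)

print(count_conllu(sample.splitlines()))  # (1, 3)
```

For a real file, an open text handle can be passed directly, since iterating over it yields lines. Dividing the reported figures gives roughly 11.4 tokens per sentence for the English treebank (61,879 / 5,432) and roughly 8.4 for the Turkish one (45,875 / 5,432).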

Files

Original bundle
Name: Building_Annotated_Parallel_Corpora_Using_the_ATIS_Dataset_Two_UD_style_treebanks_in_English_and_Turkish.pdf
Size: 779.28 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.17 KB
Format: Item-specific license agreed upon to submission