Building annotated parallel corpora using the ATIS Dataset: two UD-style treebanks in English and Turkish

Cesur, Neslihan; Kuzgun, Aslı; Köse, Mehmet; Yıldız, Olcay Taner

dc.authorid	0000-0002-3195-2747
dc.authorid	0000-0002-6333-5129
dc.authorid	0000-0001-5838-4615
dc.contributor.author	Cesur, Neslihan	en_US
dc.contributor.author	Kuzgun, Aslı	en_US
dc.contributor.author	Köse, Mehmet	en_US
dc.contributor.author	Yıldız, Olcay Taner	en_US
dc.date.accessioned	2024-07-26T10:17:28Z
dc.date.available	2024-07-26T10:17:28Z
dc.date.issued	2024-05-20
dc.department	Işık Üniversitesi, Mühendislik ve Doğa Bilimleri Fakültesi, Bilgisayar Mühendisliği Bölümü	en_US
dc.department	Işık University, Faculty of Engineering and Natural Sciences, Department of Computer Engineering	en_US
dc.description.abstract	In this paper, we describe the annotation of the Air Travel Information Systems (ATIS) Dataset as a parallel treebank in English and Turkish. The ATIS Dataset was originally compiled as pilot data to measure the efficiency of Spoken Language Systems, and it comprises transcriptions of human speech in which people request flight information from automated inquiry systems. Our first annotated treebank, which is in English, includes 61,879 tokens (5,432 sentences), while the second treebank, which was translated into Turkish, contains 45,875 tokens for the same number of sentences. First, both treebanks were morphologically annotated through a semi-automatic process. Later, the dependency annotations were performed by a team of linguists according to the Universal Dependencies (UD) guidelines. These two parallel annotated treebanks are a valuable contribution to language resources thanks to the spontaneous, spoken nature of the data and the availability of cross-linguistic dependency annotation.	en_US
dc.identifier.citation	Cesur, N., Kuzgun, A., Köse, M. & Yıldız, O. T. (2024). Building annotated parallel corpora using the ATIS Dataset: two UD-style treebanks in English and Turkish. Paper presented at the 17th Workshop on Building and Using Comparable Corpora, BUCC 2024 at LREC-COLING 2024 - Proceedings, 104-110.	en_US
dc.identifier.endpage	110
dc.identifier.scopus	2-s2.0-85198633728
dc.identifier.scopusquality	N/A
dc.identifier.startpage	104
dc.identifier.uri	https://hdl.handle.net/11729/6403
dc.indekslendigikaynak	Scopus
dc.institutionauthor	Köse, Mehmet	en_US
dc.language.iso	en	en_US
dc.publisher	European Language Resources Association (ELRA)	en_US
dc.relation.ispartof	17th Workshop on Building and Using Comparable Corpora, BUCC 2024 at LREC-COLING 2024 - Proceedings	en_US
dc.relation.publicationcategory	Conference Item - International - Student	en_US
dc.rights	info:eu-repo/semantics/openAccess
dc.subject	Annotated corpus	en_US
dc.subject	ATIS	en_US
dc.subject	Parallel corpora	en_US
dc.subject	Universal dependencies	en_US
dc.subject	Linguistics	en_US
dc.subject	Speech transmission	en_US
dc.subject	Translation (languages)	en_US
dc.subject	Air travel information system	en_US
dc.subject	Air travels	en_US
dc.subject	Human speech	en_US
dc.subject	Spoken languages	en_US
dc.subject	Travel information system	en_US
dc.subject	Treebanks	en_US
dc.subject	Turkish	en_US
dc.subject	Universal dependency	en_US
dc.subject	Transcription	en_US
dc.title	Building annotated parallel corpora using the ATIS Dataset: two UD-style treebanks in English and Turkish	en_US
dc.type	Conference Object	en_US
dspace.entity.type	Publication

Files

Original bundle

Name: Building_Annotated_Parallel_Corpora_Using_the_ATIS_Dataset_Two_UD_style_treebanks_in_English_and_Turkish.pdf
Size: 779.28 KB
Format: Adobe Portable Document Format

Collections

Student Publications Collection
Department of Computer Engineering Collection