Building annotated parallel corpora using the ATIS Dataset: two UD-style treebanks in English and Turkish
dc.authorid | 0000-0002-3195-2747 | |
dc.authorid | 0000-0002-6333-5129 | |
dc.authorid | 0000-0001-5838-4615 | |
dc.contributor.author | Cesur, Neslihan | en_US |
dc.contributor.author | Kuzgun, Aslı | en_US |
dc.contributor.author | Köse, Mehmet | en_US |
dc.contributor.author | Yıldız, Olcay Taner | en_US |
dc.date.accessioned | 2024-07-26T10:17:28Z | |
dc.date.available | 2024-07-26T10:17:28Z | |
dc.date.issued | 2024-05-20 | |
dc.department | Işık University, Faculty of Engineering and Natural Sciences, Department of Computer Engineering | en_US |
dc.description.abstract | In this paper, we introduce the annotation process of the Air Travel Information Systems (ATIS) Dataset as a parallel treebank in English and Turkish. The ATIS Dataset was originally compiled as pilot data to measure the efficiency of Spoken Language Systems, and it comprises transcriptions of human speech from people asking automated inquiry systems for flight information. Our first annotated treebank, in English, includes 61,879 tokens (5,432 sentences), while the second treebank, translated into Turkish, contains 45,875 tokens for the same number of sentences. First, both treebanks were morphologically annotated through a semi-automatic process. Then, a team of linguists performed the dependency annotation according to the Universal Dependencies (UD) guidelines. These two parallel annotated treebanks are a valuable addition to language resources because of the spontaneous, spoken nature of the data and the availability of cross-linguistic dependency annotation. | en_US |
dc.identifier.citation | Cesur, N., Kuzgun, A., Köse, M. & Yıldız, O. T. (2024). Building annotated parallel corpora using the ATIS Dataset: two UD-style treebanks in English and Turkish. Paper presented at the 17th Workshop on Building and Using Comparable Corpora, BUCC 2024 at LREC-COLING 2024 - Proceedings, 104-110. | en_US |
dc.identifier.endpage | 110 | |
dc.identifier.scopus | 2-s2.0-85198633728 | |
dc.identifier.scopusquality | N/A | |
dc.identifier.startpage | 104 | |
dc.identifier.uri | https://hdl.handle.net/11729/6403 | |
dc.indekslendigikaynak | Scopus | |
dc.institutionauthor | Köse, Mehmet | en_US |
dc.language.iso | en | en_US |
dc.publisher | European Language Resources Association (ELRA) | en_US |
dc.relation.ispartof | 17th Workshop on Building and Using Comparable Corpora, BUCC 2024 at LREC-COLING 2024 - Proceedings | en_US |
dc.relation.publicationcategory | Conference Item - International - Student | en_US |
dc.rights | info:eu-repo/semantics/openAccess | |
dc.subject | Annotated corpus | en_US |
dc.subject | ATIS | en_US |
dc.subject | Parallel corpora | en_US |
dc.subject | Universal dependencies | en_US |
dc.subject | Linguistics | en_US |
dc.subject | Speech transmission | en_US |
dc.subject | Translation (languages) | en_US |
dc.subject | Air travel information system | en_US |
dc.subject | Air travels | en_US |
dc.subject | Human speech | en_US |
dc.subject | Spoken languages | en_US |
dc.subject | Travel information system | en_US |
dc.subject | Treebanks | en_US |
dc.subject | Turkish | en_US |
dc.subject | Universal dependency | en_US |
dc.subject | Transcription | en_US |
dc.title | Building annotated parallel corpora using the ATIS Dataset: two UD-style treebanks in English and Turkish | en_US |
dc.type | Conference Object | en_US |
dspace.entity.type | Publication |
Files
Original bundle
- Name: Building_Annotated_Parallel_Corpora_Using_the_ATIS_Dataset_Two_UD_style_treebanks_in_English_and_Turkish.pdf
- Size: 779.28 KB
- Format: Adobe Portable Document Format
License bundle
- Name: license.txt
- Size: 1.17 KB
- Format: Item-specific license agreed to upon submission
- Description: