Extracting and using prosodic information for Turkish spoken language processing (Türkçe dil işleme için bürünsel bilginin çıkarılması ve kullanılması)

Güz, Ümit; Gürkan, Hakan; Yiğit, Sinan

dc.authorid	0000-0002-4597-0954
dc.authorid	0000-0002-7008-4778
dc.contributor.author	Güz, Ümit	en_US
dc.contributor.author	Gürkan, Hakan	en_US
dc.contributor.author	Yiğit, Sinan	en_US
dc.date.accessioned	2023-03-13T12:53:45Z
dc.date.available	2023-03-13T12:53:45Z
dc.date.issued	2010-02-01
dc.department	Işık Üniversitesi, Mühendislik Fakültesi, Elektrik-Elektronik Mühendisliği Bölümü	en_US
dc.department	Işık University, Faculty of Engineering, Department of Electrical-Electronics Engineering	en_US
dc.description.abstract	In general, this project aims to uncover the prosodic and lexical features of spoken language (Turkish) and to use these features in the automatic computer processing of spoken language. More specifically, this covers sentence segmentation of the output of an automatic speech recognizer (ASR). The written text produced by automatic speech recognition systems lacks, or has lost, punctuation, capitalization, and basic speech-related parameters such as stress, intonation, pitch, and pause, and this leads in particular to differences in meaning. Enriching this output, in other words restoring these features, will make these texts easier both for humans to read and interpret correctly and for machines to process. The aim of this project is to carry out this enrichment and restoration by making use of the prosodic features of the language.	en_US
dc.description.abstract	The text output by an Automatic Speech Recognition (ASR) system lacks punctuation, capitalization, and speech-related parameters such as stress, tone, pitch, and pause, and this loss can change the meaning. Enriching this output, in other words restoring these features, will make the text easier for humans to read and understand and easier for machines to process. The aim of this project is to perform this enrichment and restoration using the prosodic features of the spoken language. In this proposal, we would like to examine the extraction and use of prosodic information, in addition to lexical features, for spoken language processing of Turkish. Specifically, we would like to research the use of prosodic features for sentence segmentation of Turkish speech. Another outcome of the project is a database of prosodic features at the word and morpheme level, which can be used for other purposes such as morphological disambiguation or word sense disambiguation. Turkish is an agglutinative language, so the text must be analyzed morphologically to determine the root forms and suffixes of the words before further analysis. Within the framework of this project, we would also like to examine the interaction of prosodic features with morphological information. The role of sentence segmentation is to detect sentence boundaries in the stream of words provided by the ASR module for downstream processing. This is helpful for various language processing tasks, such as parsing, machine translation, and question answering. We formulate sentence segmentation as a binary classification task: for each position between two consecutive words, the system must decide whether the position marks a boundary between two sentences or whether the two neighboring words belong to the same sentence. 
Sentence segmentation is performed by combining Hidden Event Language Models (HELMs) with discriminative classification methods. The HELM takes into account the sequence of words together with the output of a discriminative classifier, such as a decision tree, that is based on prosodic features such as pause durations. The new approach combines HELMs, which exploit lexical information, with maximum entropy and boosting classifiers that tightly integrate lexical, prosodic, speaker change, and syntactic features. The boosting-based classifier alone performs better than all the other classification schemes. When it is combined with a hidden event language model, the improvement is even more pronounced.	en_US
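The binary boundary decision described in the abstract can be illustrated with a minimal Python sketch. It is not the project's implementation: the logistic pause scorer only stands in for a prosodic classifier, the lookup table only stands in for a hidden event language model, and the log-linear combination, the function names, the thresholds, and all probabilities are assumptions chosen for illustration.

```python
import math


def prosodic_posterior(pause_sec, midpoint=0.3, slope=10.0):
    """Toy stand-in for a prosodic classifier (assumption): a logistic
    function of the pause duration between two words."""
    return 1.0 / (1.0 + math.exp(-slope * (pause_sec - midpoint)))


def lexical_posterior(prev_word, next_word, boundary_prob, default=0.1):
    """Toy stand-in for a hidden event language model (assumption): a
    lookup table of boundary probabilities keyed on the word pair."""
    return boundary_prob.get((prev_word, next_word), default)


def combine(p_lex, p_pros, weight=0.5):
    """Log-linear interpolation of the two posteriors, renormalized over
    the two classes (boundary / no boundary)."""
    yes = (p_lex ** weight) * (p_pros ** (1.0 - weight))
    no = ((1.0 - p_lex) ** weight) * ((1.0 - p_pros) ** (1.0 - weight))
    return yes / (yes + no)


def segment(words, pauses, boundary_prob, threshold=0.5):
    """Make one binary decision per position between consecutive words
    and split the word stream where a boundary is predicted."""
    if not words:
        return []
    sentences, current = [], [words[0]]
    for i in range(len(words) - 1):
        p_lex = lexical_posterior(words[i], words[i + 1], boundary_prob)
        p_pros = prosodic_posterior(pauses[i])
        if combine(p_lex, p_pros) >= threshold:
            sentences.append(current)
            current = []
        current.append(words[i + 1])
    sentences.append(current)
    return sentences


# Hypothetical ASR output: words plus the pause (in seconds) after each word.
words = ["eve", "gittim", "sonra", "uyudum"]
pauses = [0.05, 0.80, 0.05]
table = {("gittim", "sonra"): 0.7}  # made-up lexical boundary probability
print(segment(words, pauses, table))
```

In this toy input only the long pause after "gittim", together with the lexical score for that word pair, pushes the combined posterior above the threshold, so the stream is split at that single position.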
dc.description.version	Publisher's Version	en_US
dc.identifier.citation	Güz, Ü., Gürkan, H. & Yiğit, S. (2010). Türkçe dil işleme için bürünsel bilginin çıkarılması ve kullanılması. Tübitak, 1-80.	en_US
dc.identifier.endpage	80
dc.identifier.startpage	1
dc.identifier.uri	https://hdl.handle.net/11729/5439
dc.identifier.uri	https://search.trdizin.gov.tr/tr/yayin/detay/609737
dc.indekslendigikaynak	TR-Dizin	en_US
dc.institutionauthor	Güz, Ümit	en_US
dc.institutionauthor	Gürkan, Hakan	en_US
dc.institutionauthor	Yiğit, Sinan	en_US
dc.institutionauthorid	0000-0002-4597-0954
dc.institutionauthorid	0000-0002-7008-4778
dc.language.iso	tr	en_US
dc.peerreviewed	Yes	en_US
dc.publicationstatus	Published	en_US
dc.publisher	Tübitak	en_US
dc.relation.ispartof	Tübitak	en_US
dc.relation.publicationcategory	Other	en_US
dc.relation.tubitak	"info:eu-repo/grantAgreement/TUBITAK/EEEAG/107E182"
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.subject	Bürünsel bilgi	en_US
dc.subject	Dil işleme	en_US
dc.subject	Cümle bölütleme	en_US
dc.subject	Konu bölütleme	en_US
dc.subject	Prosodic information	en_US
dc.subject	Spoken language processing	en_US
dc.subject	Sentence segmentation	en_US
dc.subject	Topic segmentation	en_US
dc.title	Türkçe dil işleme için bürünsel bilginin çıkarılması ve kullanılması	en_US
dc.title.alternative	Extracting and using prosodic information for Turkish spoken language processing	en_US
dc.type	Project	en_US
dspace.entity.type	Project

Files

Original bundle

Name: Turkce_dil_isleme_icin_burunsel_bilginin_cikarilmasi_ve_kullanilmasi.pdf
Size: 1.6 MB
Format: Adobe Portable Document Format
Description: Publisher's Version

Collections

Other Collection | Department of Electrical-Electronics Engineering
Projects
TR-Dizin Indexed Publications Collection