Extension of conventional co-training learning strategies to three-view and committee-based learning strategies for effective automatic sentence segmentation

Dalva, Doğan; Güz, Ümit; Gürkan, Hakan

dc.contributor.author	Dalva, Doğan	en_US
dc.contributor.author	Güz, Ümit	en_US
dc.contributor.author	Gürkan, Hakan	en_US
dc.date.accessioned	2019-05-22T00:14:38Z
dc.date.available	2019-05-22T00:14:38Z
dc.date.issued	2018
dc.identifier.citation	Dalva, D., Güz, Ü. & Gürkan, H. (2018). Extension of conventional co-training learning strategies to three-view and committee-based learning strategies for effective automatic sentence segmentation. Paper presented at the 2018 IEEE Spoken Language Technology Workshop (SLT), 750-755. doi:10.1109/SLT.2018.8639533	en_US
dc.identifier.isbn	9781538643341
dc.identifier.isbn	9781538643334
dc.identifier.isbn	9781538643358
dc.identifier.issn	2639-5479
dc.identifier.uri	https://hdl.handle.net/11729/1594
dc.identifier.uri	http://dx.doi.org/10.1109/SLT.2018.8639533
dc.description.abstract	The objective of this work is to develop effective multi-view semi-supervised machine learning strategies for the sentence boundary classification problem when only small sets of sentence-boundary-labeled data are available. We propose three-view and committee-based learning strategies that incorporate co-training algorithms with agreement, disagreement, and self-combined learning strategies, using prosodic, lexical, and morphological information. We compare the experimental results of the proposed three-view and committee-based learning strategies with those of other semi-supervised learning strategies in the literature, namely self-training and co-training with agreement, disagreement, and self-combined strategies. The experimental results show that sentence segmentation performance can be improved considerably using the proposed multi-view learning strategies, since the data sets can be represented by three redundantly sufficient and disjoint feature sets. We show that the proposed strategies substantially improve the average performance for both Turkish and English spoken language when only a small set of manually labeled data is available.	en_US
dc.description.sponsorship	This material is based upon work supported by the Scientific and Technological Research Council of Turkey (TUBITAK) (Project Numbers 107E182 and 111E228), the Isik University Scientific Research Project Fund (Project Numbers 09A301 and 14A201), and a J. William Fulbright Post-Doctoral Research Fellowship. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding agencies.	en_US
dc.language.iso	eng	en_US
dc.publisher	IEEE	en_US
dc.relation.isversionof	10.1109/SLT.2018.8639533
dc.rights	info:eu-repo/semantics/closedAccess	en_US
dc.subject	Boosting	en_US
dc.subject	Co-training	en_US
dc.subject	Sentence segmentation	en_US
dc.subject	Semi-supervised learning	en_US
dc.subject	Prosody	en_US
dc.subject	Speech	en_US
dc.subject	Learning algorithms	en_US
dc.subject	Machine learning	en_US
dc.subject	Supervised learning	en_US
dc.subject	Data models	en_US
dc.subject	Semisupervised learning	en_US
dc.subject	Feature extraction	en_US
dc.subject	Training	en_US
dc.subject	Tools	en_US
dc.subject	Task analysis	en_US
dc.subject	Learning (artificial intelligence)	en_US
dc.subject	Natural language processing	en_US
dc.subject	Speech processing	en_US
dc.subject	Multiview learning strategies	en_US
dc.subject	Disjoint feature sets	en_US
dc.subject	Manually labeled data	en_US
dc.subject	Sentence boundary classification problem	en_US
dc.subject	Sentence boundary labeled data	en_US
dc.subject	Committee-based learning strategies	en_US
dc.subject	Prosodic information	en_US
dc.subject	Lexical information	en_US
dc.subject	Morphological information	en_US
dc.subject	Self-combined strategies	en_US
dc.subject	Automatic sentence segmentation	en_US
dc.subject	Conventional co-training learning	en_US
dc.subject	Multiview semisupervised machine learning	en_US
dc.subject	Turkish spoken languages	en_US
dc.subject	English spoken languages	en_US
dc.title	Extension of conventional co-training learning strategies to three-view and committee-based learning strategies for effective automatic sentence segmentation	en_US
dc.type	conferenceObject	en_US
dc.description.version	Publisher's Version	en_US
dc.relation.journal	2018 IEEE Spoken Language Technology Workshop (SLT)	en_US
dc.contributor.department	Işık Üniversitesi, Mühendislik Fakültesi, Elektrik-Elektronik Mühendisliği Bölümü	en_US
dc.contributor.department	Işık University, Faculty of Engineering, Department of Electrical-Electronics Engineering	en_US
dc.contributor.authorID	0000-0002-4597-0954
dc.identifier.startpage	750
dc.identifier.endpage	755
dc.peerreviewed	Yes	en_US
dc.publicationstatus	Published	en_US
dc.relation.publicationcategory	Conference Item - International - Institutional Academic Staff	en_US
dc.contributor.institutionauthor	Dalva, Doğan	en_US
dc.contributor.institutionauthor	Güz, Ümit	en_US
dc.relation.index	WOS	en_US
dc.relation.index	Scopus	en_US
dc.relation.index	Conference Proceedings Citation Index – Science (CPCI-S)	en_US
dc.description.wosid	WOS:000463141800104

Files in this item:

Name:: 1594.pdf
Size:: 211.4Kb
Format:: PDF
Description:: Publisher's Version

This item appears in the following collection(s).

MF - Conference Paper Collection | Department of Electrical-Electronics Engineering [222]
Contains the conference paper collection of the Department of Electrical-Electronics Engineering.
Scopus Indexed Conference Paper Collection [484]
WOS Indexed Conference Paper Collection [405]

Related Items

Effective semi-supervised learning strategies for automatic sentence segmentation

Co-training using prosodic, lexical and morphological information for automatic sentence segmentation of Turkish spoken language

Shallow parsing in Turkish
