
dc.contributor.author: Dalva, Doğan [en_US]
dc.contributor.author: Güz, Ümit [en_US]
dc.contributor.author: Gürkan, Hakan [en_US]
dc.date.accessioned: 2018-12-13T01:04:03Z
dc.date.available: 2018-12-13T01:04:03Z
dc.date.issued: 2018-04-01
dc.identifier.citation: Dalva, D., Güz, Ü. & Gürkan, H. (2018). Effective semi-supervised learning strategies for automatic sentence segmentation. Pattern Recognition Letters, 105(SI), 76-86. doi:10.1016/j.patrec.2017.10.010 [en_US]
dc.identifier.issn: 0167-8655
dc.identifier.issn: 1872-7344
dc.identifier.uri: https://hdl.handle.net/11729/1416
dc.identifier.uri: http://dx.doi.org/10.1016/j.patrec.2017.10.010
dc.description.abstract: The primary objective of the sentence segmentation process is to determine the sentence boundaries in a stream of words output by automatic speech recognizers. Statistical methods developed for sentence segmentation require a significant amount of labeled data, which is time-consuming, labor-intensive, and expensive to produce. In this work, we propose new multi-view semi-supervised learning strategies for the sentence boundary classification problem using lexical, prosodic, and morphological information. The aim is to find effective semi-supervised machine learning strategies when only small sets of sentence-boundary-labeled data are available. We primarily investigate two semi-supervised learning approaches, self-training and co-training. Different example selection strategies were also used for co-training, namely agreement, disagreement, and self-combined. Furthermore, we propose three-view and committee-based algorithms incorporating the agreement, disagreement, and self-combined strategies using three disjoint feature sets. We present comparative results of different learning strategies on the sentence segmentation task. The experimental results show that sentence segmentation performance can be highly improved using the proposed multi-view learning strategies, since the data sets can be represented by three redundantly sufficient and disjoint feature sets. We show that, when only a small set of manually labeled data is available, the proposed strategies substantially improve the average baseline F-measure from 67.66% to 75.15% for spoken Turkish and from 64.84% to 66.32% for spoken English. [en_US]
dc.description.sponsorship: This material is based upon work supported by the Scientific and Technological Research Council of Turkey (TUBITAK) (Project Numbers 107E182 and 111E228), the Isik University Scientific Research Projects Fund (Project Numbers 09A301 and 14A201), and TUBITAK BIDEB and J. William Fulbright Post-Doctoral Research Fellowship (USA) funding at the SRI-International Speech Technology and Research (STAR) Lab., Menlo Park, CA, USA, and the International Computer Science Institute (ICSI) Speech Group, University of California at Berkeley, CA, USA. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding agencies. The authors thank Gokhan Tur, Dilek Hakkani-Tur, Benoit Favre, Sebastien Cuendet, Murat Saraclar, Siddika Parlak, Erinc Dikici, Izel D. Revidi, Cenk Demiroglu, Fatih Ozaydin, and the Bogazici University Signal and Image Processing (BUSIM) Group for many helpful discussions. The authors also thank the anonymous reviewers for their useful comments on an earlier version of this paper. [en_US]
dc.language.iso: eng [en_US]
dc.publisher: Elsevier Science BV [en_US]
dc.relation.isversionof: 10.1016/j.patrec.2017.10.010
dc.rights: info:eu-repo/semantics/closedAccess [en_US]
dc.subject: Machine learning [en_US]
dc.subject: Multi-view semi-supervised learning [en_US]
dc.subject: Co-training [en_US]
dc.subject: Sentence segmentation [en_US]
dc.subject: Boosting [en_US]
dc.subject: Speech [en_US]
dc.subject: Recognition [en_US]
dc.subject: Multiview [en_US]
dc.subject: Speech recognition [en_US]
dc.subject: Sentence boundary [en_US]
dc.subject: Adaptive boosting [en_US]
dc.subject: Artificial intelligence [en_US]
dc.subject: Classification (of information) [en_US]
dc.subject: Learning algorithms [en_US]
dc.subject: Learning systems [en_US]
dc.subject: Speech processing [en_US]
dc.subject: Automatic speech recognizers [en_US]
dc.subject: Morphological information [en_US]
dc.subject: Multi-view learning [en_US]
dc.subject: Semi-supervised learning [en_US]
dc.subject: Sentence boundaries [en_US]
dc.subject: Supervised learning [en_US]
dc.title: Effective semi-supervised learning strategies for automatic sentence segmentation [en_US]
dc.type: article [en_US]
dc.description.version: Publisher's Version [en_US]
dc.relation.journal: Pattern Recognition Letters [en_US]
dc.contributor.department: Işık Üniversitesi, Mühendislik Fakültesi, Elektrik-Elektronik Mühendisliği Bölümü [en_US]
dc.contributor.department: Işık University, Faculty of Engineering, Department of Electrical-Electronics Engineering [en_US]
dc.contributor.authorID: 0000-0002-4597-0954
dc.contributor.authorID: 0000-0002-7008-4778
dc.identifier.volume: 105
dc.identifier.issue: SI
dc.identifier.startpage: 76
dc.identifier.endpage: 86
dc.peerreviewed: Yes [en_US]
dc.publicationstatus: Published [en_US]
dc.relation.publicationcategory: Article - International Peer-Reviewed Journal - Institutional Faculty Member [en_US]
dc.contributor.institutionauthor: Dalva, Doğan [en_US]
dc.contributor.institutionauthor: Güz, Ümit [en_US]
dc.contributor.institutionauthor: Gürkan, Hakan [en_US]
dc.relation.index: WOS [en_US]
dc.relation.index: Scopus [en_US]
dc.relation.index: Science Citation Index Expanded (SCI-EXPANDED) [en_US]
dc.description.quality: Q2
dc.description.wosid: WOS:000428363000010

