Arama Sonuçları

Listeleniyor 1 - 10 / 12
  • Yayın
    Calculating the VC-dimension of decision trees
    (IEEE, 2009) Aslan, Özlem; Yıldız, Olcay Taner; Alpaydın, Ahmet İbrahim Ethem
    We propose an exhaustive search algorithm that calculates the VC-dimension of univariate decision trees with binary features. The VC-dimension of the univariate decision tree with binary features depends on (i) the VC-dimension values of the left and right subtrees, (ii) the number of inputs, and (iii) the number of nodes in the tree. From a training set of example trees whose VC-dimensions are calculated by exhaustive search, we fit a general regressor to estimate the VC-dimension of any binary tree. These VC-dimension estimates are then used to get VC-generalization bounds for complexity control using SRM in decision trees, i.e., pruning. Our simulation results shows that SRM-pruning using the estimated VC-dimensions finds trees that are as accurate as those pruned using cross-validation.
  • Yayın
    Maintenance policy analysis of the regenerative air heater system using factored POMDPs
    (Elsevier Ltd, 2022-03) Kıvanç, İpek; Özgür Ünlüakın, Demet; Bilgiç, Taner
    Maintenance optimization of multi-component systems is a difficult problem. Partially Observable Markov Decision Processes (POMDPs) are powerful tools for such problems under uncertainty in stochastic environments. In this study, the main POMDP solution approaches and solvers are surveyed. Then, based on experimental models with different complexities in the size of the system space, selected POMDP solvers using different representation patterns for modeling and different procedures for updating the value function while solving are compared. Furthermore, to show that factored representations are advantageous in modeling and solving the maintenance problem of multi-component systems where there exist also stochastic dependencies among the components, the maintenance problem of the one-line regenerative air heater system available in thermal power plants is modeled and solved with factored POMDPs. In-depth sensitivity analyses are performed on the obtained policy. The results show that factored POMDPs enable compact modeling, efficient policy generation and practical policy analysis for the tackled problem. Furthermore, they also motivate the use of factored POMDPs in the generation and analysis of maintenance policies for similar multi-component systems.
  • Yayın
    İlişkisel veri tabanlarında mükerrer kayıtların makine öğrenmesiyle tespiti
    (Institute of Electrical and Electronics Engineers Inc., 2018-07-05) Bayrak, Ahmet Tuğrul; Yılmaz, Aykut İnan; Yılmaz, Kemal Burak; Düzağaç, Remzi; Yıldız, Olcay Taner
    Veri miktarının artışına paralel olarak, ilişkisel veri tabanlarında mükerrer kayıtlar da artmaktadır. Artan bu kayıtlar kullanıldıkları rapor veya analizlerde tutarsızlığa sebep olabilmektedir. Bu sorunu en aza indirgemek için yaptığımız çalışmada, kayıtların birbirlerine olan benzerlikleri ve alan uzmanlık bilgisiyle belirlenen ağırlıklar, öznitelik olarak kullanılarak makine öğrenmesi algoritmaları ile mükerrer kayıtların bulunması hedeflenmiştir. Yapılan işlem sonucunda 9301467 satır veride 28412 mükerrer çift tespit edilmiştir. Bulunan bu mükerrer kayıtlar veri kaynağından temizlenerek verinin daha tutarlı hale gelmesi sağlanmaktadır.
  • Yayın
    Effective semi-supervised learning strategies for automatic sentence segmentation
    (Elsevier Science BV, 2018-04-01) Dalva, Doğan; Güz, Ümit; Gürkan, Hakan
    The primary objective of sentence segmentation process is to determine the sentence boundaries of a stream of words output by the automatic speech recognizers. Statistical methods developed for sentence segmentation requires a significant amount of labeled data which is time-consuming, labor intensive and expensive. In this work, we propose new multi-view semi-supervised learning strategies for sentence boundary classification problem using lexical, prosodic, and morphological information. The aim is to find effective semi-supervised machine learning strategies when only small sets of sentence boundary labeled data are available. We primarily investigate two semi-supervised learning approaches, called self-training and co-training. Different example selection strategies were also used for co-training, namely, agreement, disagreement and self-combined. Furthermore, we propose three-view and committee-based algorithms incorporating with agreement, disagreement and self-combined strategies using three disjoint feature sets. We present comparative results of different learning strategies on the sentence segmentation task. The experimental results show that the sentence segmentation performance can be highly improved using multi-view learning strategies that we proposed since data sets can be represented by three redundantly sufficient and disjoint feature sets. We show that the proposed strategies substantially improve the average baseline F-measure of 67.66% to 75.15% and 64.84% to 66.32% when only a small set of manually labeled data is available for Turkish and English spoken languages, respectively.
  • Yayın
    Extension of conventional co-training learning strategies to three-view and committee-based learning strategies for effective automatic sentence segmentation
    (IEEE, 2018) Dalva, Doğan; Güz, Ümit; Gürkan, Hakan
    The objective of this work is to develop effective multi-view semi-supervised machine learning strategies for sentence boundary classification problem when only small sets of sentence boundary labeled data are available. We propose three-view and committee-based learning strategies incorporating with co-training algorithms with agreement, disagreement, and self-combined learning strategies using prosodic, lexical and morphological information. We compare experimental results of proposed three-view and committee-based learning strategies to other semi-supervised learning strategies in the literature namely, self-training and co-training with agreement, disagreement, and self-combined strategies. The experiment results show that sentence segmentation performance can be highly improved using multi-view learning strategies that we propose since data sets can be represented by three redundantly sufficient and disjoint feature sets. We show that the proposed strategies substantially improve the average performance when only a small set of manually labeled data is available for Turkish and English spoken languages, respectively.
  • Yayın
    Cost-conscious comparison of supervised learning algorithms over multiple data sets
    (Elsevier Sci Ltd, 2012-04) Ulaş, Aydın; Yıldız, Olcay Taner; Alpaydın, Ahmet İbrahim Ethem
    In the literature, there exist statistical tests to compare supervised learning algorithms on multiple data sets in terms of accuracy but they do not always generate an ordering. We propose Multi(2)Test, a generalization of our previous work, for ordering multiple learning algorithms on multiple data sets from "best" to "worst" where our goodness measure is composed of a prior cost term additional to generalization error. Our simulations show that Multi2Test generates orderings using pairwise tests on error and different types of cost using time and space complexity of the learning algorithms.
  • Yayın
    Mapping classifiers and datasets
    (Pergamon-Elsevier Science Ltd, 2011-04) Yıldız, Olcay Taner
    Given the posterior probability estimates of 14 classifiers on 38 datasets, we plot two-dimensional maps of classifiers and datasets using principal component analysis (PCA) and Isomap. The similarity between classifiers indicate correlation (or diversity) between them and can be used in deciding whether to include both in an ensemble. Similarly, datasets which are too similar need not both be used in a general comparison experiment. The results show that (i) most of the datasets (approximately two third) we used are similar to each other, (ii) multilayer perceptrons and k-nearest neighbor variants are more similar to each other than support vector machine and decision tree variants. (iii) the number of classes and the sample size has an effect on similarity.
  • Yayın
    Shallow parsing in Turkish
    (IEEE, 2017) Topsakal, Ozan; Açıkgöz, Onur; Gürkan, Ali Tunca; Kanburoğlu, Ali Buğra; Ertopçu, Burak; Özenç, Berke; Çam, İlker; Avar, Begüm; Ercan, Gökhan; Yıldız, Olcay Taner
    In this study, shallow parsing is applied on Turkish sentences. These sentences are used to train and test the per-formances of various learning algorithms with various features specified for shallow parsing in Turkish.
  • Yayın
    Aynı oteli temsil eden farklı kayıtlar için akıllı eşleştirme
    (Institute of Electrical and Electronics Engineers Inc., 2019-09) Bayrak, Ahmet Tuğrul; Özbek, Eyüp Erkan; Kestepe, Sedat; Yıldız, Olcay Taner
    Otel sayısının her geçen gün arttığı turizm sektöründe, aracı firmaların tüm oteller ile ayrı ayrı çalışma imkanı bulunmadığından, firmalar dünya üzerinde bir çok otelle anlaşması bulunan servis sağlayıcılarıyla beraber çalışmaktadır. Farklı servis sağlayıcılarından alınan otel kayıtlarında tekrarlayan otel verileri olabilmektedir. Tekrarlayan bu kayıtlar aynı bilgilere sahip olabileceği gibi, farklı bilgilere sahip olmasına rağmen aynı oteli temsil edebilmektedir. Otel verilerini tutarlı hale getirmek için aynı oteli temsil eden kayıtlar eşleştirilmelidir. Bu amaçla, otel kayıtları üzerinde çalışılarak, adres zenginleştirmesi ve ön işleme yapılan aday kayıtlar için kategorik ve görsel verilerin benzerliklerinin kullanıldığı makine öğrenmesi algoritmaları uygulanmıştır. Yapılan işlem sonucunda, 132.287 satırlık otel verisinde 14.985 adet otel %99,12 doğruluk oranı ile eşleştirilmiştir.
  • Yayın
    Machine learning-based model categorization using textual and structural features
    (Springer Science and Business Media Deutschland GmbH, 2022-09-08) Khalilipour, Alireza; Bozyiğit, Fatma; Utku, Can; Challenger, Moharram
    Model Driven Engineering (MDE), where models are the core elements in the entire life cycle from the specification to maintenance phases, is one of the promising techniques to provide abstraction and automation. However, model management is another challenging issue due to the increasing number of models, their size, and their structural complexity. So that the available models should be organized by modelers to be reused and overcome the development of the new and more complex models with less cost and effort. In this direction, many studies are conducted to categorize models automatically. However, most of the studies focus either on the textual data or structural information in the intelligent model management, leading to less precision in the model management activities. Therefore, we utilized a model classification using baseline machine learning approaches on a dataset including 555 Ecore metamodels through hybrid feature vectors including both textual and structural information. In the proposed approach, first, the textual information of each model has been summarized in its elements through text processing as well as the ontology of synonyms within a specific domain. Then, the performances of machine learning classifiers were observed on two different variants of the datasets. The first variant includes only textual features (represented both in TF-IDF and word2vec representations), whereas the second variant consists of the determined structural features and textual features. It was finally concluded that each experimented machine learning algorithm gave more successful prediction performance on the variant containing structural features. The presented model yields promising results for the model classification task with a classification accuracy of 89.16%.