Search Results

Showing 1 - 6 of 6
  • Publication
    A robust Gradient boosting model based on SMOTE and NEAR MISS methods for intrusion detection in imbalanced data sets
    (Işık Üniversitesi, 2022-01-18) Arık, Ahmet Okan; Çavdaroğlu Akkoç, Gülsüm Çiğdem; Işık Üniversitesi, Lisansüstü Eğitim Enstitüsü, Enformasyon Teknolojileri Yüksek Lisans Programı
    Novel technologies introduce many security vulnerabilities and zero-day attack risks. Intrusion Detection Systems (IDS) are developed to protect computer networks from threats and attacks, but existing methods still face many challenging problems. The class imbalance problem is one of the most difficult problems in IDS, and it reduces the detection rate of classifiers. The highest IDS detection rate previously reported in the literature is 96.54%. This thesis proposes a new model called ROGONG-IDS (Robust Gradient Boosting) based on Gradient Boosting. The ROGONG-IDS model uses the Synthetic Minority Over-Sampling Technique (SMOTE) and Near Miss methods to handle class imbalance. Three different gradient boosting-based classification algorithms (GBM, LightGBM, XGBoost) were compared. The performance of the proposed model on multi-class classification was verified on the UNSW-NB15 dataset, where it reached the highest attack detection rate and F1 score in the literature, with a 97.30% detection rate and a 97.65% F1 score. ROGONG-IDS provides a robust, efficient solution for IDS built on datasets with imbalanced class distributions. It outperforms state-of-the-art and traditional intrusion detection methods.
  • Publication
    Rule based entity-relationship diagram modelling
    (Işık Üniversitesi, 2022-02-07) Ulusoy, Oğuzhan; Ekin, Emine; Işık Üniversitesi, Lisansüstü Eğitim Enstitüsü, Bilgisayar Mühendisliği Yüksek Lisans Programı
    Modern society relies on database systems, since many everyday activities involve direct interaction with databases. This study presents entity-relationship modeling for the English language using Natural Language Processing techniques. Natural Language Processing refers to the computational capability of understanding human languages, such as Turkish and English. To make this possible, linguistics is combined with current Machine Learning systems. Entity-Relationship diagrams make it possible to plan or trace relational databases in different fields. First, all details of a standard database management system and its components were studied. Heuristic rules indicating the relations between human language and database components were then defined. Based on these heuristic rules, an event-based pipeline was constructed. In this pipeline, a full text is analyzed and every word is processed using Natural Language Processing techniques.
  • Publication
    Co-training using prosodic, lexical and morphological information for automatic sentence segmentation of Turkish spoken language
    (Işık Üniversitesi, 2018-01-15) Dalva, Doğan; Güz, Ümit; Işık Üniversitesi, Fen Bilimleri Enstitüsü, Elektronik Mühendisliği Doktora Programı
    Sentence segmentation of speech aims at detecting sentence boundaries in the stream of words output by a speech recognizer. Sentence segmentation is a preliminary step toward speech understanding. It is particularly important for speech-related applications, as most further processing steps, such as parsing, machine translation and information extraction, assume the presence of sentence boundaries. Typically, statistical methods require a large amount of manually labeled data, which is time- and labor-consuming to prepare. In this work, novel multi-view semi-supervised learning strategies are proposed for the sentence segmentation problem. The aim of this work is to find effective semi-supervised machine learning strategies when only a small set of sentence-boundary-labeled data is available. This work proposes three-view co-training and committee-based strategies incorporating agreement, disagreement and self-combined strategies using lexical, morphological and prosodic information, and investigates the performance of the proposed learning strategies against baseline, self-training and co-training. The experimental results show that the proposed learning strategies substantially improve sentence segmentation performance, since the data sets can be represented by three redundantly sufficient and disjoint feature sets.
  • Publication
    Word sense disambiguation, named entity recognition, and shallow parsing tasks for Turkish
    (Işık Üniversitesi, 2019-04-02) Topsakal, Ozan; Yıldız, Olcay Taner; Işık Üniversitesi, Fen Bilimleri Enstitüsü, Bilgisayar Mühendisliği Yüksek Lisans Programı
    Human interaction is based on sentences. Understanding sentences involves parsing their words and making sense of them. The ultimate goal of Natural Language Processing is to understand the meaning of sentences. This thesis addresses three main areas: Named Entity Recognition, Shallow Parsing, and Word Sense Disambiguation. Natural Language Processing algorithms that learn to recognize entities such as persons, locations and times are called Named Entity Recognition algorithms. Parsing sentences is one of the biggest challenges in Natural Language Processing. Since time efficiency and accuracy trade off against each other, one of the best ways to deal with this challenge is to use shallow parsing algorithms. Many words have more than one meaning, and recognizing the meaning intended in a given sentence is a difficult problem. The Word Sense Disambiguation literature offers many algorithms that can help solve this problem. This thesis seeks solutions to these three challenges by applying machine learning algorithms. Experiments were conducted on a dataset containing 9,557 sentences.
  • Publication
    An intrusion detection approach based on the combination of oversampling and undersampling algorithms
    (Istanbul University Press, 2023-06-14) Arık, Ahmet Okan; Çavdaroğlu, Gülsüm Çiğdem
    The threat of network intrusion has become much more severe due to increasing network traffic. Therefore, network intrusion detection is one of the most important concerns in network security. As demand for cybersecurity assurance increases, so does the need for intrusion detection systems that can meet current threats. However, network-based intrusion detection systems have several shortcomings due to the structure of the systems, the nature of network data, and uncertainty about future data. The imbalanced class problem is also crucial, since it significantly degrades classification performance. Although deep learning-based methodologies have achieved high performance in recent years, machine learning techniques can also provide high performance in network intrusion detection. This study proposes a new intrusion detection system called ROGONG-IDS (Robust Gradient Boosting – Intrusion Detection System), which uses a unique two-stage resampling model to address the imbalanced class problem and achieves high accuracy on the UNSW-NB15 dataset using machine learning techniques. ROGONG-IDS is based on gradient boosting. The system uses the Synthetic Minority Over-Sampling Technique (SMOTE) and NearMiss-1 methods to handle the imbalanced class problem. The proposed model's performance on multi-class classification was tested on the UNSW-NB15 dataset, and its robustness was then validated on the NSL-KDD dataset. ROGONG-IDS reached the highest attack detection rate and F1 score in the literature on the UNSW-NB15 dataset, with a 97.30% detection rate and a 97.65% F1 score. ROGONG-IDS provides a robust, efficient intrusion detection system for the UNSW-NB15 dataset, which suffers from an imbalanced class distribution. The proposed methodology outperforms state-of-the-art intrusion detection methods.
  • Publication
    An analysis on environmental justice and air quality using machine learning techniques
    (Murat Gök, 2025-12-24) Demircan, Görkem; Çavdaroğlu Akkoç, Gülsüm Çiğdem
    This study examines air quality dynamics across countries using machine learning, with a focus on environmental justice. Random Forest, Decision Tree, XGBoost, and AdaBoost algorithms were applied to produce a 10-year air pollution forecast, and XGBoost showed the best performance. Pollutant levels are expected to increase in Bhutan and North Korea, while improvements may occur in India, Pakistan, and Nepal. Significant air quality changes are projected in Laos, Indonesia, and North Korea. The study highlights inequalities in pollution exposure and emphasizes the need for targeted interventions.
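The two-stage resampling described in the ROGONG-IDS abstracts above, SMOTE oversampling followed by NearMiss-1 undersampling, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The function names (`smote`, `near_miss_1`, `two_stage_resample`), the binary-class setting, the neighbour counts and the rule of bringing both classes to a common `target` size are all hypothetical choices made for illustration. In practice, the imbalanced-learn library provides ready-made `SMOTE` and `NearMiss` samplers.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def smote(minority: Sequence[Point], n_new: int, k: int = 5, seed: int = 0) -> List[Point]:
    """Basic SMOTE: create n_new synthetic points, each on the segment between
    a random minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic: List[Point] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Indices of the other minority samples, ordered by distance to x.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(x, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(xv + gap * (nv - xv) for xv, nv in zip(x, neighbour)))
    return synthetic


def near_miss_1(majority: Sequence[Point], minority: Sequence[Point],
                n_keep: int, k: int = 3) -> List[Point]:
    """NearMiss-1: keep the n_keep majority samples whose mean distance to
    their k nearest minority samples is smallest."""
    def score(m: Point) -> float:
        nearest = sorted(math.dist(m, p) for p in minority)[:k]
        return sum(nearest) / len(nearest)
    return sorted(majority, key=score)[:n_keep]


def two_stage_resample(majority: Sequence[Point], minority: Sequence[Point], target: int,
                       k_smote: int = 5, k_nm: int = 3,
                       seed: int = 0) -> Tuple[List[Point], List[Point]]:
    """Stage 1: oversample the minority class up to `target` with SMOTE.
    Stage 2: undersample the majority class down to `target` with NearMiss-1.
    Hypothetical rule: both classes end up with the same size."""
    minority = list(minority)
    if len(minority) < target:
        minority += smote(minority, target - len(minority), k_smote, seed)
    majority = list(majority)
    if len(majority) > target:
        # Distances are measured against the already-oversampled minority class.
        majority = near_miss_1(majority, minority, target, k_nm)
    return majority, minority


if __name__ == "__main__":
    minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    majority = [(float(i), 5.0) for i in range(10)]
    maj, mino = two_stage_resample(majority, minority, target=6)
    print(len(maj), len(mino))  # 6 6
```

Because NearMiss-1 keeps the majority samples closest to the minority class, the retained majority points concentrate near the class boundary. Meanwhile, SMOTE fills in the minority region with interpolated points rather than exact duplicates.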