Show simple item record

dc.contributor.advisorEskil, M. Taneren_US
dc.contributor.authorErkoç, Tuğbaen_US
dc.contributor.otherIşık Üniversitesi, Lisansüstü Eğitim Enstitüsü, Bilgisayar Mühendisliği Doktora Programıen_US
dc.date.accessioned2023-08-03T12:23:10Z
dc.date.available2023-08-03T12:23:10Z
dc.date.issued2023-06-12
dc.identifier.citationErkoç, T. (2023). Object recognition with competitive convolutional neural networks. İstanbul: Işık Üniversitesi Lisansüstü Eğitim Enstitüsü.en_US
dc.identifier.urihttps://hdl.handle.net/11729/5655
dc.descriptionText in English ; Abstract: English and Turkishen_US
dc.descriptionIncludes bibliographical references (leaves 82-90)en_US
dc.descriptionxvii, 91 leavesen_US
dc.description.abstractIn recent years, Artificial Intelligence (AI) has achieved impressive results, often surpassing human capabilities in tasks involving language comprehension and visual recognition. Among these areas, computer vision has experienced remarkable progress, largely due to the introduction of Convolutional Neural Networks (CNNs). CNNs are inspired by the hierarchical structure of the visual cortex and are designed to detect patterns, objects, and complex relationships within visual data. One key advantage, which has contributed to their popularity, is their ability to learn directly from pixel values without the need for domain expertise. These networks are trained with supervised backpropagation, a process that calculates the gradients of the loss function with respect to the network's parameters (weights and biases). While backpropagation enables impressive performance with CNNs, it also has drawbacks. One drawback is the requirement for large amounts of labeled data: when the available samples are limited, the gradients estimated from this limited information may not accurately capture the overall behavior of the data, leading to suboptimal parameter updates; moreover, obtaining a sufficient quantity of labeled data is itself a challenge. Another drawback is the need for careful configuration of hyperparameters, including the number of neurons, the learning rate, and the network architecture; finding optimal values for these hyperparameters can be time-consuming. Furthermore, as the complexity of the task increases, the network architecture becomes deeper and more complex. To train the shallow layers of such a network effectively, one must increase the number of epochs and experiment with remedies for vanishing gradients, and complex problems often require more epochs to learn the intricate patterns and features present in the data. It is also important to note that while CNNs aim to mimic the structure of the visual cortex, the brain's learning mechanism does not necessarily involve backpropagation: although CNNs incorporate the layered architecture of the visual cortex, their reliance on backpropagation introduces an artificial learning procedure that may not align with the brain's actual learning process. It is therefore worthwhile to explore alternative learning paradigms that do not rely on backpropagation. This dissertation explores a unique approach to the unsupervised training of CNNs. Unlike other unsupervised methods, the proposed approach eliminates the reliance on backpropagation for training the filters. Instead, we introduce a filter extraction algorithm capable of extracting dataset features by processing the images only once, without requiring data labels or backward error updates. The approach operates on individual convolutional layers, gradually constructing each layer by discovering its filters. To evaluate the effectiveness of this backpropagation-free algorithm, we design four distinct CNN architectures and conduct experiments. The results demonstrate the promise of training without backpropagation, achieving strong classification accuracies on different datasets. Notably, these outcomes are attained with a single network setup and without any data augmentation. Additionally, our study reveals that the proposed algorithm eliminates the need to predefine the number of filters per convolutional layer, as it determines this value automatically. Furthermore, we demonstrate that filter initialization from a random distribution is unnecessary when backpropagation is not employed during training.en_US
dc.description.abstractSon yıllarda Yapay Zeka (YZ) dili anlama ve görsel tanımayı içeren görevlerde genellikle insan yeteneklerini geride bırakarak etkileyici sonuçlar elde etti. Bunların arasında, bilgisayarla görme, büyük ölçüde Evrişimli Sinir Ağlarının (ESA) ortaya çıkması ile dikkate değer bir ilerleme kaydetti. ESAlar, görsel korteksin hiyerarşik yapısından ilham alarak görsel verilerdeki kalıpları, nesneleri ve karmaşık ilişkileri tespit etmek için tasarlanmıştır. En önemli avantajlarından biri, popülerliklerine katkıda bulunan, bir uzmana ihtiyaç duymadan doğrudan piksel değerlerinden öğrenme yetenekleridir. Bu ağlar, kayıp fonksiyonunun ağ parametrelerine (ağırlıklar ve eğilimler) göre gradyanlarını hesaplayan denetimli geri yayılım ile eğitilir. Geri yayılım, ESAlarda etkileyici bir performans sağlarken, bazı dezavantajlar da getirir. Bu dezavantajlardan biri büyük miktarlarda etiketlenmiş veri gereksinimidir. Mevcut veri örnekleri sınırlı olduğunda, bu sınırlı bilgiden hesaplanan gradyanlar, genel veri davranışını doğru bir şekilde yakalayamayabilir ve bu da yetersiz parametre güncellemelerine yol açar. Bununla birlikte, yeterli miktarda etiketlenmiş veri elde etmek bir zorluk teşkil etmektedir. Diğer bir dezavantaj nöron sayısı, öğrenme hızı ve ağ mimarisi dahil olmak üzere hiperparametrelerin dikkatli bir şekilde yapılandırılması gerekliliğidir. Bu hiperparametreler için en uygun değerleri bulmak zaman alıcı bir süreç olabilir. Ayrıca, görevin karmaşıklığı arttıkça ağ mimarisi daha derin ve karmaşık bir hale gelir. Ağın sığ katmanlarını etkili bir şekilde eğitmek için, epok sayısı artırılmalı ve kaybolan gradyanları önlemek için çözümler üretilmelidir. Karmaşık problemler, verilerde bulunan karmaşık kalıpları ve özellikleri öğrenmek için genellikle daha fazla sayıda epok gerektirir. ESAlar görsel korteksin yapısını taklit etmeyi amaçlasa da, beynin öğrenme mekanizmasının mutlaka geri yayılımı içermediğini not etmek önemlidir. ESAlar görsel korteksin katmanlı mimarisini içermelerine rağmen, geri yayılıma dayanan öğrenme, beynin gerçek öğrenme süreciyle uyumlu olmayabilecek yapay bir öğrenme prosedürü sunar. Bu nedenle, geri yayılıma dayanmayan alternatif öğrenme paradigmalarını keşfetmek önem teşkil etmektedir. Bu tez çalışmasında, önceki araştırmalardan farklı olarak ESAlar için denetimsiz eğitime yönelik benzersiz bir yaklaşım araştırılmaktadır. Önerilen yaklaşım, diğer denetimsiz yöntemlerin aksine, filtrelerin eğitimi için geri yayılıma olan bağlılığı kaldırır. Geri yayılım ile öğrenme yerine, veri etiketleri veya geriye dönük hata güncellemeleri gerektirmeden görüntüleri yalnızca bir kez işleyerek veri kümesi özelliklerini çıkarabilen bir filtre çıkarma algoritması sunuyoruz. Bu yaklaşım bireysel evrişim katmanları üzerinde çalışır ve filtreleri eğitim örnekleri üzerinden keşfederek evrişim katmanının filtrelerini kademeli olarak oluşturur. Bu geri yayılımsız algoritmanın etkinliğini değerlendirmek için dört farklı ESA mimarisi tasarladık ve deneyler yaptık. Sonuçlar, farklı veri kümelerinde etkileyici sınıflandırma doğrulukları elde ederek, geri yayılım olmadan eğitimin mümkün olabileceğini göstermektedir. Özellikle, bu sonuçlara herhangi bir veri artırımı olmadan ve tek bir ağ kullanılarak ulaşılmıştır. Ek olarak, çalışmamızda önerilen algoritma, evrişim katmanı başına filtre sayısını önceden belirleme ihtiyacını ortadan kaldırmaktadır çünkü algoritmamız bu değeri otomatik olarak belirlemektedir. Ayrıca, eğitim sırasında geri yayılım kullanılmadığından rastgele bir dağılımdan filtrelere ilk değer verilmesinin gereksiz olduğunu da bu çalışma ile gösterdik.en_US
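A minimal sketch of the competitive, backpropagation-free filter discovery described in the abstract is given below, in Python. It is an illustration only, not the thesis's algorithm: the cosine-similarity measure, the running-mean update, the 0.9 threshold, and the helper names extract_patches and discover_filters are all assumptions made for this sketch, whereas the actual CCNN procedure (center-of-gravity based candidate extraction and its update rule) is specified in the dissertation itself.

import numpy as np

def extract_patches(image, k=5, stride=1):
    # Slide a k x k window over the image and collect unit-normalized patches.
    # (Hypothetical stand-in for the thesis's CoG-based candidate extraction.)
    h, w = image.shape
    patches = []
    for i in range(0, h - k + 1, stride):
        for j in range(0, w - k + 1, stride):
            patch = image[i:i + k, j:j + k].ravel().astype(float)
            norm = np.linalg.norm(patch)
            if norm > 0:                      # skip empty (all-zero) candidates
                patches.append(patch / norm)  # unit norm: dot product = cosine
    return patches

def discover_filters(images, k=5, threshold=0.9):
    # Single pass over unlabeled images: a candidate becomes a new filter when
    # its maximum similarity to the existing filters falls below the threshold;
    # otherwise the most similar filter's weights are updated toward it.
    filters, counts = [], []
    for image in images:
        for cand in extract_patches(image, k):
            if not filters:
                filters.append(cand)
                counts.append(1)
                continue
            sims = [f @ cand for f in filters]   # cosine similarities
            best = int(np.argmax(sims))
            if sims[best] < threshold:           # sufficiently novel: new filter
                filters.append(cand)
                counts.append(1)
            else:                                # update the winner (running mean)
                counts[best] += 1
                filters[best] += (cand - filters[best]) / counts[best]
                filters[best] /= np.linalg.norm(filters[best])
    return np.stack(filters).reshape(-1, k, k)

Note that no labels, gradients, or preset filter count appear anywhere in the loop: the images are processed only once, and the number of filters emerges from the data, which is the behavior the abstract claims for the proposed algorithm.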
dc.description.tableofcontentsINTRODUCTIONen_US
dc.description.tableofcontentsContributionsen_US
dc.description.tableofcontentsOrganization of This Thesisen_US
dc.description.tableofcontentsCONVOLUTIONAL NEURAL NETWORKSen_US
dc.description.tableofcontentsConvolutional Neural Network Architectureen_US
dc.description.tableofcontentsConvolutional Layeren_US
dc.description.tableofcontentsActivation Functionen_US
dc.description.tableofcontentsConvolution Operationen_US
dc.description.tableofcontentsPooling Layeren_US
dc.description.tableofcontentsFully Connected Layeren_US
dc.description.tableofcontentsOutput Layeren_US
dc.description.tableofcontentsTraining of CNNen_US
dc.description.tableofcontentsLITERATURE SURVEYen_US
dc.description.tableofcontentsInitial Stepsen_US
dc.description.tableofcontentsBackpropagation Eraen_US
dc.description.tableofcontentsFundamental Deep Learning Problemen_US
dc.description.tableofcontentsRevival of the Neural Networks Researchen_US
dc.description.tableofcontentsGPU Eraen_US
dc.description.tableofcontentsUnsupervised Learning with Backpropagationen_US
dc.description.tableofcontentsNeocognitronen_US
dc.description.tableofcontentsAPPROACHen_US
dc.description.tableofcontentsConvolutional Filter Discoveryen_US
dc.description.tableofcontentsCenter of Gravity Based Candidate Filter Extractionen_US
dc.description.tableofcontentsUnsupervised Learning Algorithm for Convolutional Layers of CCNN Architectureen_US
dc.description.tableofcontentsEXPERIMENTSen_US
dc.description.tableofcontentsModel Typesen_US
dc.description.tableofcontentsExperiment Setupen_US
dc.description.tableofcontentsDatasetsen_US
dc.description.tableofcontentsMNISTen_US
dc.description.tableofcontentsEMNIST-Digitsen_US
dc.description.tableofcontentsKuzushiji-MNISTen_US
dc.description.tableofcontentsFashion-MNISTen_US
dc.description.tableofcontentsPerformance Metricsen_US
dc.description.tableofcontentsExperiment Detailsen_US
dc.description.tableofcontentsRESULTSen_US
dc.description.tableofcontentsMNIST Experiment Resultsen_US
dc.description.tableofcontentsEMNIST-Digits Experiment Resultsen_US
dc.description.tableofcontentsKuzushiji-MNIST Experiment Resultsen_US
dc.description.tableofcontentsFashion-MNIST Experiment Resultsen_US
dc.description.tableofcontentsFilters Discovered via Proposed Unsupervised Processen_US
dc.description.tableofcontentsMNIST Dataseten_US
dc.description.tableofcontentsEMNIST-Digits Dataseten_US
dc.description.tableofcontentsKuzushiji-MNIST Dataseten_US
dc.description.tableofcontentsFashion-MNIST Dataseten_US
dc.description.tableofcontentsExtracted Filters in Subsequent Layersen_US
dc.description.tableofcontentsSamples with Incorrect Classificationen_US
dc.description.tableofcontentsIncorrectly Classified MNIST Samplesen_US
dc.description.tableofcontentsIncorrectly Classified EMNIST-Digits Samplesen_US
dc.description.tableofcontentsIncorrectly Classified Kuzushiji-MNIST Samplesen_US
dc.description.tableofcontentsIncorrectly Classified Fashion-MNIST Samplesen_US
dc.description.tableofcontentsDISCUSSIONen_US
dc.description.tableofcontentsComparison of Performance Against Other Studiesen_US
dc.description.tableofcontentsComparison of Performance Against Unsupervised Studiesen_US
dc.description.tableofcontentsComparison of Performance Against Mixed Studiesen_US
dc.description.tableofcontentsComparison of Performance Against Supervised Studiesen_US
dc.description.tableofcontentsProof of Linear Independence of the Extracted Filtersen_US
dc.description.tableofcontentsProof of Independence over the Order of Candidate Processing for Filter Extractionen_US
dc.description.tableofcontentsComparison to Low-Capacity CNNen_US
dc.description.tableofcontentsCCNN networks used in the experiments with various datasets. Convolutional layers use either 5 × 5 or 3 × 3 filters. Max pooling is applied to the feature maps over 2 × 2 windows with a stride of 2. The size of the convolutional filters is denoted by n, while the max pooling window size is denoted by men_US
dc.description.tableofcontentsExtracted filter counts and the test accuracy of individual models on MNIST dataseten_US
dc.description.tableofcontentsThe confusion matrix represents the performance of Model A on the MNIST dataseten_US
dc.description.tableofcontentsPerformance metrics of Model type A for individual classes of MNIST dataseten_US
dc.description.tableofcontentsExtracted filter counts and the test accuracy of individual models on EMNIST-Digits dataseten_US
dc.description.tableofcontentsThe confusion matrix represents the performance of Model A on the EMNIST-Digits dataseten_US
dc.description.tableofcontentsPerformance metrics of Model type A for individual classes of EMNIST-Digits dataseten_US
dc.description.tableofcontentsExtracted filter counts and the test accuracy of individual models on Kuzushiji-MNIST dataseten_US
dc.description.tableofcontentsThe confusion matrix represents the performance of Model B on the Kuzushiji-MNIST dataseten_US
dc.description.tableofcontentsPerformance metrics of Model type B for individual classes of Kuzushiji-MNIST dataseten_US
dc.description.tableofcontentsExtracted filter counts and the test accuracy of individual models on Fashion-MNIST dataseten_US
dc.description.tableofcontentsThe confusion matrix represents the performance of Model B on the Fashion-MNIST dataset. The classes are assigned numbers ranging from 0 to 9. In order, the class labels correspond to T-shirt/top, Trouser, Pullover, Dress, Coat, Sandal, Shirt, Sneaker, Bag, and Ankle booten_US
dc.description.tableofcontentsPerformance metrics of Model type B for individual classes of Fashion-MNIST dataseten_US
dc.description.tableofcontentsComparison between previous works and our method for the number of training epochs needed for the convolutional filters, and whether data augmentation and an ensemble of networks are used. Legend of the table: ✓: applied, ×: not applied, NA: no information availableen_US
dc.description.tableofcontentsComparison of the proposed method with other unsupervised studiesen_US
dc.description.tableofcontentsComparison of the proposed method with other mixed studiesen_US
dc.description.tableofcontentsComparison of the proposed method with other supervised studiesen_US
dc.description.tableofcontentsComparison of the best-performing model's filter counts and test accuracy before and after the addition of candidate shuffling. The median of the 50 experiment runs is also presenteden_US
dc.description.tableofcontentsA typical Convolutional Neural Networken_US
dc.description.tableofcontentsCNNs can recognize high-level concepts like faces by hierarchically building feature detectors, starting from basic edge-like shapes and progressing to complex features like eyesen_US
dc.description.tableofcontentsSigmoid, hyperbolic tangent, and ReLU activation function curves are shown. ReLU is the most popular activation function in CNNsen_US
dc.description.tableofcontentsPReLU and Leaky ReLU activation function curves. PReLU and Leaky ReLU allow a small gradient for negative values, whereas the original ReLU sets negative values strictly to zeroen_US
dc.description.tableofcontentsVisualization of the convolution operation with a 5 × 5 image and a 3 × 3 filter with a stride of 1 pixelen_US
dc.description.tableofcontentsVisualization of the convolution operation with a 5 × 5 image and a 3 × 3 filter with a stride of 1 pixel and zero padding of 1 pixel, as per Equation 2.2 (a worked size calculation follows the contents entries below). The green background is padding, while the image data is shown with a light blue backgrounden_US
dc.description.tableofcontentsThe convolution layer applies the convolution operation to the input images. The feature maps are then given non-linearity through an activation functionen_US
dc.description.tableofcontentsVisualization of the max pooling operation on a 4 × 4 image with a 2 × 2 window size and a stride of 2en_US
dc.description.tableofcontentsFully connected layers learn the relations between the high-level patternsen_US
dc.description.tableofcontentsSingle neuron in Fully Connected Layeren_US
dc.description.tableofcontentsNeurons dropped out with the dropout regularization technique do not receive or transmit signals. Dropout with probability p = 0.5 is applied to the neuronsen_US
dc.description.tableofcontentsOutput Layer in a CNNen_US
dc.description.tableofcontentsThe proposed unsupervised, backpropagation-free filter extraction method. Images/feature maps are converted to candidates, from which the filters are discovered without label information. Any filter candidate can become a new filter for the current layer if its maximum similarity value is less than a preset threshold; otherwise, the weights of the most similar filter are updateden_US
dc.description.tableofcontentsFirst layer filters of Model A trained with MNIST dataseten_US
dc.description.tableofcontentsFirst layer filters of Model B trained with Kuzushiji-MNIST dataseten_US
dc.description.tableofcontentsFirst layer filters of Model A trained with EMNIST-Digits dataseten_US
dc.description.tableofcontentsFirst layer filters of Model B trained with Fashion-MNIST dataseten_US
dc.description.tableofcontentsVisualization of the 54 features obtained in the second layer from the MNIST training images using Model type Aen_US
dc.description.tableofcontentsThe test images belonging to digit class 1 from the MNIST dataset are inaccurately classified by Model A. Among these images, the second, fourth, and fifth samples are mistakenly labeled as 6, potentially due to the presence of artifacts and curved elements within the imagesen_US
dc.description.tableofcontentsThe test images belonging to digit class 9 from the MNIST dataset are inaccurately labeled by Model Aen_US
dc.description.tableofcontentsMisclassified images from the digit class 6 in the EMNIST-Digits dataset, as predicted by Model Aen_US
dc.description.tableofcontentsThe test images belonging to digit class 8 from the EMNIST-Digits dataset are inaccurately labeled by Model Aen_US
dc.description.tableofcontentsMisclassified images from the class 3 in the Kuzushiji-MNIST dataset, as predicted by Model Ben_US
dc.description.tableofcontentsThe test images belonging to digit class 2 from the Kuzushiji-MNIST dataset are inaccurately labeled by Model Ben_US
dc.description.tableofcontentsThe test images belonging to the Bag class that were misclassifieden_US
dc.description.tableofcontentsThe mislabeled test images from the Shirt class, which were incorrectly classified as similar classes by Model Ben_US
dc.description.tableofcontentsBoxplot of the number of filters extracted from MNIST dataset for both layers of Model A with candidate shufflingen_US
dc.description.tableofcontentsBoxplot of the number of filters extracted from EMNIST-Digits dataset for both layers of Model A with candidate shufflingen_US
dc.description.tableofcontentsBoxplot of the number of filters extracted from Kuzushiji-MNIST dataset for both layers of Model B with candidate shufflingen_US
dc.description.tableofcontentsBoxplot of the number of filters extracted from Fashion MNIST dataset for both layers of Model B with candidate shufflingen_US
dc.description.tableofcontentsBoxplot of the test accuracy distribution of Model A over 50 runs on MNIST dataset with candidate shufflingen_US
dc.description.tableofcontentsBoxplot of the test accuracy distribution of Model A over 50 runs on EMNIST-Digits dataset with candidate shufflingen_US
dc.description.tableofcontentsBoxplot of the test accuracy distribution of Model B over 50 runs on Kuzushiji-MNIST dataset with candidate shufflingen_US
dc.description.tableofcontentsBoxplot of the test accuracy distribution of Model B over 50 runs on Fashion MNIST dataset with candidate shufflingen_US
dc.description.tableofcontentsCandidate Set Creation Processen_US
dc.description.tableofcontentsCCNN CoG Based Unsupervised Learning Algorithmen_US
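The convolution and pooling figure captions above quote concrete sizes (a 5 × 5 image with a 3 × 3 filter at stride 1, with and without 1-pixel zero padding, and 2 × 2 max pooling with a stride of 2 on a 4 × 4 input). These can be checked with the standard output-size formula, sketched below in Python; the formula is the textbook one and is only assumed to coincide with the thesis's Equation 2.2.

def conv_out(w, n, s=1, p=0):
    # Output side length for a w x w input, an n x n filter, stride s,
    # and zero padding p (standard formula; assumed to match Equation 2.2).
    return (w - n + 2 * p) // s + 1

assert conv_out(5, 3, s=1, p=0) == 3   # 5x5 image, 3x3 filter -> 3x3 feature map
assert conv_out(5, 3, s=1, p=1) == 5   # 1-pixel zero padding preserves the 5x5 size
assert conv_out(4, 2, s=2, p=0) == 2   # 2x2 max pooling at stride 2: 4x4 -> 2x2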
dc.language.isoenen_US
dc.publisherIşık Üniversitesien_US
dc.rightsinfo:eu-repo/semantics/openAccessen_US
dc.rightsAttribution-NonCommercial-NoDerivs 3.0 United States*
dc.rights.urihttp://creativecommons.org/licenses/by-nc-nd/3.0/us/*
dc.subjectConvolutional neural networksen_US
dc.subjectUnsupervised learningen_US
dc.subjectFeature extractionen_US
dc.subjectEvrişimli sinir ağlarıen_US
dc.subjectDenetimsiz öğrenmeen_US
dc.subjectÖzellik çıkarmaen_US
dc.subject.lccQA76.87 .E75 O25 2023
dc.subject.lcshConvolutional neural networks.en_US
dc.subject.lcshUnsupervised learning.en_US
dc.subject.lcshFeature extraction.en_US
dc.titleObject recognition with competitive convolutional neural networksen_US
dc.title.alternativeRekabetçi evrişimli sinir ağları ile nesne tanımaen_US
dc.typeDoctoral Thesisen_US
dc.departmentIşık Üniversitesi, Lisansüstü Eğitim Enstitüsü, Bilgisayar Mühendisliği Doktora Programıen_US
dc.authorid0000-0001-9033-8934en_US
dc.relation.publicationcategoryTezen_US
dc.institutionauthorErkoç, Tuğbaen_US

