Self-supervised learning of 3D structure from 2D OCT slices for retinal disease diagnosis on UK Biobank scans
Abstract
This study presents a self-supervised learning framework for retinal disease classification using Optical Coherence Tomography (OCT) scans. To balance the contextual richness of 3D volumes with the computational efficiency of 2D architectures, we introduce a quasi-3D input generation strategy. Each input is constructed by stacking three OCT slices, sampled from channel-specific Gaussian distributions centered on the volume midplane, and arranged in a standard three-channel 2D format compatible with existing pre-trained models. These quasi-3D images are used to pre-train a Vision Transformer (ViT-Base) via a Masked Autoencoder (MAE) with a shared masking pattern, encouraging the model to reconstruct masked regions by encoding anatomical continuity across slices. Pre-training is conducted on 10,000 unlabeled OCT volumes from the UK Biobank. The encoder is then fine-tuned on the OCTA-500 dataset for three-class and four-class retinal disease classification tasks, including macular degeneration and diabetic retinopathy. The model achieves 92.57% accuracy on the three-class task, matching the performance of RETFound while using over 150 times less pre-training data and a smaller backbone.
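The quasi-3D input construction and the shared masking described above can be illustrated with a minimal NumPy sketch. It is only one possible reading of the abstract, not the authors' implementation: the function names `make_quasi_3d` and `apply_shared_mask`, the per-channel standard deviations, the 16-pixel patch size, and the 75% mask ratio are all assumptions chosen for illustration, since the abstract does not state them.

```python
import numpy as np


def make_quasi_3d(volume, sigmas=(1.0, 3.0, 6.0), rng=None):
    """Build a 3-channel image from an OCT volume of shape (depth, H, W).

    Each channel is one slice whose index is drawn from a Gaussian centered
    on the volume midplane. The per-channel sigmas are hypothetical values.
    """
    rng = np.random.default_rng() if rng is None else rng
    depth = volume.shape[0]
    mid = (depth - 1) / 2.0
    channels = []
    for sigma in sigmas:
        # Sample a slice index around the midplane and clamp it to the volume.
        idx = int(round(rng.normal(loc=mid, scale=sigma)))
        idx = min(max(idx, 0), depth - 1)
        channels.append(volume[idx])
    return np.stack(channels, axis=0)  # shape (3, H, W)


def apply_shared_mask(image, patch=16, mask_ratio=0.75, rng=None):
    """Zero out the same random set of patches in every channel.

    Returns the masked image and the boolean pixel mask (True = masked).
    Patch size and mask ratio are assumed, not taken from the abstract.
    """
    rng = np.random.default_rng() if rng is None else rng
    _, h, w = image.shape
    n_h, n_w = h // patch, w // patch
    n_patches = n_h * n_w
    n_masked = int(round(n_patches * mask_ratio))
    patch_mask = np.zeros(n_patches, dtype=bool)
    patch_mask[rng.permutation(n_patches)[:n_masked]] = True
    # Expand the patch-level mask to pixel resolution.
    pixel_mask = (
        patch_mask.reshape(n_h, n_w)
        .repeat(patch, axis=0)
        .repeat(patch, axis=1)
    )
    # Broadcasting over the channel axis applies one mask to all three slices.
    masked = np.where(pixel_mask[None, :, :], 0, image)
    return masked, pixel_mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    volume = rng.random((64, 224, 224))  # synthetic stand-in for an OCT volume
    image = make_quasi_3d(volume, rng=rng)
    masked, mask = apply_shared_mask(image, rng=rng)
    print(image.shape, masked.shape, mask.mean())
```

Because the mask is identical across channels, a masked region cannot be filled in by copying the same location from a neighboring slice, which is consistent with the stated goal of encouraging the encoder to model anatomical continuity across slices.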












