Grammar or crammer? the role of morphology in distinguishing orthographically similar but semantically unrelated words

dc.authorid: 0000-0002-2782-8217
dc.authorid: 0000-0001-5838-4615
dc.contributor.author: Ercan, Gökhan [en_US]
dc.contributor.author: Yıldız, Olcay Taner [en_US]
dc.date.accessioned: 2025-08-28T12:11:39Z
dc.date.available: 2025-08-28T12:11:39Z
dc.date.issued: 2025
dc.department: Işık Üniversitesi, Lisansüstü Eğitim Enstitüsü, Bilgisayar Mühendisliği Doktora Programı [en_US]
dc.department: Işık University, School of Graduate Studies, Ph.D. in Computer Engineering [en_US]
dc.description.abstract: We show that n-gram-based distributional models fail to distinguish unrelated words due to the noise in semantic spaces. This issue remains hidden in conventional benchmarks but becomes more pronounced when orthographic similarity is high. To highlight this problem, we introduce OSimUnr, a dataset of nearly one million English and Turkish word-pairs that are orthographically similar but semantically unrelated (e.g., grammar - crammer). These pairs are generated through a graph-based WordNet approach and morphological resources. We define two evaluation tasks - unrelatedness identification and relatedness classification - to test semantic models. Our experiments reveal that FastText, with default n-gram segmentation, performs poorly (below 5% accuracy) in identifying unrelated words. However, morphological segmentation overcomes this issue, boosting accuracy to 68% (English) and 71% (Turkish) without compromising performance on standard benchmarks (RareWords, MTurk771, MEN, AnlamVer). Furthermore, our results suggest that even state-of-the-art LLMs, including Llama 3.3 and GPT-4o-mini, may exhibit noise in their semantic spaces, particularly in highly synthetic languages such as Turkish. To ensure dataset quality, we leverage WordNet, MorphoLex, and NLTK, covering fully derivational morphology supporting atomic roots (e.g., '-co_here+ance+y' for 'coherency'), with 405 affixes in Turkish and 467 in English. [en_US]
dc.description.version: Publisher's Version [en_US]
dc.identifier.citation: Ercan, G. & Yıldız, O. T. (2025). Grammar or crammer? the role of morphology in distinguishing orthographically similar but semantically unrelated words. IEEE Access, 13, 64412-64458. doi:https://doi.org/10.1109/ACCESS.2025.3557086 [en_US]
dc.identifier.doi: 10.1109/ACCESS.2025.3557086
dc.identifier.endpage: 64458
dc.identifier.issn: 2169-3536
dc.identifier.scopus: 2-s2.0-105003289605
dc.identifier.scopusquality: Q1
dc.identifier.startpage: 64412
dc.identifier.uri: https://hdl.handle.net/11729/6675
dc.identifier.uri: https://doi.org/10.1109/ACCESS.2025.3557086
dc.identifier.volume: 13
dc.identifier.wos: WOS:001470402900005
dc.identifier.wosquality: Q2
dc.indekslendigikaynak: Scopus [en_US]
dc.indekslendigikaynak: Web of Science [en_US]
dc.indekslendigikaynak: Science Citation Index Expanded (SCI-EXPANDED) [en_US]
dc.institutionauthor: Ercan, Gökhan [en_US]
dc.institutionauthorid: 0000-0002-2782-8217
dc.language.iso: en [en_US]
dc.peerreviewed: Yes [en_US]
dc.publicationstatus: Published [en_US]
dc.publisher: Institute of Electrical and Electronics Engineers Inc. [en_US]
dc.relation.ispartof: IEEE Access [en_US]
dc.relation.publicationcategory: Article - International Peer-Reviewed Journal - Student [en_US]
dc.rights: info:eu-repo/semantics/closedAccess [en_US]
dc.subject: Derivational morphology [en_US]
dc.subject: Distributional semantic modeling [en_US]
dc.subject: Language resource [en_US]
dc.subject: Morphological segmentation [en_US]
dc.subject: Orthographic similarity [en_US]
dc.subject: Word-relatedness [en_US]
dc.subject: Word-similarity [en_US]
dc.subject: Economic and social effects [en_US]
dc.subject: Semantic segmentation [en_US]
dc.subject: Semantics [en_US]
dc.subject: Distributional semantics [en_US]
dc.subject: Semantic modelling [en_US]
dc.subject: Turkishs [en_US]
dc.subject: Modeling languages [en_US]
dc.subject: Noise [en_US]
dc.subject: Morphology [en_US]
dc.subject: Grammar [en_US]
dc.subject: Benchmark testing [en_US]
dc.subject: Accuracy [en_US]
dc.subject: Computational modeling [en_US]
dc.subject: Training [en_US]
dc.subject: Statistical analysis [en_US]
dc.subject: Hands [en_US]
dc.title: Grammar or crammer? the role of morphology in distinguishing orthographically similar but semantically unrelated words [en_US]
dc.type: Article [en_US]
dspace.entity.type: Publication [en_US]
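
Illustrative note on the abstract: the contrast it draws between n-gram and morphological segmentation can be sketched with the record's own example pair, grammar - crammer. The snippet below is a minimal sketch, not the authors' code and not FastText's implementation. It assumes FastText-style character n-grams (word wrapped in `<` and `>`, lengths 3 to 6), and the morpheme segmentations in `MORPHEMES` are hypothetical values chosen for illustration, not entries taken from OSimUnr or MorphoLex.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the set of character n-grams of '<word>' for min_n <= n <= max_n."""
    padded = f"<{word}>"  # boundary markers, as in FastText-style subword models
    return {
        padded[i:i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(padded) - n + 1)
    }


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical morphological segmentations (assumption, for illustration only).
MORPHEMES = {
    "grammar": {"grammar"},
    "crammer": {"cram", "er"},
}

ngrams_a = char_ngrams("grammar")
ngrams_b = char_ngrams("crammer")

shared_ngrams = ngrams_a & ngrams_b
shared_morphemes = MORPHEMES["grammar"] & MORPHEMES["crammer"]

ngram_overlap = jaccard(ngrams_a, ngrams_b)
morpheme_overlap = jaccard(MORPHEMES["grammar"], MORPHEMES["crammer"])

print("shared n-grams:  ", sorted(shared_ngrams))
print("shared morphemes:", sorted(shared_morphemes))
print("n-gram overlap > morpheme overlap:", ngram_overlap > morpheme_overlap)
```

Under these assumptions the two words share a few character n-grams but no morphemes, which is one plausible way subword units could pull orthographically similar yet unrelated words together in a semantic space.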

Files

Original bundle
Name: Grammar_or_Crammer_The_Role_of_Morphology_in_Distinguishing_Orthographically_Similar_but_Semantically_Unrelated_Words_kopyası.pdf
Size: 11.44 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.17 KB
Format: Item-specific license agreed upon to submission