Word sense disambiguation for low resource languages: setswana collocations
8 December 2023
Boago Okgetheng, Gabofetswe Malema, Karabo Motlaleselelo
Proceedings Volume 12943, International Workshop on Signal Processing and Machine Learning (WSPML 2023); 1294308 (2023) https://doi.org/10.1117/12.3014074
Event: International Workshop on Signal Processing and Machine Learning (WSPML 2023), 2023, Hangzhou, ZJ, China
Abstract
Word sense disambiguation (WSD) is a critical task in natural language processing (NLP) and artificial intelligence. Supervised methods, such as decision list algorithms, are considered the most accurate machine learning approaches to WSD. However, they suffer from the knowledge acquisition bottleneck: their performance depends on the size of the sense-tagged training set, which is difficult, time-consuming, and costly to prepare. In this paper, we developed a hierarchical decision list algorithm for morphologically rich, low-resource languages, using a statistical method to extract collocations from a large untagged corpus. Our approach identifies the most important collocations, which serve as the features used to build learning hypotheses. We construct the decision list manually according to the priority of the senses, which improves both the efficiency and the accuracy of the algorithm. Our experiments use a dataset of 800 sentences covering 20 polysemous Setswana words. Evaluated with precision and recall, our WSD system achieved 78% accuracy, compared with 50% for the existing decision list algorithm. The method can be applied to other resource-limited languages. The system still has difficulty determining the appropriate sense of a word in idiomatic expressions and ambiguous contexts. The approach could be further improved by incorporating other NLP tools, such as a morphological analyzer.
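The abstract does not specify the collocation statistic or the exact form of the decision list, so the following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes pointwise mutual information (PMI) over a window of whitespace tokens as the collocation statistic, models the hierarchical decision list as senses tried in a hand-chosen priority order (each with a set of trigger collocations), and scores the output with Senseval-style precision (correct / attempted) and recall (correct / total). The function names `pmi_collocations`, `classify` and `precision_recall`, and the toy English "bank" data, are hypothetical and used only for illustration.

```python
import math
from collections import Counter


def pmi_collocations(corpus, target, window=3, min_count=2, stopwords=frozenset(), top_k=10):
    """Rank words seen within +/- `window` tokens of `target` by PMI (log base 2).

    Simplification (assumption): all probabilities are estimated from raw
    token counts, using the total number of tokens as the denominator.
    """
    target = target.lower()
    word_counts = Counter()  # frequency of every token in the corpus
    pair_counts = Counter()  # co-occurrences of each word with `target`
    total = 0
    for sentence in corpus:
        tokens = sentence.lower().split()
        word_counts.update(tokens)
        total += len(tokens)
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i and tokens[j] not in stopwords:
                    pair_counts[tokens[j]] += 1

    if total == 0 or word_counts[target] == 0:
        return []  # target never occurs, so there is nothing to rank
    p_target = word_counts[target] / total
    scores = {}
    for word, joint in pair_counts.items():
        if joint < min_count:
            continue  # skip rare pairs, whose PMI estimates are unreliable
        p_joint = joint / total
        p_word = word_counts[word] / total
        scores[word] = math.log2(p_joint / (p_target * p_word))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


def classify(tokens, ranked_senses, default=None):
    """Hierarchical decision list: senses are tried in priority order and the
    first sense whose collocation set overlaps the context wins."""
    context = {t.lower() for t in tokens}
    for sense, collocations in ranked_senses:
        if context & collocations:
            return sense
    return default  # no rule fired


def precision_recall(predictions, gold):
    """Senseval-style scores: precision = correct / attempted,
    recall = correct / total; `None` marks an unanswered instance."""
    attempted = [(p, g) for p, g in zip(predictions, gold) if p is not None]
    correct = sum(p == g for p, g in attempted)
    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Toy data (hypothetical, English stand-in for a Setswana polysemous word).
corpus = [
    "the bank approved the loan",
    "she opened an account at the bank",
    "the bank raised the loan rate",
    "we sat on the river bank",
    "the river bank was muddy",
]
print(pmi_collocations(corpus, "bank", stopwords={"the", "an", "at", "on"}))

ranked_senses = [
    ("finance", {"loan", "account"}),  # checked first: higher priority
    ("river", {"river", "muddy"}),
]
tests = ["the bank approved my loan", "fish near the river bank", "the bank is closed"]
predictions = [classify(s.split(), ranked_senses) for s in tests]
print(predictions)  # ['finance', 'river', None]
print(precision_recall(predictions, ["finance", "river", "finance"]))  # precision 1.0, recall 2/3
```

On this tiny corpus `loan` and `river` tie on PMI; on a real corpus the frequency threshold (`min_count`) and the stopword list matter more, and a morphologically rich language such as Setswana would likely need lemmatization or a morphological analyzer before counting. When the classifier answers every instance, precision and recall both reduce to plain accuracy.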
© (2023) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Boago Okgetheng, Gabofetswe Malema, and Karabo Motlaleselelo "Word sense disambiguation for low resource languages: setswana collocations", Proc. SPIE 12943, International Workshop on Signal Processing and Machine Learning (WSPML 2023), 1294308 (8 December 2023); https://doi.org/10.1117/12.3014074
KEYWORDS
Machine learning

Evolutionary algorithms

Algorithm development

Data modeling

Semantics

Classification systems

Statistical methods
