Exploiting Wikipedia to Measure the Semantic Relatedness between Arabic Terms

Basel AlHaj, Iyad AlAgha

Abstract


Measuring the semantic relatedness between words or terms plays an important role in many domains such as linguistics and artificial intelligence. Although this topic has been widely explored in the literature, most efforts have focused on English text, while little has been done to measure the similarity between Arabic terms. A growing number of semantic relatedness measures rely on background knowledge sources such as Wikipedia. They often map terms to Wikipedia concepts, and then use the content or hyperlink structure of the corresponding Wikipedia articles to estimate the similarity between terms. However, existing approaches have mostly focused on the English version of Wikipedia, while limited work has been done on the Arabic version. This work proposes an approach that takes advantage of Wikipedia features to measure the semantic relatedness between Arabic terms. It exploits two types of relations to gain rich features for the similarity measure: the context-based relation and the category-based relation. The context-based relation is measured based on the intersection between the incoming links of Wikipedia articles, while the category-based relation is measured by utilizing the taxonomy of Wikipedia categories. The proposed approach was evaluated on a translated version of the WordSimilarity-353 benchmark dataset. The results show that our approach generally outperforms several approaches in the literature that use the same dataset in English. However, the poor structure and content of the Arabic version of Wikipedia compared to the English version have resulted in several incorrect similarity scores.
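
The two relations described above can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything here is an assumption made for illustration: the context-based relation is approximated by the Jaccard overlap of incoming-link sets, the category-based relation by an inverse shortest-path score through a shared ancestor category, and the two are combined with a hypothetical weight `alpha`. The function names and the toy data are likewise hypothetical and are not taken from the paper.

```python
from collections import deque


def inlink_overlap(inlinks_a, inlinks_b):
    """Context-based score: Jaccard overlap of incoming-link sets (assumed measure)."""
    a, b = set(inlinks_a), set(inlinks_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def ancestors_with_depth(category, parents):
    """Breadth-first walk up the taxonomy: category -> distance from the start."""
    dist = {category: 0}
    queue = deque([category])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, ()):
            if parent not in dist:
                dist[parent] = dist[current] + 1
                queue.append(parent)
    return dist


def category_similarity(cat_a, cat_b, parents):
    """Category-based score: 1 / (1 + shortest path via a common ancestor).

    Returns 0.0 when the two categories share no ancestor (assumed convention).
    """
    dist_a = ancestors_with_depth(cat_a, parents)
    dist_b = ancestors_with_depth(cat_b, parents)
    common = dist_a.keys() & dist_b.keys()
    if not common:
        return 0.0
    shortest = min(dist_a[c] + dist_b[c] for c in common)
    return 1.0 / (1.0 + shortest)


def relatedness(inlinks_a, inlinks_b, cat_a, cat_b, parents, alpha=0.5):
    """Weighted combination of both scores; alpha is a hypothetical parameter."""
    context = inlink_overlap(inlinks_a, inlinks_b)
    category = category_similarity(cat_a, cat_b, parents)
    return alpha * context + (1.0 - alpha) * category


# Toy data (hypothetical): article titles linking in, and a tiny category tree.
toy_parents = {"Cats": ["Mammals"], "Dogs": ["Mammals"], "Mammals": ["Animals"]}
score = relatedness({"A", "B", "C"}, {"B", "C", "D"}, "Cats", "Dogs", toy_parents)
```

In practice the incoming links and the category hierarchy would be extracted from an Arabic Wikipedia dump or API; the sparse link and category structure noted in the abstract would show up here as empty sets or missing common ancestors, both of which this sketch scores as 0.0.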

Keywords


Semantic relatedness, Arabic text, Wikipedia, text similarity


Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.